Kraken will be decommissioned on April 30, 2014. For more information see Kraken Decommission FAQs
The National Institute for Computational Sciences

What’s Behind the Bias?

Ribosomes Choose Certain Chemical Particles over Others in Manufacturing Proteins

by Scott Gibson



Besides being a staple of everyday conversations about diet and nutrition, protein is of great interest in scientific research. That interest is understandable: proteins perform a host of important functions in the body, from building tissue and producing vital substances such as hormones to transporting oxygen, carrying messages from cell to cell, and regulating bodily processes. The more we know about proteins, the better.

The enormous amounts of data collected through collaborative studies of genomes—the complete sequence of the genetic material of organisms—are resources for investigating protein dynamics and acquiring other information important to biology, says University of Tennessee, Knoxville, Professor Michael Gilchrist.

Gilchrist is leading a research team that’s using computational methods to extract biologically meaningful information from the genome of Saccharomyces cerevisiae, baker’s or brewer’s yeast. Specifically, the researchers are seeking a quantitative understanding of a phenomenon called codon usage bias (CUB), which relates to protein translation, one of the most fundamental and universal biological processes.

Protein Manufacturing and Gene Expression

Ribosomes, tiny molecular machines, make proteins for the cell by linking amino acids in the order specified by a gene’s messenger ribonucleic acid (mRNA). The information in the mRNA is encoded in triplets of nucleotides: most triplets specify an amino acid, while a few signal where translation should stop. These nucleotide triplets are called codons.

Although the cell uses only 20 amino acids, mRNA can contain 64 different codons, so most amino acids are represented by more than one codon. Codons that encode the same amino acid are called synonymous.
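To make this redundancy concrete, here is a minimal Python sketch built on the standard genetic code (NCBI translation table 1). It reads an mRNA three nucleotides at a time, roughly as a ribosome does, and groups the 61 codons that specify amino acids by the amino acid they encode. The names `CODON_TABLE`, `translate` and `synonymous_groups` are illustrative; they are not part of any library or of the team's software.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# the conventional U, C, A, G order for the first, second and third bases;
# '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str) -> str:
    """Read an mRNA codon by codon and stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

def synonymous_groups() -> dict[str, list[str]]:
    """Group the sense codons (all but the three stops) by amino acid."""
    groups = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            groups[aa].append(codon)
    return dict(groups)

groups = synonymous_groups()
print(len(CODON_TABLE), len(groups))                              # 64 20
print(sum(1 for codons in groups.values() if len(codons) > 1))    # 18
print(translate("AUGGCUUUAUAA"))                                  # MAL
```

Only methionine (AUG) and tryptophan (UGG) have a single codon; leucine, serine and arginine each have six.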

Gilchrist and his team are looking closely at gene expression, the process by which the information in a gene is used to assemble a functional gene product, often a protein.

Codon Usage Bias

Although synonymous codons all code for the same amino acid, in the mRNA of most genes some synonymous codons are used more often than others. The researchers are plotting codon usage against gene expression to determine which codons are favored by mutation bias, which arises from random changes in the nucleotide sequence, and which are favored by natural selection, the key non-random mechanism of evolution that can favor one synonymous codon over another.
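One common way to put a number on codon usage bias is relative synonymous codon usage (RSCU): a codon's observed count divided by the count expected if every synonymous codon were used equally often, so a value of 1 means no bias. The Python sketch below is a simple descriptive statistic, not the biologically based models the team fits; the two-family `FAMILIES` table and the example sequence are hypothetical, and the sequence is assumed to start at the first codon of the reading frame.

```python
from collections import Counter

# Illustrative subset of the standard genetic code: two synonymous families.
FAMILIES = {
    "K": ["AAA", "AAG"],                # lysine
    "G": ["GGU", "GGC", "GGA", "GGG"],  # glycine
}

def rscu(mrna, families=FAMILIES):
    """Relative synonymous codon usage: observed count of each codon divided by
    the count expected if all synonymous codons were used equally."""
    counts = Counter(mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3))
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent, so RSCU is undefined for its codons
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values

# Hypothetical coding sequence: AAG AAG AAG AAA GGU GGU GGC GGA
print(rscu("AAGAAGAAGAAAGGUGGUGGCGGA"))
# {'AAA': 0.5, 'AAG': 1.5, 'GGU': 2.0, 'GGC': 1.0, 'GGA': 1.0, 'GGG': 0.0}
```

Values above 1 flag codons used more often than uniform usage would predict; on its own, RSCU cannot say whether mutation bias or natural selection produced the pattern.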

Graphs: How codon usage changes with gene expression

“The bias in codon usage switches between low-expression genes and high-expression genes,” Gilchrist says, describing a trend the team is seeing in the graphs. “Genes that are highly expressed tend to use certain codons preferentially over others.”
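The kind of comparison behind such graphs can be sketched by splitting genes into low- and high-expression groups and measuring, in each group, how often one codon is chosen among its synonyms. Everything in this Python sketch is hypothetical: the gene sequences, the expression values (arbitrary units), the threshold, and the choice of lysine's AAA/AAG pair are invented to exercise the code and are not the team's data or method.

```python
from statistics import mean

LYS_CODONS = ("AAA", "AAG")  # the two synonymous lysine codons

def codon_fraction(mrna, codon, family):
    """Fraction of a gene's codons from `family` that are `codon`,
    or None if the gene never uses that synonymous family."""
    codons = [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]
    in_family = [c for c in codons if c in family]
    if not in_family:
        return None
    return in_family.count(codon) / len(in_family)

def usage_by_expression(genes, codon, family, threshold):
    """Mean codon_fraction for genes below vs. at or above `threshold`."""
    low, high = [], []
    for expression, mrna in genes:
        frac = codon_fraction(mrna, codon, family)
        if frac is None:
            continue  # gene carries no information about this family
        (high if expression >= threshold else low).append(frac)
    return {"low": mean(low) if low else None,
            "high": mean(high) if high else None}

# Hypothetical (expression level, coding sequence) pairs; invented data.
GENES = [
    (0.5, "AAAAAAAAG"),
    (1.0, "AAAAAG"),
    (50.0, "AAGAAGAAG"),
    (80.0, "AAGAAGAAA"),
]

print(usage_by_expression(GENES, "AAG", LYS_CODONS, threshold=10.0))
```

A real analysis would use many expression bins and every synonymous family; plotting the per-bin fractions against expression level gives a curve of codon usage versus gene expression.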

Gilchrist says that, beyond revealing which codons are preferred by mutation bias or natural selection, the research provides insight into how natural selection acts on the DNA that encodes genes.

Research aimed at understanding CUB raises the question of why the cell prefers one codon over another. Do ribosomes wait longer at some codons than at others? Or are some codons less likely than others to cause a mistake during protein assembly? The explanation for CUB could involve not just one answer but a combination of answers, Gilchrist explains.
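The two hypotheses can each be turned into a quantity to compare across synonymous codons. The Python sketch below is a deliberately simple toy, not Gilchrist's model: it assumes each codon has an independent mean ribosome wait time and an independent misreading probability, and every number in `CODON_PARAMS` is invented for illustration.

```python
from math import prod

# Hypothetical per-codon parameters: (mean ribosome wait in seconds,
# probability of a misreading error). Invented for illustration only.
CODON_PARAMS = {
    "AUG": (0.10, 0.0001),                          # methionine (start)
    "AAA": (0.20, 0.0005), "AAG": (0.05, 0.0002),   # lysine
    "GGA": (0.25, 0.0008), "GGU": (0.06, 0.0001),   # glycine (two of four codons)
    "UAA": (0.10, 0.0),                             # stop
}

def split_codons(mrna):
    """Split an mRNA into consecutive nucleotide triplets."""
    return [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]

def expected_translation_time(mrna, params=CODON_PARAMS):
    """Wait-time hypothesis (toy): total time is the sum of per-codon waits."""
    return sum(params[c][0] for c in split_codons(mrna))

def error_free_probability(mrna, params=CODON_PARAMS):
    """Error hypothesis (toy): probability that no codon is misread,
    assuming errors at different codons are independent."""
    return prod(1.0 - params[c][1] for c in split_codons(mrna))

# Two synonymous encodings of the same short peptide, Met-Lys-Gly.
SLOW = "AUGAAAGGAUAA"
FAST = "AUGAAGGGUUAA"

for name, mrna in (("slow", SLOW), ("fast", FAST)):
    print(name, round(expected_translation_time(mrna), 2),
          round(error_free_probability(mrna), 6))
```

Because the parameters are made up, the example only shows how each hypothesis yields a cost that differs between synonymous encodings of the same protein; it says nothing about which hypothesis holds in yeast.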

Greater Efficiency and Faster Results

Gilchrist’s team developed software in the R programming language that models the efficiency of a ribosome translating mRNA in relation to the evolution of CUB. R has become a standard in bioinformatics, the discipline concerned with methods for storing, retrieving, and analyzing biological data.

Gilchrist consulted with the data analysis team at the Remote Data Analysis and Visualization Center (RDAV), which improved the efficiency of the R code and optimized it to run on the Nautilus supercomputer, a system managed by the National Institute for Computational Sciences (NICS) and housed at Oak Ridge National Laboratory.

Preliminary results showed that calculations that took 24 hours on a desktop computer could be executed in less than an hour on Nautilus. The RDAV data analysis team is continuing to work on improving the efficiency of the code.

“Being able to get the results in hours instead of weeks greatly increases our ability to develop our methods and test alternative hypotheses about the forces driving the evolution of CUB,” Gilchrist says. “It also greatly facilitates our ability to develop our intuitive understanding because we get the results while the ideas are fresh in our minds and at their clearest point.”

Gilchrist says that besides giving his team access to powerful computers, NICS offers another essential element: researchers who can help them use the systems. He explains that the NICS team supplied the expertise needed to connect R with the faster C language, along with guidance that helped his team understand the trade-offs between ease of programming and the quality of the information produced.

A Higher Volume of Meaningful Information

“This research is helping to advance our quantitative understanding of the costs and errors associated with protein translation,” Gilchrist says. “It also demonstrates how we can extract more and more meaningful information from genomic datasets using an integrated set of biologically based models rather than generic algorithms or simple linear models.”

Gilchrist says that besides developing models to explore CUB and add to the body of knowledge in evolutionary biology, his team wants to develop computational tools that other researchers can use to fit their own models to data easily.

About NICS: The National Institute for Computational Sciences (NICS) operates the University of Tennessee supercomputing center, funded in part by the National Science Foundation. NICS is a major partner in NSF’s Extreme Science and Engineering Discovery Environment, known as XSEDE. The Remote Data Analysis and Visualization Center (RDAV) is a part of NICS.