Why these 20 amino acids?






Biological Sciences





This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu.
Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.


Nearly all living organisms use the same set of 20 amino acids to make proteins. However, substantial evidence indicates that this standard alphabet of amino acids is a mere subset of what was available to life during early evolution. A common qualitative explanation claims that the diversity of encoded amino acids has increased during evolution, enabling diverse protein structures and functions to be made from them. Therefore, I proposed a testable baseline hypothesis that the current amino acid alphabet comprises a subset that maximizes the diversity of certain key properties, which in turn is responsible for protein diversity. To build a quantitative framework, I first investigated the reliability of computational programs for predicting three fundamental amino acid properties. These properties were size, charge, and hydrophobicity, as measured by van der Waals volume, isoelectric point (pI), and logP, respectively. My results demonstrated the plausibility of estimating amino acid physicochemical properties with fast and reliable computational approaches, which can thus supplement or even replace time-consuming and costly experimental determinations. I then tested quantitative formulations of qualitative explanations for the origin of the amino acid alphabet, such as the idea that the biochemical diversity of the amino acid alphabet was somehow optimized by natural selection. My research first focused on investigating whether the group of eight standard amino acids (i.e., those used in genetic coding) that are routinely produced in abiotic syntheses is in some way a non-random sample of the 66 abiotic amino acids that have been found in the Murchison meteorite. I compared the statistical variance (a diversity measure) of the eight "standard" amino acids to that of alternative samples of eight amino acids chosen at random from the larger set of those that are prebiotically plausible.
My results showed that when factoring in amino acid abundance, the eight standard amino acids are more diverse than most of the random sets in terms of logP (hydrophobicity) and pI (charge). These results are consistent with, though not strongly supportive of, the idea that amino acids may have been non-randomly "chosen" during the earliest stage of amino acid alphabet evolution. Next, I investigated how amino acid diversity changed during the formation of the genetic code. Specifically, I evaluated two widely discussed models of genetic code development: "sequential incorporation" (by which the genetic code encoded proteinaceous amino acids one at a time, gradually leading to the standard alphabet of 20) and "ambiguity reduction" (by which all 20 proteinaceous amino acids were present from the start, but moved from a state of non-specific tRNA charging to later states of increasingly specific charging). Still using statistical variance as the diversity measure, I asked how each model relates to the idea that natural selection exerted a pressure to increase the biochemical diversity of amino acids during code evolution. My results show that the "ambiguity reduction" model is more straightforwardly consistent with the widespread idea that natural selection acted to produce a diverse alphabet of amino acids for genetic coding. To make the amino acid data derived from my research freely available to the scientific community and the public, I have also developed a novel web resource of information pertaining to 387 amino acids. The database includes general information about each amino acid, the sources from which the amino acid is known, the three fundamental biophysical properties, and several analysis and visualization tools. Additionally, to illustrate the types of exploration that our database can support, I also conducted two simple Quantitative Structure-Activity Relationship (QSAR) studies of peptides that include non-standard amino acids.
Each demonstrated the utility of the three fundamental biophysical properties on which our database focuses and the speed and ease with which meaningful bioactivity results (specifically potentiating activity and bitterness of taste) can be explored. In summary, my dissertation work not only deepened our understanding of the formation of the amino acid alphabet but also provided a quantitative framework for origin-of-life studies and protein engineering research.
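
The random-subset comparison described in the abstract (variance of a chosen set of eight versus the variance of random sets of eight drawn from a larger pool) can be illustrated with a minimal sketch. This is not the dissertation's actual code: the function name `fraction_less_diverse`, the trial count, and the synthetic property values are all assumptions made for illustration, and the abundance weighting mentioned in the abstract is not modeled here.

```python
import random
import statistics


def fraction_less_diverse(pool, chosen, n_trials=10_000, seed=0):
    """Return the fraction of random same-size subsets of `pool` whose
    sample variance is strictly lower than that of `chosen`.

    `pool` and `chosen` are lists of values for a single property
    (for example logP). Hypothetical helper, for illustration only.
    """
    rng = random.Random(seed)
    k = len(chosen)
    observed = statistics.variance(chosen)
    below = 0
    for _ in range(n_trials):
        # Draw k distinct members of the pool, without replacement.
        sample = rng.sample(pool, k)
        if statistics.variance(sample) < observed:
            below += 1
    return below / n_trials


if __name__ == "__main__":
    # Synthetic placeholder values, NOT measured amino acid properties.
    gen = random.Random(42)
    pool = [gen.uniform(-4.0, 2.0) for _ in range(66)]
    chosen = pool[:8]  # arbitrary stand-in for the eight "standard" members
    frac = fraction_less_diverse(pool, chosen)
    print(f"Fraction of random sets less diverse than the chosen set: {frac:.3f}")
```

A fraction close to 1 would mean the chosen set is more diverse than most random sets for that property; a fraction near 0.5 would be what random choice predicts.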