Non-random Distributions of the Canonical Amino Acids in Chemistry Space

Department

Biological Sciences

Program

Biological Sciences

Rights

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.

Abstract

By the time of the last universal common ancestor (LUCA), terrestrial life most likely possessed a shared “alphabet” of 20 L-α-amino acids with which it built genetically encoded proteins. Given the multitude of α-amino acids available to the earliest life, from both abiotic and biotic sources, an active research question for decades has been why life has used this specific amino acid “alphabet” for billions of years. This thesis explores this question using an in-silico approach that utilizes the concept of chemistry space to examine the alphabet as a whole. It pursues the hypothesis that, to achieve the structural and functional diversity seen in extant terrestrial proteins, the amino acid alphabet was selected such that its members would be evenly spread over a large region of chemistry space. Alphabets of α-amino acids were drawn at random from libraries containing abiotic, biotic, and computationally generated alternatives; each alphabet was evaluated for its distribution in specific dimensions of chemistry space and then compared against the canonical amino acid alphabet. These experiments supported some findings of earlier, similar experiments: compared to alternate alphabets, members of the canonical alphabet are very evenly distributed over a very broad range of molecular volumes. However, this exact pattern is not repeated for the other dimensions tested (hydrophobicity and charge). The general patterns observed for the full canonical alphabet of 20 amino acids also appear in prebiotically plausible subsets of the canonical alphabet, with the pattern strength varying based on the size and composition of the pool used to create random, alternative alphabets. Our results demonstrate a more nuanced chemical “logic” behind the selection of the canonical amino acid alphabet than had been proposed prior to the work of this thesis.
Such nuance can benefit researchers in both the Origin of Life and synthetic biology fields; the former can better understand the selective pressures behind the evolution of an amino acid alphabet, while the latter can use the chemical logic seen in the canonical alphabet to engineer organisms with “xenoalphabets” containing multiple non-canonical amino acids.
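
The random-alphabet comparison described in the abstract can be illustrated with a minimal sketch. It is not the thesis's actual implementation. The two metrics (range as a measure of breadth, variance of the gaps between sorted neighbors as a measure of evenness), the function names, and all numbers below are assumptions made for illustration; the pool and the reference set are synthetic placeholders, not real amino acid property data.

```python
import random
import statistics


def coverage_range(values):
    """Breadth of a set along one dimension: largest minus smallest value."""
    return max(values) - min(values)


def evenness(values):
    """Variance of the gaps between sorted neighbors (lower = more even).

    This is an assumed metric, chosen only for illustration.
    """
    ordered = sorted(values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return statistics.pvariance(gaps)


def compare_to_random_sets(reference, pool, n_trials=1000, seed=0):
    """Draw random sets of len(reference) from pool, without replacement.

    Returns the fraction of random sets that the reference set beats on
    breadth (larger range) and on evenness (lower gap variance).
    """
    rng = random.Random(seed)
    ref_range = coverage_range(reference)
    ref_even = evenness(reference)
    broader = 0
    more_even = 0
    for _ in range(n_trials):
        sample = rng.sample(pool, len(reference))
        if ref_range > coverage_range(sample):
            broader += 1
        if ref_even < evenness(sample):
            more_even += 1
    return broader / n_trials, more_even / n_trials


# Synthetic placeholder data: 500 hypothetical property values in [0, 100)
# and a perfectly evenly spaced hypothetical "reference alphabet" of 20.
data_rng = random.Random(42)
pool = [data_rng.uniform(0, 100) for _ in range(500)]
reference = [i * 5.0 for i in range(20)]

frac_broader, frac_more_even = compare_to_random_sets(reference, pool)
print(f"reference broader than {frac_broader:.1%} of random sets")
print(f"reference more even than {frac_more_even:.1%} of random sets")
```

In a real analysis, each dimension of chemistry space named in the abstract (molecular volume, hydrophobicity, charge) would supply its own list of values, and the choice of pool would matter, since the abstract notes that pattern strength varies with the size and composition of the pool.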