On the nature of preferred codons of protein-coding genes in living organisms
Institute of Cell Biophysics of RAS, Pushchino, Moscow region, Institutskaya str. 3
To date, it is generally believed that, given the degeneracy of the genetic code, codon usage bias in pro- and eukaryotes reflects a balance between two main forces: natural selection and mutational "pressure". In our opinion, however, insufficient attention has been paid to the search for a possible physical root cause underlying this phenomenon.
In earlier work, we pointed out the important role of the "hidden" ambiguity in the form of complementary hydrogen-bond (H-) pairing of nitrogenous bases in giving rise to the observed features of the structural and functional organization of nucleic acid molecules. In this study, the inherently higher polymorphism of base binding in GC Watson-Crick pairs (4-fold, compared with 2-fold for AT pairs) is considered the most likely structural factor that significantly regulates the nucleotide composition of preferred codons in the protein-coding genes of living organisms.
To address this question, we used comparative genomics to analyze the frequency of occurrence of all 64 codons in the genes of a wide range of pro- and eukaryotes spanning a large range of genome sizes (from 1.6 Mb to 140,000 Mb). The genomes of the amoeba (Amoeba proteus), tardigrades (Tardigrada), the horseshoe crab (Limulus polyphemus), and the mollusk Nautilus pompilius were taken as examples of relict eukaryotes. Other eukaryotes were represented by the genomes of humans (Homo sapiens), chimpanzees (Pan troglodytes), mice (Mus musculus), the marbled lungfish (Protopterus aethiopicus), the frog Xenopus tropicalis, the fruit fly (Drosophila melanogaster), the flowering plant Arabidopsis thaliana, the cellular slime mold (Dictyostelium discoideum), a parasite (Leishmania major), a yeast (Saccharomyces cerevisiae), and a malaria parasite (Plasmodium falciparum); prokaryotes were represented by a bacterium (Escherichia coli) and a very small bacterium (Candidatus Pelagibacter). Data were taken from the GenBank database and the www.kazusa.or.jp/codon resource.
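The codon frequency analysis described above can be sketched as follows. This is a minimal Python illustration, not the authors' pipeline: it assumes coding sequences (CDS) have already been extracted as in-frame strings, and the helper names `codon_counts` and `per_thousand` are hypothetical. Frequencies are expressed per thousand codons, similar in spirit to the codon usage tables at www.kazusa.or.jp/codon.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# All 64 codons, first position varying slowest (TTT, TTC, TTA, TTG, TCT, ...)
ALL_CODONS = ["".join(p) for p in product(BASES, repeat=3)]


def codon_counts(cds_sequences):
    """Count in-frame codons over a collection of coding sequences.

    Assumes each sequence is already in reading frame 0. RNA input (U) is
    converted to DNA (T); codons containing other characters (e.g. N) are
    skipped, and a trailing partial codon is ignored.
    """
    # Pre-seed all 64 codons with 0 so the full table is always reported
    counts = Counter({codon: 0 for codon in ALL_CODONS})
    for seq in cds_sequences:
        seq = seq.upper().replace("U", "T")
        usable = len(seq) - len(seq) % 3
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if codon in counts:  # only the 64 ACGT codons are keys
                counts[codon] += 1
    return counts


def per_thousand(counts):
    """Convert raw codon counts to frequency per 1000 codons."""
    total = sum(counts.values())
    return {codon: (1000.0 * n / total if total else 0.0)
            for codon, n in counts.items()}
```

For example, `codon_counts(["ATGGCTGCTTAA"])` gives 2 for GCT and 1 each for ATG and TAA, while the remaining codons stay at 0, so genomes of very different sizes can be compared on the same 64-codon scale after `per_thousand`.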
The results generally confirmed this assumption. In the protein-coding genes of all organisms studied, the preferred codons are those with A or T(U) in the second codon position, that is, the bases whose complementary H-pairing is inherently less polymorphic and therefore more consistently fixes the structure of the central link of the "lock-and-key" (codon-anticodon) system of functional complexes. In addition, a detailed analysis of codon types revealed a predominance of codons for the helix-initiating and helix-terminating amino acids of globular proteins in all eukaryotes, which correlates with the high concentration of histones in their chromatin. In mitochondrial genes, by contrast, such amino acids were sharply depleted among the preferred codons.
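One way to check the second-position pattern reported above is sketched below. The abstract does not state how a codon is classified as "preferred", so the cutoff used here (the `top_k` most frequent sense codons) is a hypothetical criterion, as is the function name `second_position_profile`. The codon table is the standard genetic code (NCBI translation table 1), which differs from the codes used by many mitochondrial genomes, so mitochondrial genes would need their own table.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered TCAG
# at each position with the first position varying slowest; "*" = stop
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
SENSE_CODONS = [c for c, aa in CODE.items() if aa != "*"]


def second_position_profile(counts, top_k=16):
    """Tally the second-position base of the top_k most frequent sense codons.

    `counts` maps codon -> occurrence count (DNA alphabet). The top_k cutoff
    is a hypothetical stand-in for "preferred codon"; ties are broken by
    codon name so the ranking is deterministic. Also returns the fraction of
    all sense-codon occurrences that carry A or T at the second position.
    """
    ranked = sorted(SENSE_CODONS, key=lambda c: (-counts.get(c, 0), c))
    preferred = ranked[:top_k]
    profile = Counter(c[1] for c in preferred)
    total = sum(counts.get(c, 0) for c in SENSE_CODONS)
    at_second = sum(counts.get(c, 0) for c in SENSE_CODONS if c[1] in "AT")
    at_fraction = at_second / total if total else 0.0
    return preferred, profile, at_fraction
```

Under uniform usage, 30 of the 61 sense codons (about 49%) carry A or T at the second position, which gives a simple baseline against which `at_fraction` can be compared for each genome.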