We are continuing our discussion of eukaryotic genome evolution based on Dan Graur’s “Molecular and Genome Evolution”. In this post, we look at two key measures: genome size and gene count.
1. Is there any pattern in the genome sizes of the eukaryotes?
The ratio between the sizes of the largest and the smallest known eukaryotic genomes is about 400,000 (Table 11.2 of “Molecular and Genome Evolution”). The Japanese plant Paris japonica holds the record for the largest genome (150 Gb), whereas the nucleomorph of Bigelowiella natans has the smallest (373 kb). Outside the nucleomorphs, the genomes of microsporidia are the smallest (E. cuniculi - 2.5 Mb, E. intestinalis - 2.3 Mb).
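As a quick check on that ratio, here is a minimal Python sketch that computes the fold range from the sizes quoted above. The dictionary and the function name `fold_range` are made up for illustration and are not from the book; the sizes are the rounded figures from this post.

```python
import math

# Genome sizes in base pairs, using the rounded figures quoted in this post.
GENOME_SIZES_BP = {
    "Paris japonica": 150e9,                   # 150 Gb
    "Bigelowiella natans nucleomorph": 373e3,  # 373 kb
    "E. cuniculi": 2.5e6,                      # 2.5 Mb
    "E. intestinalis": 2.3e6,                  # 2.3 Mb
}


def fold_range(sizes):
    """Return (ratio, orders of magnitude) between the largest and smallest size."""
    sizes = list(sizes)
    ratio = max(sizes) / min(sizes)
    return ratio, math.log10(ratio)


# All genomes, nucleomorph included.
ratio, orders = fold_range(GENOME_SIZES_BP.values())
print(f"all genomes: {ratio:,.0f}-fold, about {orders:.1f} orders of magnitude")

# Same calculation with the nucleomorph left out.
no_nucleomorph = [
    size for name, size in GENOME_SIZES_BP.items() if "nucleomorph" not in name
]
ratio_nn, orders_nn = fold_range(no_nucleomorph)
print(f"without nucleomorph: {ratio_nn:,.0f}-fold, about {orders_nn:.1f} orders of magnitude")
```

With the nucleomorph included, the ratio comes out close to the 400,000 quoted above; leaving it out still gives a range of well over four orders of magnitude.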
It is noteworthy that even for closely related organisms the genome sizes can vary by orders of magnitude. For example, the ratio of the largest to smallest genome size is 15,204 for green plants and 6,642 for animals. Within animals, the ratio is 380-fold for bony fish, 188-fold for insects and 342-fold for flatworms.
Analysis of thousands of available genome sizes shows no evidence that ‘simple’ organisms have smaller genomes than ‘complex’ organisms. This is known as the C-value paradox. Here, the complexity of an organism is defined by its number of distinct cell types.
2. Is there any pattern in the gene counts of the eukaryotes?
The ratio of the highest to the lowest protein-coding gene count across eukaryotic organisms is about 300, which is substantially lower than the roughly 400,000-fold range in genome sizes. Moreover, for most animals, the gene count ranges between 15,000 and 35,000, even though genome size varies by orders of magnitude.
There is no evidence that more complex organisms have higher counts of protein-coding genes. This is known as the G-value paradox. The potato (Solanum tuberosum) and apple (M. domestica) genomes have two and three times as many protein-coding genes as the human genome, respectively. Consider two organisms that are well studied at the cellular level: the worm C. elegans has only ~1,000 cells, whereas the fruit fly D. melanogaster has nearly 10^8 cells. Yet the C. elegans genome contains 6,500 more protein-coding genes than that of D. melanogaster.
Mechanisms for Genome Evolution
The above observations regarding the evolution of genomes and genes can be explained by two processes.
Changes in the physical characteristics of an organism are supported by an increase or decrease in the number of genes, but the connection between genes and phenotypes is better described as a Rube Goldberg machine than as a precisely engineered one. Genes are added to this Rube Goldberg machine mostly through gene duplication and whole-genome duplication, and removed from it through loss of function. Therefore, the gene count within a genome does not change by a large degree.
In parallel, the entire genome gets bombarded by transposons. These parasitic elements move around the genome by creating many copies of themselves, and that is the primary mechanism behind the increase in genome sizes. As long as a new copy is not deleterious, it does not get removed by selection.
Immune genes in jawed vertebrates (including humans) are unlike other protein-coding genes, because a single gene sequence on the genome can code for millions of different proteins. This protein “diversity” helps in the recognition of various attackers, thus making adaptive immunity possible.
It has been shown that the Rag1 and Rag2 genes, which are the key mediators of this process (V(D)J recombination), came from transposable elements. This is an interesting demonstration of the two processes of genome evolution interacting to provide an evolutionary benefit to the jawed vertebrates.
In the following posts of this series, we will discuss what the reference to a “Rube Goldberg” machine means and what neutral evolution is. We will also discuss other genomic characteristics - chromosome count, distribution of genes within chromosomes, GC content, etc.