The E. coli genome has only 4,500 genes, which is the approximate count for most other prokaryotic genomes. The genome of yeast, a eukaryote, has ~6,000 genes, and humans and most vertebrates have 20,000-25,000 coding genes. In contrast, the genome of the unicellular eukaryotic protist Trichomonas vaginalis has almost 60,000 genes!
We describe the genome sequence of the protist Trichomonas vaginalis, a sexually transmitted human pathogen. Repeats and transposable elements comprise about two-thirds of the ~160-megabase genome, reflecting a recent massive expansion of genetic material. This expansion, in conjunction with the shaping of metabolic pathways that likely transpired through lateral gene transfer from bacteria, and amplification of specific gene families implicated in pathogenesis and phagocytosis of host proteins may exemplify adaptations of the parasite during its transition to a urogenital environment. The genome sequence predicts previously unknown functions for the hydrogenosome, which support a common evolutionary origin of this unusual organelle with mitochondria.
T. vaginalis wins our nomination for having such a large gene count. It is the kind of genome that gets Dan Graur curious about the biological implications of having so many genes. Graur is currently writing a book on genome architecture.
The human pathogen Trichomonas vaginalis is a parabasalian flagellate that is estimated to infect 3% of the world's population annually. With a 160 megabase genome and up to 60,000 genes residing in six chromosomes, the parasite has the largest genome among sequenced protists. Although it is thought that the genome size and unusual large coding capacity is owed to genome duplication events, the exact reason and its consequences are less well studied.
Among transcriptome data we found thousands of instances, in which reads mapped onto genomic loci not annotated as genes, some reaching up to several kilobases in length. At first sight these appear to represent long non-coding RNAs (lncRNAs), however, about half of these lncRNAs have significant sequence similarities to genomic loci annotated as protein-coding genes. This provides evidence for the transcription of hundreds of pseudogenes in the parasite. Conventional lncRNAs and pseudogenes are expressed in Trichomonas through their own transcription start sites and independently from flanking genes in Trichomonas. Expression of several representative lncRNAs was verified through reverse-transcriptase PCR in different T. vaginalis strains and case studies exclude the use of alternative start codons or stop codon suppression for the genes analysed.
Our results demonstrate that T. vaginalis expresses thousands of intergenic loci, including numerous transcribed pseudogenes. In contrast to yeast these are expressed independently from neighbouring genes. Our results furthermore illustrate the effect genome duplication events can have on the transcriptome of a protist. The parasite's genome is in a steady state of changing and we hypothesize that the numerous lncRNAs could offer a large pool for potential innovation from which novel proteins or regulatory RNA units could evolve.
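For readers wondering what "reads mapped onto genomic loci not annotated as genes" might look like in practice, here is a minimal sketch in Python. It is not the authors' pipeline: the function names, the half-open interval convention, and the toy coordinates are all assumptions made up for illustration. It simply keeps the transcribed regions that overlap no annotated gene on a single chromosome, which is the first filtering step one would need before asking whether a region is a lncRNA or a transcribed pseudogene.

```python
from typing import List, Tuple

# Half-open interval [start, end) on a single chromosome (assumed convention).
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if the two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def intergenic_transcripts(
    transcribed: List[Interval], genes: List[Interval]
) -> List[Interval]:
    """Keep transcribed regions that overlap no annotated gene."""
    return [t for t in transcribed if not any(overlaps(t, g) for g in genes)]


# Toy coordinates, invented for illustration only (not real T. vaginalis data).
genes = [(100, 500), (900, 1400)]
transcribed = [(120, 480), (600, 850), (1350, 1600), (2000, 5200)]

candidates = intergenic_transcripts(transcribed, genes)
print(candidates)  # [(600, 850), (2000, 5200)]
```

The pairwise comparison is quadratic and only suitable for toy input; with tens of thousands of genes one would sort the annotations or use an interval tree. Deciding which candidates are pseudogenes would additionally require a sequence-similarity search against the annotated protein-coding genes, which this sketch does not attempt.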