
Why Do Biologists Need So Many Mapping Programs?

<font color=red>The biological world is more complex than plain-text matching</font>

Evolution and the genetic code add two layers of complexity to biological problems. As a result, many questions that geneticists ask have no equivalent in plain-text search. A few examples follow.

Nucleotides vs. Amino Acids

Nucleotide sequences use an alphabet of four letters (A, T, G, C), but a search program working with nucleotides also needs to take both strands into account, because a match may lie on the reverse-complement strand. Such a program therefore cannot be extended to proteins in a straightforward manner: proteins use an alphabet of 20 amino acids and have no double-stranded behavior. At the very least, geneticists need two separate sets of programs, one for nucleotides and one for proteins.
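The two-strand requirement can be illustrated with a minimal Python sketch. It assumes uppercase sequences containing only A, C, G and T, and the helper name find_both_strands is hypothetical, chosen for illustration. Real mapping programs are far more elaborate.

```python
# Complement of each nucleotide; sequences are assumed to be uppercase A/C/G/T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_both_strands(query: str, target: str) -> list:
    """Hypothetical helper: exact matches of query on either strand of target.

    Returns (position, strand) pairs, where position is the 0-based start
    on the forward strand of target.
    """
    hits = []
    for strand, pattern in (("+", query), ("-", reverse_complement(query))):
        start = target.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = target.find(pattern, start + 1)
    return hits
```

A plain-text search would stop after the first pattern. The nucleotide search has to repeat the scan with the reverse complement, a step that has no meaning for protein sequences.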

Phylogeny

Geneticists often start with a set of orthologous proteins, say ten proteins from ten organisms, and ask which of the novel proteins in their own organism match them. It is an evolution-based question that has no equivalent in plain-text search.

Genome to Genome Match

With the availability of large genome sequences, it is now possible to align one entire genome with another. Programs answering such questions must be designed around the sheer size of the sequences being compared.

Search for a Gene in Its Own Genome

Let us say you would like to map a human gene to the human genome. Because the gene should match its source genome almost exactly, the main factor in this case is speed.
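One common way to gain speed for near-exact lookups is to index the genome once and reuse the index for every query. The sketch below is a toy illustration under that assumption, not the method of any particular program. The function names and the choice of a k-mer hash index are assumptions made for this example.

```python
from collections import defaultdict


def build_kmer_index(genome: str, k: int) -> dict:
    """Map every k-mer in the genome to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def exact_hits(gene: str, genome: str, index: dict, k: int) -> list:
    """Use the first k-mer of the gene as a seed, then verify the full match."""
    if len(gene) < k:
        raise ValueError("gene must be at least k bases long")
    # Only positions that share the seed k-mer are checked against the genome.
    return [pos for pos in index.get(gene[:k], [])
            if genome.startswith(gene, pos)]
```

Building the index costs one pass over the genome. After that, each query inspects only the few positions that share its seed instead of scanning the whole sequence.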

Search for a Gene in a Related Genome

Let us say you would like to map a human gene to the mouse genome. This time, the search program needs to take sequence variation between the two species into account.
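Tolerating variation means accepting matches that are not identical. As a rough sketch, assuming only substitutions (no insertions or deletions) and a hypothetical max_mismatches threshold, a brute-force search could look like this:

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def approximate_hits(query: str, target: str, max_mismatches: int) -> list:
    """Slide query along target; keep (position, mismatches) within the limit."""
    hits = []
    for pos in range(len(target) - len(query) + 1):
        window = target[pos:pos + len(query)]
        d = hamming(query, window)
        if d <= max_mismatches:
            hits.append((pos, d))
    return hits
```

This version compares the query against every window, so it is far too slow for a real genome. It only shows how the matching criterion changes once variation is allowed.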

Search for a Protein in a Genome

Let us say you would like to map a human protein to the mouse genome. This time, the search program needs to take both variation and the genetic code into account.
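Taking the genetic code into account usually means translating the genome in all six reading frames (three per strand) before comparing it with the protein. The sketch below assumes the standard genetic code and exact matching. The function names are hypothetical, and spliced genes and sequence variation are deliberately ignored.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_hits(protein: str, genome: str) -> list:
    """Return (strand, frame) pairs whose translation contains the protein."""
    hits = []
    reverse = genome.translate(COMPLEMENT)[::-1]
    for strand, seq in (("+", genome), ("-", reverse)):
        for frame in range(3):
            if protein in translate(seq[frame:]):
                hits.append((strand, frame))
    return hits
```

For example, six_frame_hits("MA", "CCATGGCCTAA") reports a hit on the forward strand in frame 2, where ATG GCC TAA translates to M, A and a stop.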

<font color=red>Introduction of new sequencing technologies</font>

As if the above questions were not enough, the introduction of new sequencing technologies added another layer of complexity, both in the size of the files and in the technology-specific outputs.

Align Billions of Short Reads on a Genome

Many NGS instruments generated a large number of short reads, and the mapping challenge came to

Align Millions of PacBio Reads on a Genome

Align Thousands of Genomes Against Each Other

