Tutorials

Enjoy This Site? Join Our Remote R/Bioinformatics Classes

Note: These tutorials are incomplete. More complete versions are being made available for our members. Sign up for free.

Summary

Let us summarize the concepts discussed in this section.

Needleman-Wunsch (1970) and Smith-Waterman (1981) were the two earliest proposed algorithms for pattern search in genomic data. They both use dynamic programming and give exact results, but they are very slow.

All modifications to those core programs use the concept of seeding and extending, and the real challenge is to hold the seed in an index structure. Large index structure holds more data and allows rapid search, but also requires more memory. Hash table, suffix array and Burrow Wheeler Transform combined with FM index are three mathematical developmemts to address the memory issues.