Stampy

“To achieve good sensitivity, Stampy also uses a hash table, representing the location of selected 15-mers in the reference genome. The hash table uses a novel data structure, which results in improved search times compared with those of standard implementations and in the efficient use of the available memory. The algorithm first identifies candidate mapping locations for each read using the hash. Specifically, the hash is searched for every overlapping 15-mer in the read, as well as their neighbors at one mismatch removed. For a 36-bp read, for example, this results in 1012 (22 × 46) search operations. The candidate mapping locations are filtered for sufficient sequence similarity to the read, and then an attempt is made to align the read to the reference at each qualifying location. A fast gapped aligner is used, which respects quality scores and considers short indels of up to 15 bp. Next, for a mate pair, the results of its alignment are considered. Finally, candidate reads or read pairs are realigned using a full probabilistic aligner that considers indels up to, by default, 30 bp. Full details of the algorithm are provided in the online Supplemental material.”

http://www.ncbi.nlm.nih.gov/pubmed/20980556
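The seeding step described in the quote can be sketched in Python. This is a minimal illustration under stated assumptions, not Stampy's implementation. It indexes every 15-mer of the reference in a plain dict, whereas Stampy hashes only selected 15-mers in its own data structure. The function names and the `max_mismatches` threshold in the similarity filter are hypothetical. The sketch does reproduce the lookup count from the quote. A 36-bp read has 36 - 15 + 1 = 22 overlapping 15-mers, and each 15-mer has 15 × 3 = 45 one-mismatch neighbors, so the total is 22 × (1 + 45) = 1012 lookups.

```python
from collections import defaultdict

K = 15  # seed length used by Stampy's hash
ALPHABET = "ACGT"


def kmers(seq, k=K):
    """Yield (offset, k-mer) for every overlapping k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def one_mismatch_neighbors(kmer):
    """Yield every k-mer at Hamming distance exactly 1 (3 per position)."""
    for pos, base in enumerate(kmer):
        for alt in ALPHABET:
            if alt != base:
                yield kmer[:pos] + alt + kmer[pos + 1:]


def build_index(reference, k=K):
    """Map each k-mer to its start positions in the reference.

    Simplification: indexes every k-mer in a dict, whereas Stampy
    hashes only selected 15-mers in its own data structure.
    """
    index = defaultdict(list)
    for pos, kmer in kmers(reference, k):
        index[kmer].append(pos)
    return index


def query_keys(read, k=K):
    """List every hash lookup for a read: each k-mer plus its neighbors."""
    keys = []
    for _, kmer in kmers(read, k):
        keys.append(kmer)
        keys.extend(one_mismatch_neighbors(kmer))
    return keys


def candidate_locations(read, index, k=K):
    """Convert seed hits into candidate read start positions.

    A hit of the read k-mer at offset i to reference position p implies
    that the read starts at p - i.
    """
    candidates = set()
    for offset, kmer in kmers(read, k):
        for key in [kmer, *one_mismatch_neighbors(kmer)]:
            for ref_pos in index.get(key, ()):  # .get avoids adding empty entries
                start = ref_pos - offset
                if start >= 0:
                    candidates.add(start)
    return candidates


def filter_candidates(read, reference, candidates, max_mismatches=3):
    """Keep candidates whose ungapped placement has few mismatches.

    Hypothetical stand-in for Stampy's similarity filter; the threshold
    is illustrative, not Stampy's actual criterion.
    """
    kept = []
    for start in sorted(candidates):
        window = reference[start:start + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            kept.append(start)
    return kept


if __name__ == "__main__":
    read = "ACGT" * 9  # any 36-bp read
    print(len(query_keys(read)))  # 22 k-mers x 46 lookups = 1012
```

`query_keys` counts lookups rather than distinct keys, so it matches the quote's 1012 figure even when neighbors coincide. The later stages the quote describes are not sketched here. These include the fast gapped aligner, mate-pair handling, and the full probabilistic realignment.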