
How Does BLASR Work?

Based on their analysis, Chaisson and Tesler settled on 15nt as the optimal seed size. They also noticed another pattern in their data: in regions where a PacBio read aligns correctly to the reference sequence, multiple 15nt seeds are found in close proximity.

As a first step, BLASR compares a PacBio read with a reference sequence and searches for matching 15-mers. Because only exact matches are needed, the search can be done efficiently using the Burrows-Wheeler transform.
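The seeding step can be illustrated with a minimal Python sketch. Note the assumption: it uses a plain hash-table index of 15-mers instead of a Burrows-Wheeler transform, which is simpler to read but far less memory-efficient on a real genome. The function names `build_kmer_index` and `find_seeds` and the toy sequences are hypothetical and are not part of BLASR.

```python
from collections import defaultdict

K = 15  # seed length reported in the text


def build_kmer_index(reference, k=K):
    """Map each k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def find_seeds(read, index, k=K):
    """Return (read_pos, ref_pos) pairs for every exact k-mer match."""
    seeds = []
    for read_pos in range(len(read) - k + 1):
        kmer = read[read_pos:read_pos + k]
        # .get avoids creating empty entries in the defaultdict
        for ref_pos in index.get(kmer, ()):
            seeds.append((read_pos, ref_pos))
    return seeds


if __name__ == "__main__":
    # Toy reference (made up for illustration), not real genomic data.
    reference = "ACGTTGCAAGGCTTAACCGGATCGATTACGCTAGGCTAAC"
    read = reference[10:30]  # an error-free 20nt "read"
    index = build_kmer_index(reference)
    print(find_seeds(read, index))
```

A single substitution in the middle of a 20nt read would destroy every 15-mer that overlaps it, which is why exact seeds become sparse as the error rate grows.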

Because genomes contain repeats, the above step is likely to return too many seeds, many of them from incorrect regions. To narrow the list, BLASR keeps only the regions with multiple matching 15-mer seeds. More specifically, BLASR finds regions with at least 10 anchors of size > 15nt, and then aligns the read to those genomic regions using a more precise alignment method (e.g. Smith-Waterman).
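The filtering idea can be sketched as follows. This is a simplified stand-in and not BLASR's actual clustering procedure: it groups seeds by their diagonal (reference position minus read position) and keeps groups with at least 10 members. The function name `candidate_regions`, the `max_diag_drift` parameter and its default of 50 are assumptions made for illustration, and every seed is counted as one anchor regardless of its length.

```python
SEED_LEN = 15  # seed length from the text


def candidate_regions(seeds, min_anchors=10, max_diag_drift=50):
    """Group (read_pos, ref_pos) seeds with similar diagonals; keep dense groups.

    Returns a list of (ref_start, ref_end, anchor_count) tuples.
    """
    # Seeds from one true alignment share roughly the same diagonal;
    # indels shift it slightly, so neighbours may differ by a small drift.
    by_diag = sorted(seeds, key=lambda s: (s[1] - s[0], s[0]))

    clusters, current = [], []
    for seed in by_diag:
        diag = seed[1] - seed[0]
        if current:
            prev_diag = current[-1][1] - current[-1][0]
            if diag - prev_diag > max_diag_drift:
                clusters.append(current)
                current = []
        current.append(seed)
    if current:
        clusters.append(current)

    regions = []
    for cluster in clusters:
        if len(cluster) >= min_anchors:  # threshold of 10 from the text
            ref_start = min(ref for _, ref in cluster)
            ref_end = max(ref for _, ref in cluster) + SEED_LEN
            regions.append((ref_start, ref_end, len(cluster)))
    return regions


if __name__ == "__main__":
    # Twelve seeds near one diagonal plus two isolated repeat hits (made-up data).
    seeds = [(i * 20, 1000 + i * 20 + (i % 3)) for i in range(12)]
    seeds += [(5, 90000), (40, 50000)]
    print(candidate_regions(seeds))
```

Each surviving region would then be passed to the slower, more precise aligner, so the expensive step runs on only a few candidates instead of on every seed hit.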

Can BLASR be used to align the genome of one organism with another? Please keep in mind that BLASR's strategy relies on the error rate in PacBio reads being uniform. When we compare two genomes, the distribution of errors (here, sequence differences) changes from one segment to another: intergenic regions have a higher degree of variation than genic or cis-regulatory regions. Seeding criteria tuned for uniformly distributed errors may therefore not carry over to genome-to-genome alignment.

