Accurate base-calling is possibly the step in nanopore sequencing with the most room for improvement. Base-calling from raw electrical signals requires multiple steps: calling the ‘events’ (deciding whether or not something is passing through the pore), segmenting those events, and finally assigning nucleotide bases to the segments.
Although the electrical signals from Oxford Nanopore sequencers can be accessed by various early-access participants, the company's fully automated segmentation program is kept proprietary. Readers interested in analyzing the raw electrical signals to see whether they can improve the quality of the analysis will find a new paper by reader gasstationwithoutpumps useful.
Nanopore-based single-molecule sequencing techniques exploit ionic current steps produced as biomolecules pass through a pore to reconstruct properties of the sequence. A key task in analyzing complex nanopore data is discovering the boundaries between these steps, which has traditionally been done in research labs by hand. We present an automated method of analyzing nanopore data, by detecting regions of ionic current corresponding to the translocation of a biomolecule, and then segmenting the region. The segmenter uses a divide-and-conquer method to recursively discover boundary points, with an implementation that works several times faster than real time and that can handle low-pass filtered signals.
Briefly, here is what they do:
Boundary points are identified by our segmenter one at a time using a recursive algorithm. We start by considering the entire event as a single segment, then consider each possible boundary to break it into two segments. To avoid edge effects and ensure that all segments have at least a minimum duration, only potential boundaries more than the minimum segment length from the ends of the segment are considered.
Each possible boundary is scored using a log-likelihood function (Eq. 1). If the maximal score is above a threshold, the segment is split and the two subsegments are recursively segmented. The recursion terminates either when the segment to split is less than twice the minimum segment length or no score within the segment exceeds the threshold.
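To make the recursion concrete, here is a minimal Python sketch of the divide-and-conquer splitting described above. It is not the authors' code. The paper's Eq. 1 is not reproduced in this post, so the score used here is an assumption: a Gaussian log-likelihood ratio comparing one segment against two, each with its own mean and variance. The names `segment`, `min_len` and `threshold` and their default values are hypothetical, and this sketch lets each side of a split keep at least `min_len` samples.

```python
import math


def _log_var(prefix_sum, prefix_sq, lo, hi, floor=1e-12):
    """Log of the biased sample variance of samples[lo:hi], via prefix sums."""
    n = hi - lo
    s = prefix_sum[hi] - prefix_sum[lo]
    sq = prefix_sq[hi] - prefix_sq[lo]
    var = sq / n - (s / n) ** 2
    # Floor the variance so perfectly flat segments do not produce log(0).
    return math.log(max(var, floor))


def segment(samples, min_len=5, threshold=10.0):
    """Return boundary indices (in increasing order) found by recursive splitting.

    Assumption: the score is a Gaussian log-likelihood ratio, which may
    differ from the scoring function (Eq. 1) used in the paper.
    """
    # Prefix sums let us score every candidate boundary in O(1).
    prefix_sum, prefix_sq = [0.0], [0.0]
    for x in samples:
        prefix_sum.append(prefix_sum[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    boundaries = []

    def split(lo, hi):
        n = hi - lo
        # Stop when the segment is too short to hold two minimum-length parts.
        if n < 2 * min_len:
            return
        whole = n * _log_var(prefix_sum, prefix_sq, lo, hi)
        best_score, best_k = -math.inf, None
        # Only consider boundaries that leave min_len samples on each side.
        for k in range(lo + min_len, hi - min_len + 1):
            left = (k - lo) * _log_var(prefix_sum, prefix_sq, lo, k)
            right = (hi - k) * _log_var(prefix_sum, prefix_sq, k, hi)
            score = 0.5 * (whole - left - right)
            if score > best_score:
                best_score, best_k = score, k
        # Split only if the best candidate clears the threshold, then recurse.
        if best_k is not None and best_score > threshold:
            split(lo, best_k)
            boundaries.append(best_k)
            split(best_k, hi)

    split(0, len(samples))
    return boundaries


if __name__ == "__main__":
    # Toy trace: two current levels with small alternating "noise".
    trace = [(0.0 if i < 50 else 5.0) + (0.1 if i % 2 == 0 else -0.1)
             for i in range(100)]
    print(segment(trace))
```

The threshold controls the trade-off between over-segmenting noise and missing small current steps, so in practice it would need tuning against real traces.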
The code is freely available from this GitHub page, which also includes a lot of information on the algorithm:
This process for detecting and segmenting events in nanopore signals should run in real time, either segmenting a stream of data as it comes in or quickly segmenting an event shortly after its completion. To test the speed of the algorithm, the event detector was implemented in Python and the segmenter was implemented in Cython, a language that allows the efficiency of C within Python. The current implementation is designed to segment full events and is available at the first author's public GitHub page.