Those working on genome assembly may find the new tools from Pierre Marijon et al. helpful. The related preprint “yacrd and fpa: upstream tools for long-read genome assembly” is now available on bioRxiv.
For context, Gene Myers noted the importance of removing chimeric reads from the pool in a 2015 blog post. In 2017, he introduced DASCRUBBER as a “scrubber” module for his DAZZLER assembler. In 2018, LaPierre et al. published MiniScrub to address similar problems.
Of the two new tools, yacrd (Yet Another Chimeric Read Detector) “scrubs” the reads, whereas fpa (Filter Pairwise Alignment) filters the overlaps found between reads. In the paper, the authors used miniasm and the recently preprinted wtdbg2 to assemble the reads with and without scrubbing. Here are a few comments based on the paper.
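To make the idea of “scrubbing” concrete, here is a minimal sketch of one common way to spot chimeric reads: compute per-base coverage from all-vs-all overlaps and flag any read with an uncovered stretch in its interior. This is an illustration under stated assumptions, not yacrd's actual implementation. The function names (`coverage_gaps`, `classify_reads`) and the `min_cov` threshold are hypothetical, and the input is assumed to be PAF-style lines such as those written by an overlapper like minimap2.

```python
from collections import defaultdict


def coverage_gaps(length, intervals, min_cov=1):
    """Return [start, end) stretches of a read covered by fewer than min_cov overlaps."""
    # Difference array: +1 where an overlap starts, -1 where it ends.
    diff = [0] * (length + 1)
    for start, end in intervals:
        diff[start] += 1
        diff[end] -= 1

    gaps = []
    cov = 0
    gap_start = None
    for pos in range(length):
        cov += diff[pos]
        if cov < min_cov:
            if gap_start is None:
                gap_start = pos
        elif gap_start is not None:
            gaps.append((gap_start, pos))
            gap_start = None
    if gap_start is not None:
        gaps.append((gap_start, length))
    return gaps


def classify_reads(paf_lines, min_cov=1):
    """Label each read 'chimeric' if it has an internal low-coverage gap, else 'ok'.

    Assumes PAF columns: qname, qlen, qstart, qend, strand, tname, tlen, tstart, tend, ...
    Threshold and labels are illustrative assumptions, not yacrd's defaults.
    """
    lengths = {}
    intervals = defaultdict(list)
    for line in paf_lines:
        f = line.rstrip("\n").split("\t")
        qname, qlen, qstart, qend = f[0], int(f[1]), int(f[2]), int(f[3])
        tname, tlen, tstart, tend = f[5], int(f[6]), int(f[7]), int(f[8])
        if qname == tname:
            continue  # ignore self-overlaps
        lengths[qname] = qlen
        lengths[tname] = tlen
        intervals[qname].append((qstart, qend))
        intervals[tname].append((tstart, tend))

    labels = {}
    for name, length in lengths.items():
        gaps = coverage_gaps(length, intervals[name], min_cov)
        # A gap touching either read end is just an unsupported tip, not a chimera signal.
        internal = [g for g in gaps if g[0] > 0 and g[1] < length]
        labels[name] = "chimeric" if internal else "ok"
    return labels


if __name__ == "__main__":
    demo = [
        "A\t100\t0\t40\t+\tB\t50\t10\t50\t40\t40\t255",
        "A\t100\t60\t100\t+\tC\t50\t0\t40\t40\t40\t255",
    ]
    print(classify_reads(demo))
```

In the demo, read A is supported at both ends but has no overlap across positions 40 to 60, so it is flagged, while B and C only lack coverage at a tip. A real scrubber also has to decide whether to split, trim, or discard such reads, which this sketch leaves out.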
Tools in the new paper are two orders of magnitude faster than DASCRUBBER. MiniScrub was not used in the comparison, because it required 256GB.
2. ONT Reads
Although both scrubbers generally reduced misassemblies compared with assembling the raw reads, “on ONT reads, DASCRUBBER reduces the number of misassemblies by a factor of 2-3 more than yacrd.” So the tradeoff is whether you want the assembly “now” or can wait three days for a better one. This technology-dependent difference in performance is puzzling.
Motivation: Genome assembly is increasingly performed on long, uncorrected reads. Assembly quality may be degraded due to unfiltered chimeric reads; also, the storage of all read overlaps can take up to terabytes of disk space.
Results: We introduce two tools, yacrd and fpa, to respectively perform chimera removal/read scrubbing, and filter out spurious overlaps. We show that yacrd results in higher-quality assemblies and is two orders of magnitude faster than the best available alternative.
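The overlap-filtering side can be illustrated in the same spirit. The sketch below streams PAF-style overlap lines and keeps only those that pass simple checks, so the discarded overlaps never need to be stored. It is not fpa's interface or its actual set of filters. The function name `filter_overlaps`, the `min_length` cutoff of 2000 bases, and the self-overlap rule are all assumptions chosen for the example.

```python
def filter_overlaps(paf_lines, min_length=2000, drop_self=True):
    """Yield only the PAF lines that pass simple, illustrative filters.

    Assumes PAF columns: qname, qlen, qstart, qend, strand, tname, tlen, tstart, tend, ...
    min_length and drop_self are hypothetical parameters, not fpa options.
    """
    for line in paf_lines:
        f = line.rstrip("\n").split("\t")
        qname, qstart, qend = f[0], int(f[2]), int(f[3])
        tname, tstart, tend = f[5], int(f[7]), int(f[8])

        if drop_self and qname == tname:
            continue  # a read overlapping itself carries no assembly information

        # Use the shorter of the two aligned spans as the overlap length.
        span = min(qend - qstart, tend - tstart)
        if span < min_length:
            continue

        yield line


if __name__ == "__main__":
    demo = [
        "r1\t9000\t0\t5000\t+\tr2\t8000\t3000\t8000\t4800\t5000\t255",
        "r1\t9000\t100\t600\t+\tr3\t7000\t0\t500\t450\t500\t255",
        "r2\t8000\t0\t8000\t+\tr2\t8000\t0\t8000\t8000\t8000\t255",
    ]
    for kept in filter_overlaps(demo):
        print(kept)
```

Because the function is a generator, it can sit in a pipe between the overlapper and the assembler and process one line at a time, which is what keeps disk usage down.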