Prashant Pandey, Rob Patro and collaborators have published a number of excellent papers on a new kind of “compound” hashing scheme. The original paper describing the idea is “A General-Purpose Counting Filter: Making Every Bit Count”, and they have since published other papers applying the idea to bioinformatics. We wrote about Mantis on this blog last year.
Apparently, this is the fastest k-mer counting method at present, and that got me curious. Unfortunately, I could not compile their GitHub repo with gcc/g++ either under Cygwin or on my Linux server. So I created an alternate repo by cleaning up a bunch of the C++ files. This simpler repo includes the core library and compiles on my machine.
SeqOthello is the other competing algorithm published last year. Be aware that this library is released under the GPLv3 license, so I am going to stay away from it until there is a compelling reason to use it.