Contributed by our judge, who voted for Sailfish in Best of 2013.
Sailfish is a good example of neat, modern NGS software development. I looked at the code and found some gems.
1. The Makefile (CMake) is excellent:
- It automatically downloads and installs dependencies, i.e. CMPH (for perfect hashing) and Jellyfish.
- It can even automatically download and install Boost without root. This, I think, removes the only obstacle that generally prevents us from using Boost in our own software.
2. The ingredient that seems to make Sailfish fast is minimal perfect hashing instead of classical hash table lookups. This works only for static hash tables, where the full set of keys is known in advance. They use the existing CMPH library for perfect hashing, so the code in Sailfish is just a
3. They use TBB for various multi-core operations and data structures, e.g. parallel-for, thread-safe hash table, set, and queue. For instance, parallel-for has the potential to be faster than naively parallelizing using threads with e.g. Boost.
4. They use Boost, but not much. They replaced the lock-free queue of Boost with the one from TBB. The only Boost functions widely used are filesystem operations and range operations. Only one file uses some Boost data structures: dynamic_bitset and accumulators (for stats).
1. The default FASTA file reader is Jellyfish’s, and it falls back to kseq whenever the input is exotic (e.g. a named pipe). So perhaps Jellyfish’s FASTA parser is even faster than kseq? Reading FASTA files quickly is quite important for performance.
2. The software tells the user whenever a new version is available (but doesn’t auto-update itself).
3. The command line handling is clean.
4. It uses C++11, so gcc-4.7 is required. It’s probably OK to use C++11 in today’s software development. The main developer of Sailfish has a good understanding of C++11; he wrote a small tutorial of its new features on his GitHub: https://github.com/rob-p/cpp11fun/tree/master/src.
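
To make the minimal perfect hashing point above more concrete, here is a toy sketch in plain C++11. It is not CMPH's algorithm and not Sailfish's code: it simply brute-forces a seed under which a fixed set of n keys lands in n distinct slots, which is only practical for a handful of keys. The names `seeded_hash` and `ToyPerfectHash`, and the choice of hash function, are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical seeded string hash: FNV-1a followed by the MurmurHash3
// 64-bit finalizer, so the low bits used by "% n" depend on the whole state.
inline std::uint64_t seeded_hash(const std::string& key, std::uint64_t seed) {
    std::uint64_t h = 14695981039346656037ULL ^ seed;
    for (unsigned char c : key) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33;
    h *= 0xc4ceb9fe1a85ec53ULL;
    h ^= h >> 33;
    return h;
}

// Toy minimal perfect hash over a static set of distinct keys.
// Construction tries seeds 0, 1, 2, ... until no two keys share a slot.
class ToyPerfectHash {
public:
    explicit ToyPerfectHash(const std::vector<std::string>& keys,
                            std::uint64_t max_attempts = 1000000)
        : slots_(keys.size()) {
        assert(!keys.empty());
        const std::size_t n = keys.size();
        for (seed_ = 0; seed_ < max_attempts; ++seed_) {
            std::vector<bool> used(n, false);
            bool collision = false;
            for (const auto& k : keys) {
                const std::size_t s = seeded_hash(k, seed_) % n;
                if (used[s]) { collision = true; break; }
                used[s] = true;
            }
            if (!collision) {
                // Remember which key owns each slot so lookups can be verified.
                for (const auto& k : keys) slots_[seeded_hash(k, seed_) % n] = k;
                return;
            }
        }
        throw std::runtime_error("no collision-free seed found");
    }

    // Slot in [0, n). Unique per key for keys from the original set.
    std::size_t slot(const std::string& key) const {
        return seeded_hash(key, seed_) % slots_.size();
    }

    // Any string maps to some slot, so membership needs one comparison.
    bool contains(const std::string& key) const {
        return slots_[slot(key)] == key;
    }

private:
    std::uint64_t seed_ = 0;
    std::vector<std::string> slots_;
};
```

Built over a few k-mer-like strings such as `{"ACGT", "CGTA", "GTAC", "TACG", "AAAA"}`, `slot()` gives each key its own index into a plain array with no buckets or probing, which is what makes lookups cheap. The price is that the key set cannot change after construction, matching the "static hash tables only" caveat.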