Heng Li's Mysterious New Program

Heng Li's Mysterious New Program

Heng Li is developing a new code, but the github page does not have any comment on what it is. I am guessing it is a Bloom filter-based short read error-correction program, where you give an input fastq file and the program will output the error-corrected file with erroneous nucleotide marked in small letters.

Usage: bfc [options]


-s FLOAT approx genome size (k/m/g allowed; change -k and -b) [unset]

-k INT k-mer length [33]

-t INT number of threads [1]

-b INT set Bloom filter size to pow(2,INT) bits [33]

-H INT use INT hash functions for Bloom filter [4]

-d FILE dump hash table to FILE [null]

-E skip error correction

-r FILE restore hash table from FILE [null]

-w INT no more than 5 ec or 2 highQ ec in INT-bp window [10]

-c INT min k-mer coverage [3]

-D discard uncorrectable reads

-v show version number

-h show command line help

Personally I look forward to it for another reason. Both Heng Li and the authors of kmc2 write super-efficient codes, and it seems like the fastq readers of kmc2 team is winning over Heng Li’s klib. Unfortunately, I have not figured out the secret sauce of the Polish group and this code will help me make a direct comparison.

Written by M. //