Notes on PacBio-based Assembly of Plasmodium 3D7
Black Artist Jason Chin sent us his notes on assembling the Plasmodium genome, which readers will find helpful. You can download the entire document with scripts from here.
Here are four text documents from him:
1. A Computation Note for Assembling Plasmodium 3D7 with CLEAR, Part I
Plasmodium is a parasite that causes malaria. Understanding its genetics will help to find a cure for the disease. From the sequencing technology point of view, it poses a great challenge to sequence and assemble the genome. Due to its very imbalanced AT/GC content (AT ~= 80% and GC ~= 20%), most sequencing technologies cannot produce good, long sequences that enable assembling the genome into long contigs. For example, an earlier publication using 2nd-generation sequencing technology could only get a contig N50 of about 1 to 4 kbp (BMC Genomics. 2011; 12: 116, http://www.biomedcentral.com/1471-2164/12/116). (See other related assembly statistics at http://www.broadinstitute.org/annotation/genome/plasmodium_falciparum_spp/AssemblyStats.html) Sanger sequencing technology gives better results, with a contig N50 of about 10 to 20 kb. Here we demonstrate that using PacBio(R) RS Single Molecule Real-Time (SMRT(R)) sequencing technology, we can easily assemble the genome with much better results (N50 ~= 954 kb, about 43x of the) than the earlier 2nd-gen. sequencing results, even with some simple home-made assembly code.
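For readers unfamiliar with the two statistics quoted above, here is a minimal sketch of how they are usually computed. It is not taken from Jason's scripts; the function names `n50` and `gc_fraction` are made up for illustration. It assumes the common definition of N50: the contig length at which contigs of that length or longer cover at least half of the total assembled bases.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Hypothetical helper: sort the lengths from longest to shortest and
    return the first length at which the running sum reaches half of
    the total assembly size.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


def gc_fraction(seq):
    """Return the fraction of G and C among the called bases (A, C, G, T).

    Hypothetical helper: ambiguous bases such as N are ignored.
    """
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / called


if __name__ == "__main__":
    # Toy numbers only, not real assembly data.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
    print(gc_fraction("AATTAATTGC"))
```

In practice the contig lengths would come from the assembler's FASTA output, and a genome like Plasmodium would show a GC fraction near 0.2.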
If you want a PDF version of the above article, please check here.
2. A Computation Note for Assembling Plasmodium 3D7 with CLEAR, Part II
The same in PDF version.