Twelve Developments on 12/12/12
Yes, today is 12/12/12 !! We will have to include 12 developments, no less, no more. If we fall short, readers may get a chance to know what Balti food (#baltibio) is all about :)
(i) Looking for a Large Genome to Work on? Here is an interesting factoid from Sébastien Boisvert of Ray fame.
Polychaos dubium may have the largest genome known for any organism, consisting of 670 billion base pairs of DNA.
What is Polychaos dubium? It is only an amoeba. We love the Latin name !!
For those looking for more serious information,
(ii) Direct sequencing of small genomes on the Pacific Biosciences RS without library preparation (h/t: @lexnederbragt, @larsgt).
Question they asked -
We decided to test whether any primed DNA molecules, lacking any other features of a PacBio SMRTbell, could be used directly in a sequencing reaction.
Answer they got -
The present efficiency of this process, in terms of the numbers of reads generated and Mb yield per SMRT cell, is considerably less than that using standard libraries. With standard methods a typical SMRT cell will yield 35,000-50,000 reads and 100-160 Mb of mapped bases. The direct sequencing method described here has generated up to 3000 reads per SMRT cell and therefore its utility is limited to small genomes. However, this approach enables one to acquire sequence data from comparatively low amounts of DNA, even less than 1 ng of input, and within eight hours from receiving the sample.
Simple enough?
(iii) BEETL: Burrows-Wheeler Extended Tool Library (h/t: @ctitusbrown)
BEETL is a suite of applications for building and manipulating the Burrows-Wheeler Transform (BWT) of collections of DNA sequences. The algorithms employed in BEETL are intended to scale to collections of sequences containing one billion entries or more.
The initial release implements two flavours of an algorithm for building the BWT of a sequence collection - BCR and BCRext.
Subsequent releases will add functionality for efficient inversion and querying of BWTs.
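For readers who have not met the transform before, here is a minimal Python sketch of what "building" and "inverting" a BWT mean for a single short string. It is a textbook illustration that sorts all rotations in memory, not the BCR or BCRext algorithms that BEETL implements, and it will not scale to real sequence collections. The function names `bwt` and `inverse_bwt` and the `$` sentinel are our own choices for illustration.

```python
def bwt(text, sentinel="$"):
    """Naive BWT: sort every rotation of text+sentinel, keep the last column."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last, sentinel="$"):
    """Recover the original text from its BWT using the LF mapping."""
    # first_row[c] = index of the first sorted rotation that starts with c
    first_row, total = {}, 0
    for ch in sorted(set(last)):
        first_row[ch] = total
        total += last.count(ch)

    # lf[i] = row whose first character is the same occurrence as last[i]
    seen, lf = {}, []
    for ch in last:
        lf.append(first_row[ch] + seen.get(ch, 0))
        seen[ch] = seen.get(ch, 0) + 1

    # Row 0 starts with the sentinel, so its last character ends the text.
    # Following the LF mapping spells the text from right to left.
    row, reversed_text = 0, []
    for _ in range(len(last) - 1):
        reversed_text.append(last[row])
        row = lf[row]
    return "".join(reversed(reversed_text))


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed)
    print(inverse_bwt(transformed))
```

The sketch assumes the sentinel sorts before every other character, which holds for `$` against the letters A, C, G and T.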
(iv) Matlab Code for Burrows Wheeler Transform (h/t: @mike_schatz)
An excellent source for students interested in learning how the search algorithm works. Readers may also check the animations developed by us to understand the transform part.
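To give a flavour of the search part, here is a small backward-search sketch. It is written in Python rather than Matlab and is our own illustration, not a port of the linked code. It counts how many times a pattern occurs in a text using only the BWT, a table of character start positions and a table of running character counts. The names `build_index` and `count_occurrences` are hypothetical, and the index is built naively, so treat it as a learning aid only.

```python
from collections import Counter


def build_index(text, sentinel="$"):
    """Build a toy FM-index: (first_row, occ, n) for text + sentinel."""
    s = text + sentinel
    # Naive suffix array: sort suffix start positions by the suffix itself.
    suffix_array = sorted(range(len(s)), key=lambda i: s[i:])
    # The BWT is the character just before each sorted suffix.
    last = "".join(s[i - 1] for i in suffix_array)

    # first_row[c] = number of characters in s that sort before c
    counts = Counter(s)
    first_row, total = {}, 0
    for ch in sorted(counts):
        first_row[ch] = total
        total += counts[ch]

    # occ[c][i] = number of times c appears in last[:i]
    occ = {ch: [0] * (len(last) + 1) for ch in counts}
    for i, ch in enumerate(last):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return first_row, occ, len(s)


def count_occurrences(pattern, index):
    """Count matches of pattern by narrowing a row range, right to left."""
    first_row, occ, n = index
    lo, hi = 0, n  # half-open range of matching rows
    for ch in reversed(pattern):
        if ch not in first_row:
            return 0
        lo = first_row[ch] + occ[ch][lo]
        hi = first_row[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = build_index("banana")
    print(count_occurrences("ana", index))
    print(count_occurrences("nab", index))
```

Each pattern character costs two table lookups, so the search time depends on the pattern length and not on the text length. That property is what makes BWT-based read aligners practical.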
(v) InSilico DB - a web-based hub for genomics datasets
Genomics datasets are increasingly useful for gaining biomedical insights, with adoption in the clinic underway. However, multiple hurdles related to data management stand in the way of their efficient large-scale utilization. The solution proposed is a web-based data storage hub. Having clear focus, flexibility and adaptability, InSilico DB seamlessly connects genomics dataset repositories to state-of-the-art and free GUI and command-line data analysis tools. The InSilico DB platform is a powerful collaborative environment, with advanced capabilities for biocuration, dataset sharing, and dataset subsetting and combination. InSilico DB is available from https://insilicodb.org.
(vi) VcfView - a good graphical software for viewing vcf files (h/t: Nick Loman)
(vii) An Interesting Soil Metagenome Paper in PNAS - Cross-biome metagenomic analyses of soil microbial communities and their functional attributes (h/t: @genetics_blog)
For centuries ecologists have studied how the diversity and functional traits of plant and animal communities vary across biomes. In contrast, we have only just begun exploring similar questions for soil microbial communities despite soil microbes being the dominant engines of biogeochemical cycles and a major pool of living biomass in terrestrial ecosystems. We used metagenomic sequencing to compare the composition and functional attributes of 16 soil microbial communities collected from cold deserts, hot deserts, forests, grasslands, and tundra. Those communities found in plant-free cold desert soils typically had the lowest levels of functional diversity (diversity of protein-coding gene categories) and the lowest levels of phylogenetic and taxonomic diversity. Across all soils, functional beta diversity was strongly correlated with taxonomic and phylogenetic beta diversity; the desert microbial communities were clearly distinct from the nondesert communities regardless of the metric used. The desert communities had higher relative abundances of genes associated with osmoregulation and dormancy, but lower relative abundances of genes associated with nutrient cycling and the catabolism of plant-derived organic compounds. Antibiotic resistance genes were consistently threefold less abundant in the desert soils than in the nondesert soils, suggesting that abiotic conditions, not competitive interactions, are more important in shaping the desert microbial communities. As the most comprehensive survey of soil taxonomic, phylogenetic, and functional diversity to date, this study demonstrates that metagenomic approaches can be used to build a predictive understanding of how microbial diversity and function vary across terrestrial biomes.
(viii) Big News of the Big Data Bioinformatics World - Amgen Bought 'Once-Fallen' Decode Genetics for $400M Cash. Here is an analysis of what went on behind the scenes.
(ix) My Genome, Unzipped.
Joe Pickrell got his personal genome sequenced and made the surprising discovery that his genome has no surprises :)
A different set of surprises was in store for geneticist Daniel MacArthur, who found Asian signatures in his chromosomes.
(x) Irony of the day
A few hours later -
(xi) Does Whole Genome Sequencing Circumvent Gene Patents?
An interesting discussion on legal issues related to genes and genomes. The legal debate about gene patenting has been going on for a long time, with Myriad Genetics trying to patent two breast cancer-related genes.
(xii) Finally, this, ladies and gentlemen, is what a balti is :)
There has been a conference going on in the UK, where leading bioinformaticians from the US and UK are discussing the most challenging aspects of NGS data analysis. Please check the Twitter hashtag #baltibio for the latest information.