PacBio Data on Drosophila Release from Bergman Lab

PacBio enthusiasts have a whole new data set to play with. In July, Bergman Lab released -

PacBio Whole Genome Shotgun Sequences for the D. melanogaster Reference Strain

As part of a collaboration with Danny Miller and Scott Hawley from the Stowers Institute, we have generated whole genome shotgun sequences using PacBio RS technology for the Drosophila melanogaster y; cn, bw, sp strain (Bloomington 2057), the same strain that was also used to assemble the D. melanogaster reference genome sequence. We've been meaning to release these data to the community since we got the data in April, but have been waylaid by teaching commitments and a spate of recent server problems. Prompted by Danny's visit to the Bergman Lab over the last two weeks and the generous release by Dmitri Petrov's lab of a data set of Illumina long reads using the Moleculo technology for the same strain of D. melanogaster, we've finally gotten around to bundling these D. melanogaster PacBio sequences for general release. We're hoping that the availability of both PacBio and Moleculo long-read data for the same strain that has one of the highest quality reference genomes for any species will allow the genomics community to investigate directly the pros/cons of each of these new exciting technologies.

Today, they released -

Error-Corrected PacBio Sequences for the D. melanogaster Reference Strain

Using PacBio and Illumina whole genome shotgun sequences we recently released for the D. melanogaster reference strain, Sergey Koren and Adam Phillippy at the University of Maryland have recently run their pacBioToCA method to generate a dataset of error-corrected PacBio reads for this dataset, which they have kindly made available here for re-use without restriction. This pilot data set is not at high enough coverage and thus a whole genome assembly was not attempted. Nevertheless, both the raw and error-corrected datasets should be of use to better understand the nature of PacBio data and the pacBioToCA pipeline as applied to Drosophila genomes.

One obvious question was raised in the first release: the data sets can be used to compare Moleculo long reads with PacBio reads, for the same strain, to see which technology performs better. We started working on that question about three weeks back, but had to put the analysis on hold to address other pressing needs.

Data release policy from Bergman lab -

As with previous unpublished data we have released from the Bergman Lab, we have chosen to release these genomic data under a Creative Commons CC-BY license, which requires only that you credit the originators of the work as specified below. However, we hope that users of these data respect the established model of genomic data release under the Ft. Lauderdale agreement that is traditionally honored for major sequencing centers.

We do not know what the Ft. Lauderdale agreement means in the context of blog posts. We asked a similar question over a year back, but nobody replied.

Written by M. //