Note: These tutorials are incomplete. More complete versions are being made available for our members.

Introduction

These tutorials are written for the hundreds of bioinformaticians trying to cope with large volumes of next-generation sequencing (NGS) data. NGS technologies brought a dramatic shift in the world of sequencing. Just five years ago, sequencing the genome of a higher eukaryote was a very expensive endeavor. To get a genome of interest sequenced, hundreds of scientists had to raise funds together by writing a joint white paper and petitioning various government agencies. The tasks of sequencing and assembly were handled by dedicated sequencing facilities, of which only a few existed around the globe. Naturally, capacity at those facilities was significantly constrained by the high volume of requests.

Recent technological breakthroughs democratized sequencing by giving many research institutions, and even some well-funded individual labs, access to high-throughput instruments. Today, any scientist can have more nucleotides than the human genome contains sequenced within a week at a local facility, at a reasonable cost. This democratization of sequencing created downstream problems for sequence mapping, assembly, analysis and even data storage. Previously, genome assemblies were done by a few dedicated groups associated with large sequencing centers. Those groups remain the leaders in research on sequence assembly, but biologists from many other institutions are venturing into assembly to make sense of locally acquired short-read libraries.