We encourage readers to follow the hashtag #baltibio on Twitter to learn about the discussions at an exciting meeting of bioinformaticians in the UK. The following topics stood out:
Prokka from Torsten Seemann
Prokka is a software tool for the rapid annotation of prokaryotic genomes. A typical 4 Mbp genome can be fully annotated in less than 10 minutes on a quad-core computer. It produces GFF, GBK and SQN files that are ready for editing in Sequin and, ultimately, submission to GenBank/DDBJ/ENA.
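As a rough illustration of working with the GFF output mentioned above, here is a minimal Python sketch that tallies feature types (CDS, tRNA and so on) from GFF3 lines. It assumes the standard nine-column, tab-separated GFF3 layout and stops at a `##FASTA` directive if one is present. The function name `count_feature_types` and the sample records are hypothetical, made up for illustration, and are not taken from Prokka or from the talk.

```python
from collections import Counter


def count_feature_types(gff_lines):
    """Tally column 3 (feature type) of GFF3 feature lines."""
    counts = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no feature lines follow
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GFF3 feature line
        counts[fields[2]] += 1
    return counts


# Hypothetical records, invented for illustration only.
example_lines = [
    "##gff-version 3",
    "contig1\texample\tCDS\t100\t400\t.\t+\t0\tID=gene_0001",
    "contig1\texample\tCDS\t500\t900\t.\t-\t0\tID=gene_0002",
    "contig1\texample\ttRNA\t1000\t1075\t.\t+\t.\tID=gene_0003",
    "##FASTA",
    ">contig1",
    "ACGTACGT",
]

for feature_type, n in sorted(count_feature_types(example_lines).items()):
    print(feature_type, n)
```

To run it on a real file, pass an open file handle instead of the list, for example `count_feature_types(open("annotation.gff"))`, where `annotation.gff` is a placeholder file name.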
A few claims from #baltibio (h/t @pathogenomenick):
1. Around 50% of any random genome can be annotated against Prokka’s core proteome database, which has 70k annotations with experimental evidence.
2. The NCBI non-redundant protein database is actually quite redundant, and contains plenty of garbage because it is not policed.
3. The NCBI PGAAP annotation tool can take weeks to return results; in one case it took six months!
4. Torsten uses SHRiMP for read mapping in Nesoni, as he finds it much more sensitive for indels than, e.g., BWA.
TGAC - the Biggest Bioinformatics Data Analysis Powerhouse
Rob Davey presented on TGAC and you can see his slides here.
1. The UK’s TGAC has a UV2000 with 2560 cores and 20 TB of RAM, as well as nearly every instrument known to man! @pathogenomenick
2. TGAC gets a 25x speed improvement running BWA on an FPGA. @scalene
3. TGAC came up with an open-source browser that is supposedly very good. h/t: @lexnederberg
TGAC Browser is a new open-source genome browser developed at The Genome Analysis Centre (TGAC) to visualise genome annotation from the Ensembl database schema.
There were many other talks and discussions, which you can find on Twitter under #baltibio. We will cover Jared Simpson’s SGA in a separate commentary, but please check this @storify link to learn about the #baltibio discussions on his work.