Please Sign up for our Membership Section

Dear readers,

Over the last few months, several of you contacted us requesting updates to various sections of the tutorials. Despite our best intentions to keep them up to date, we have not been able to do so for lack of time. Some of you also brought up the membership idea that we proposed a couple of years back. We would like to give it a try to see whether the community benefits from it.

Based on readers’ emails and discussions with various collaborators over the last year, we realize that many researchers face numerous problems in using the latest and greatest bioinformatics programs.

(i) First, there are simply too many programs for solving any given problem, with no obvious way to judge which ones are the most appropriate. You can get a flavor of that from yesterday’s post on PacBio assembly.

(ii) The solution may be to install a number of programs and compare the results, but installation appears to be the next big hurdle. It is nice that almost all bioinformatics papers deposit their code on GitHub these days, but not all of that code comes with full documentation. Moreover, very little of it has been tested in multiple environments. We tried to install the programs from Pandora’s Toolbox for Bioinformatics on a different Unix operating system, and only three compiled fully without any intervention. You may argue that bioinformatics programs should not be tried outside Linux, but we chose to do so after seeing existing programs fail following ‘upgrades’ in Linux. This is a common problem when shared libraries get replaced by new versions.

(iii) Then comes the problem of scaling through RAM. In conversation with others, we find many researchers still using that method for their data analysis. They tend to stick to the programs they know best, because doing otherwise involves a steep learning curve. Therefore, when they encounter massive amounts of data, the obvious solution is to tackle them with a larger machine with a massive amount of RAM (~1 TB!). We wrote extensively about this expensive non-solution in 2012 and 2013, but still hear about it being practiced.

(iv) Another problem we often hear about comes from incompatible output formats. For example, a reader asked for help on solving an assembly problem, and we recommended a set of programs to be used as a pipeline. Unfortunately, the reader could not use the combination, because the output of program X could not be fed directly into program Y. This may sound trivial to those coming from a computing background, but may not be as trivial for biologists.

(v) The last problem we came across is a bit different from the ones listed above. Let us call it ‘bioinformatics overuse’ for lack of a better name. If you have heard about the perils of antibiotic overuse, wait till you see a case of ‘bioinformatics overuse’! In this case, the users are able to install various programs, but do not know what is inside those black boxes. Therefore, they decide to use every single one of them, and then come up with some logic to combine the output. We even came across suggestions of ‘using the programs in triplicate’, just like using three experimental samples!

We plan to address all of those issues in our membership section, but the first task is to present information already posted in various parts of this site in a well-organized manner. We are also testing some very interesting ideas to help you learn programs without losing hair over installing them. The membership site is currently undergoing tests and will open in a month. If you enjoy our blog, please sign up for the membership here.

