A Bioinformatics Study Guide for Biologists - (i)
Biologists and biochemists increasingly feel the need to learn bioinformatics. The required skill set goes well beyond running BLAST searches at NCBI or looking up genes and genomes in online databases. Believe it or not, a few years ago those tasks were what biology departments called “bioinformatics”. That changed with next-generation sequencing. Now that sequencing is so cheap, every lab has tons of raw data sitting on its hard drives, and it needs help analyzing that data.
OK. Now that you are convinced you need to learn bioinformatics, where should you start? Here is my advice.
- Stay away from “bioinformatics” conferences, journals, and papers, and avoid “bioinformatics” classes taught by computer scientists. Those courses are designed “of the computer scientists, by the computer scientists, for the computer scientists”. In fact, avoid them like the plague, unless your research is on plague :)
- Programming language: Your first task is to learn a programming language. You have probably seen the paper claiming that Go is the best language for “full-fledged bioinformatics”. What should you do? Yes, stay away from Go, and for good measure, also stay away from the other two languages the paper mentioned (C++ and Java).
Once you get past the noise of everyone pitching a different programming language, your list will narrow down to three: Perl, Python, and R. I have used all of them for years, some for decades, and I have been writing programs in C since 1990. In my opinion, R is the language you should learn. R will make you the most productive in the least amount of time. It is so easy that even the middle-schoolers in my class pick it up quickly. R gives you the most value for your learning effort.
Speaking of value, we offer a free, two-hour, remotely taught module to get you started with R. The class will help remove your initial fear of coding, if that happens to be your biggest stumbling block. We have a class scheduled for this Saturday if you would like to join.
- dplyr: Once you know the basics of R, you need to learn some of its libraries. These libraries (free add-on programs contributed by the user community) are what make R so powerful.
The one I recommend most is ‘dplyr’. It lets you do everything you currently do in Excel, such as filtering rows, sorting, and building pivot-table-style summaries, and a lot more. Because Excel is the go-to analysis tool in nearly every biology lab, learning ‘dplyr’ lets you practice your R skills on the analyses you already do every day, and thus become more effective.
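To make the Excel comparison concrete, here is a minimal sketch in R, assuming the dplyr package is installed (for example with install.packages("dplyr")). The table, gene names, and values are hypothetical and made up purely for illustration. The verbs filter(), group_by(), summarise(), and arrange() are standard dplyr functions; they roughly play the roles of an Excel filter, the row and value fields of a pivot table, and the Sort button.

```r
library(dplyr)

# Hypothetical toy table (made-up values, for illustration only)
expr <- data.frame(
  gene      = c("geneA", "geneA", "geneB", "geneB", "geneC", "geneC"),
  condition = c("control", "treated", "control", "treated", "control", "treated"),
  level     = c(10.2, 25.1, 8.4, 7.9, 3.3, 12.6)
)

summary_tbl <- expr %>%
  filter(level > 5) %>%                 # keep rows above a threshold, like an Excel filter
  group_by(condition) %>%               # one group per condition, like a pivot-table row field
  summarise(n_genes = n(),              # count rows in each group
            mean_level = mean(level)) %>%  # average per group, like a pivot-table value field
  arrange(desc(mean_level))             # sort groups from highest to lowest mean

print(summary_tbl)
```

Because every step is written down, the same analysis can be rerun on next week's data file without repeating a sequence of clicks.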
In my next post, I will discuss the following three steps in your journey into bioinformatics.