Romance novels and beginners’ guides on computing languages are never in short supply. Why create another set of tutorials on popular programming languages, when similar information is freely available from hundreds of websites?
This set of tutorials is unusual in its approach. The focus is less on any one language and more on actual problem solving. In section 2, we present a bioinformatics problem from a recent paper related to next-generation sequencing (NGS) analysis. In sections 3-6, we provide brief introductions to R, Perl, Python and C/C++. In section 7, we describe various computer science concepts essential for problem solving, such as data structures, algorithms, functions and classes. We also explain the predominant computing architecture deployed in all commodity machines. In section 8, we apply the knowledge gained in the previous sections to solve the proposed biological problem. Section 9 discusses ideas for making the code fast and efficient. Such ideas were mostly unnecessary three to four years ago, and the information presented up to section 8 would have been enough for most bioinformaticians. That seems to have changed with the advent of NGS analysis, and it has become as important to solve a problem as it is to find a
Our presentation is motivated by the observation that new bioinformaticians often ask which programming language is best for bioinformatics. Such questions tend to distract from the fact that the biggest effort in solving a problem is spent on understanding the scientific question and on designing appropriate algorithms and data structures. Those who do such tasks well are usually not constrained by the specifics of various programming languages. Moreover, programming languages themselves are evolving, and each one rapidly incorporates bright ideas popularized by the others.