Bioinformatics in the AI Era
In 2011, I wrote two articles (here and here) providing beginners’ guides to bioinformatics. Eight years later (2019), I posted an updated guide here. Now that AI has become a powerful tool, it is time to discuss how the work of bioinformatics and computational biology is changing.
Before continuing, I should mention that three distinct activities fall under the broad term “AI”: (i) using web-based chat tools like ChatGPT, Claude or Gemini and their extensions as coding tools, (ii) downloading pretrained models directly from Hugging Face and building applications on top of them, and (iii) developing and training mathematical models for new kinds of uses (such as AlphaFold for protein structure prediction, and Evo and Evo2, discussed here). All three are applicable to biology, and I will discuss them in multiple posts. Here, I will focus on the first one.
(i) AI as a Text/Code Generation Tool
When most people say AI, they mean web-based chat tools like ChatGPT. These tools generate text from a given prompt, and for many people they have become replacements for traditional search engines. Under the hood, they are built on a type of model called a large language model (LLM).
An extension of this technology comes in the form of coding tools like Claude Code or Codex. They are having a huge impact on the software industry. You will find hundreds of articles and videos claiming both that “software career is dead because of vibe coding” and that “vibe coders are destroying humanity by generating AI slop and hallucination”. I wrote about my personal experience using AI for coding.
Bioinformatics is different, however. I am finding LLM-based AI to be an amazing tool for all aspects of bioinformatics.
a) Scripting:
Unlike software professionals, computational biologists aim to analyze biological data using code. Therefore, data takes precedence, and we often need to write custom scripts based on observed patterns in the data. This is where AI tools excel. My first successful use of ChatGPT in coding was in trying to write a ggplot customization. I could have spent countless hours looking through manuals and tutorials, but ChatGPT produced the exact code I was looking for in the right form.
b) Running specific packages:
Bioinformatics-related packages are not always well-documented, and the inconsistencies between packages can be confusing. For example, some packages require genes as the first column of a data frame, while others need them as row-names. Sorting out the differences between matrices, data frames and other data types in R also takes forever. With AI tools, I can quickly get a working code example and then start iterating on it. And if the code does not work, you can paste the error message into the same chat and often get a working fix.
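To make the gene-column versus row-name mismatch concrete, here is a minimal sketch in plain Python rather than R. The toy expression table and the function names are hypothetical and chosen only for illustration; real packages each have their own expectations about input shape.

```python
import csv
import io

# Hypothetical toy expression table with gene IDs in the first column.
RAW = """gene,sample_A,sample_B
BRCA1,10,12
TP53,7,3
"""


def genes_as_column(text):
    """Parse CSV text into a list of rows; the gene ID stays an ordinary column."""
    return list(csv.DictReader(io.StringIO(text)))


def genes_as_row_names(rows, key="gene"):
    """Re-key the rows by gene ID, analogous to moving a column into row-names."""
    table = {}
    for row in rows:
        gene = row[key]
        if gene in table:
            # Row-names must be unique, so duplicated gene IDs are rejected.
            raise ValueError(f"duplicate gene ID: {gene}")
        table[gene] = {k: float(v) for k, v in row.items() if k != key}
    return table


def row_names_to_column(table, key="gene"):
    """Reverse direction: turn the gene keys back into a regular first column."""
    return [{key: gene, **values} for gene, values in table.items()]


rows = genes_as_column(RAW)
by_gene = genes_as_row_names(rows)
print(by_gene["TP53"]["sample_B"])
```

The duplicate check is worth noting: a layout keyed by gene cannot hold the same gene ID twice, which is one common reason a table that loads fine with genes as a column fails once they become row-names.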
c) Explaining math behind the code:
Scripting and finding tutorials on packages are how most people use AI tools. These tools can also help you develop a conceptual understanding of the math and logic behind those packages. I will work through a few examples in a later post.
d) Developing new and derived applications:
This is closer to software engineering than to using tools developed by others. I will cover this topic in a later post.
In future posts, I will also write about the other two aspects of AI, namely using models directly from Hugging Face and developing new math or models for biology.