Question from a Reader on Studying Bioinformatics

Reader S. asked:

First off, I would just like to thank you for creating Homolog.us. It has been an invaluable resource in my bioinformatics education and a source of enlightenment.

I’m an undergraduate student majoring in neuroscience and psychology, but I have taken a keen interest in bioinformatics over the past year. My small liberal arts university does not offer any bioinformatics courses, so I’ve been taking as many computer science and biology courses as my schedule will allow. I was hoping to get your opinion on furthering my education in bioinformatics. That is, what do you think of the two-year master’s programs offered by Johns Hopkins and other universities? A comment on Slashdot, also included in OpenHelix’s Friday SNPets, remarked that “these students graduate with trade skills that have a short shelf life and lack the proper foundations to gain new skills. In that respect, there’s a bubble, unfortunately”.

Do you agree with this assessment? I understand the difference between the types of jobs obtainable with a Ph.D. and a master’s, but do you think these programs are cranking out more ill-equipped bioinformaticians than the market demands?

Also, my less-than-stellar GPA is the reason I am interested in applying to a master’s program. Could a master’s degree in bioinformatics be useful as a stepping stone to a Ph.D. in systems biology or a similar field?

Hello S,

We ran the question ‘is there a bubble in bioinformatics today?’ through our Google English-to-English translator, and it came out as ‘will someone joining a bioinformatics program today readily find a job in two years?’. That required us to consult our crystal ball, and we got three answers:

a) Stay out of debt,

b) Learn Chinese,

c) Play contract bridge!!

Our crystal ball is often very cryptic :). Let us try to interpret what it said.

a) What role does bioinformatics play in biology?

To find out whether newly minted MS students are properly trained, we first need to understand what bioinformaticians do, or will be expected to do in the coming years. The word bioinformatics covers a broad range of topics, but it essentially means doing genetics at a very large scale. As a simple example, if you want to translate one gene into protein following the genetic code, you can do it by hand. If you plan to do the same for 10,000 genes, you have to write a computer program. If you want to find all binding sites of the form ‘ATGCCA’ in the human genome, allowing any one of the six bases to differ, you absolutely need a computer.
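To make the two examples above concrete, here is a minimal sketch, assuming the standard genetic code, plain Python strings, and the forward strand only. The helpers `translate` and `find_sites` are hypothetical names for illustration, not functions from any particular bioinformatics library; `find_sites` treats "one of six bases varying" as at most one mismatch (Hamming distance of 1 or less) against ‘ATGCCA’.

```python
# Illustrative sketch only: translate a gene with the standard genetic code,
# and find motif occurrences allowing a limited number of mismatches.
# Assumptions: standard code, uppercase DNA, forward strand only.
from typing import List

BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(dna: str) -> str:
    """Translate codon by codon; codons with unknown bases become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def find_sites(genome: str, motif: str, max_mismatches: int = 1) -> List[int]:
    """Return 0-based start positions where motif matches with at most
    max_mismatches differing bases (a naive sliding-window scan)."""
    genome, motif = genome.upper(), motif.upper()
    k = len(motif)
    hits = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        mismatches = sum(1 for g, m in zip(window, motif) if g != m)
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Translating 10,000 genes is just a loop over translate().
    print(translate("ATGGCCTAA"))                    # MA*
    # Exact match at 0, and 'ATGACA' (one mismatch) at 8.
    print(find_sites("ATGCCAGGATGACATT", "ATGCCA"))  # [0, 8]
```

The naive scan is easy to read but would be slow in pure Python over the roughly three billion bases of the human genome, and a real search would also need to scan the reverse-complement strand; this is exactly the point where knowing how to write efficient code starts to matter.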

Writing computer programs does not come naturally to most people, and some training is needed. Writing efficient code is even more difficult, because it requires one to know quite a bit about how computers are built and how data flows inside them. The fields of genetics and molecular biology are also well developed and require extensive training. Throw maths (algorithms), statistics and physics (nonlinear dynamics) into the mix, and your plate is full before you even try to become an expert in bioinformatics. That explains the frustration of Chris Mueller, who wrote the Slashdot comment you quoted, about newly minted bioinformaticians. I also agree with his sentiment about the ‘short shelf life’ of such training. Technological innovation is changing the field so rapidly that someone who was an expert yesterday (say, in array technology) may fall well behind tomorrow, when next-gen sequencing takes over.

A good teacher of bioinformatics needs a strong appreciation of many aspects of the field, whereas someone learning bioinformatics from a computer scientist dabbling in biology, or from a biologist writing a few Python scripts, will get only narrow exposure to it. Please keep this in mind when choosing a program, and check the backgrounds of the professors.

b) What is the long-term future of bioinformatics?

No matter how difficult bioinformatics seems from the above description, it is the future of biology. The reason is simple. Geneticists of earlier days resorted to studying one or two genes at a time due to technological limitations. With complete genome sequences and cheap technology now available, there is no excuse for not looking at all relevant genes together, and that means future biologists will have to cope with large amounts of data.

The transformation we are seeing in biology departments is similar to what happened in electrical engineering departments 30-40 years ago. Electrical engineers used to focus on electric power generation and transmission, until some of them, with help from physicists, invented semiconductor electronics and integrated circuits. Today, old-school electrical engineering has been relegated to small parts of the departments. Not only that, electronics became so useful that it helped create new multidisciplinary fields such as computer science and biomedical engineering.

I mention electrical engineering because the technologies revolutionizing biology today are essentially a marriage of semiconductor technology and genetics. So the basic driving forces are similar in both cases.

In a nutshell, bioinformatics is not going away. Instead, every biologist of the future will have to be a good bioinformatician.

c) How do you explain the recent urgency in hiring bioinformaticians?

There are two driving forces:

i) Pent-up demand: Most older-generation biologists, who control decision-making in universities and companies, underestimated the significance of bioinformatics and its data-driven approach to biology. They mostly stayed on the periphery during the Human Genome Project, but enjoyed (and squandered) the increase in research funding that resulted from its success. Their approach had been to get preliminary analysis done by a technician or core facility, and then get processed numbers on an Excel sheet for further analysis. Meanwhile, the amount of data continued to grow exponentially, and at some point it overwhelmed Excel. That breaking point came with next-gen sequencing.

ii) Western demographics: If you break down the population of Western countries by age, you will find a huge peak around ages 55-58 and another one around 19-25. Those peaks represent the boomer generation and their children. The boomer generation expects to go into retirement soon, which means there will be strong demand for medical technologies, and many companies are investing in anticipation. Those investors see genomics as the biggest source of new medical applications, and genomics is increasingly data-driven. Hence, there is a strong need for bioinformaticians.

d) Will supply overwhelm demand over the next 2, 4, 6 years?

You are asking a hard question. Let us list the positive and negative factors for you to weigh, because we do not have a yes/no answer.

i) Older biologists continue to underestimate the significance of bioinformatics, and think that they can handle all data with a few high-quality technicians. We will not believe that this sentiment has changed until we see Eugene Myers win a Nobel Prize. From that viewpoint, the ‘market’ is far from saturated. Looking from another angle, given how data-driven biology is going to be in the future, your university and many similar ones will have to offer bioinformatics courses. So, I see future job openings :)

ii) The underestimation of bioinformatics by older biologists, and their urge to push all data analysis to ‘core facilities’, is likely a driving force behind the creation of easy bioinformatics courses at many universities. The ‘market’ expects people who run software programs without in-depth knowledge of the algorithms, and universities are producing what the market demands. We expect that sentiment to change and good bioinformaticians to gain respectability.

iii) We expect the bets on US demographic trends to go bust, because the USA is broke many times over, and retiring boomers will have much less money for old-age medical care than they think they have. At all levels of US society, we are seeing huge efforts to hide the truth about devastating financial losses by taking on more debt. The boomer generation is hoping that if they can maintain the status quo for another few years, they will escape the consequences no matter what. Unfortunately, they are out of luck, and the USA is going to collapse like Greece or Spain right when they try to retire. That is why staying out of debt is important, because debt during an economic collapse is like jumping into water with a huge rock tied to your feet. Can you afford a fee of ~$50K or more for a master’s in bioinformatics at Johns Hopkins without going deep into debt? Do try to get a scholarship.

iv) Many smart people around the world are aware of the pending demand due to new genomic technologies and changing demographic trends, and they are executing accordingly. The competition is global, and the USA seems to be falling behind despite having an immense technological edge. Eugene Myers left for Germany. Lincoln Stein went to Canada. Pavel Pevzner formed a very good research team in Russia. China came from nowhere and is now leading the race in NGS.

Going forward, we expect the USA to go deeper and deeper into debt to maintain its grotesque military and corrupt financial structure. That means there will be less funding for education at all levels of society, leading to lower-quality teachers, students, etc. If you become an expert in the new biology, expect more opportunities in countries with stronger capital positions. That is how capitalism works.

e) Getting back to your question about two-year master’s programs and the PhD

If you had asked us 10 years ago, we would have said: join a university with very good professors working on cutting-edge research, and make sure those superstar professors actually teach master’s-level classes. Today, you can use the internet to ‘hire a team of scientists’ from all over the globe and start your learning. Many young and talented professors are computer-savvy and post their class notes and lectures online. You can follow them on Twitter and find out which papers they are discussing right now. It may be hard to follow all the papers right away, but one benefit bioinformatics has over other areas is that the field is young, and in most cases you do not have more than 5-10 years of history to catch up on. If you do all this before joining a master’s or PhD program, you will gain the most from the curriculum.

When you read the papers, try to understand:

(i) the social value of the project [such as who benefits, and whether they are ready to pay $50, $500 or $5000 for the proposed cure],

(ii) the line of thinking that led the researchers to come up with the solution.

That is the best way to get value out of papers, because if you start thinking like the person who wrote a paper, sooner or later you may be able to think one step ahead. That puts you in a strong position to start a conversation with the author, and eventually become part of their research team. The internet gives far more access to knowledge than was possible when learning directly from scholars meant attending their courses in person.

How does playing contract bridge help? Let us leave that discussion for another day.

-——————–

On using the internet as your classroom, here is an interesting article on Coursera forwarded by C. Titus Brown with the following remark: (“Good news! Becoming a data scientist is easy! http://gigaom.com/data/why-becoming-a-data-scientist-might-be-easier-than-you-think/ (only partial sarcasm - coursera and kaggle ++)”).

Why becoming a data scientist might be easier than you think

Several novice programmers who signed up for a free machine-learning class on Coursera have recently gone on to win predictive-modeling competitions. Maybe it’s not that hard to mint new data scientists after all.
