These days, many biologists are performing RNA-seq and other NGS experiments. The immediate challenges after collecting the data are (i) where to store them, (ii) where to analyze them and (iii) how to give all lab members efficient and secure access to them.
Security becomes important when a lab has multiple members with different levels of expertise. In a shared account, a novice may end up deleting valuable data, running a resource-intensive program that slows down everyone else, and so on. Even if you exclude the novices, you cannot avoid dependency hell. For example, two “experienced” members may install two different versions of the same program and break each other's analysis pipelines.
I have been maintaining my own servers for over a decade now, and I need to give various students access from time to time. Through trial and error, I arrived at a solution that works perfectly and lets me sleep at night without worries.
The solution is also inexpensive compared with the alternatives out there. I call it the “thousand-dollar server”, because that is the average price I pay on eBay for a reasonable server with ~70-100 GB of RAM and an SSD.
Why “Cloud” Does Not Work
From time to time, people have suggested Amazon and other cloud providers to me as a solution, but I found them suboptimal for bioinformatics. The main flaw is a pricing structure designed for internet companies. You pay through the nose for continued access, more RAM, faster disk access and data storage. NGS files in particular are huge, so the storage fees add up quickly.
My solution works for biologists who are willing to acquire some Linux expertise. Moreover, it offers a potential major benefit: avoiding much of the pain of installing bioinformatics programs. I will discuss all of these details in the tutorial.