Every year, governments spend millions of dollars on genomic sites and databases. They come in all kinds of forms: wikis, genome browsers, search engines, biomarts and sometimes places to simply dump large amounts of data. From a very top-level view, all of them serve one purpose, namely to provide information about various aspects of genomic data. With that definition, we also need to throw the journals and web-based online forums into the mix. In fact, as you have seen from Scott Edmunds’s comment in the last thread, the GigaScience journal itself is hosting a large amount of bulk data. A similar task is handled by NCBI GEO or SRA for US-based journals.
About three weeks back, we started looking into various biological websites and databases to learn what programs they use. After struggling with all kinds of websites built with home-grown add-ons and so on, we decided to proceed more systematically. We ranked all databases and information sites according to their usage, and then investigated them from top to bottom. However, our original mission got sidetracked after we made a startling discovery during the ranking process (based on Alexa). We found that upstart web forums like Biostar and SEQanswers have similar or better traffic rankings than the decade-old FlyBase, yeast database (SGD) and Arabidopsis database (TAIR). You may say that it is unfair to compare boutique shops like model organism databases with discussion forums, but we thought such a comparison would help genomic websites find ideas about how to get the community involved and make annotation tasks easier. In the past, we had a bad experience building a genomic database, where the manager of the project treated it like his personal fiefdom and was very much against getting the community involved (“Why are you contacting our competitors? You will only upload data generated by our group!”, “We do not want to put a wiki for genes, because nobody has time to add any text there”, etc.).
We got curious enough about Biostar to contact Istvan Albert to learn more. Istvan was in a good mood, partly because Biostar reached the 10,000-question mark around the same time, and he entertained our questions. We seriously underestimated Istvan at first. We thought he had installed the program from somewhere else, and even after learning that it was written by ‘biostar developers’, we assumed that the young kid on the team wrote the entire code. That assumption was incorrect, and we came away very impressed after talking to him. We wish we had done our homework and known that he used to write PRLs in his past life (arXiv here).
Our curiosity about Biostar did not stop there, and we sought opinions from a few other heavy users, non-users and quitters. We even made a pseudo-account ourselves to ask questions and collect points. There was a complaint at Stack Overflow about Biostar people being rude, and we wanted to find out how the forum feels to a newbie.
In the next few commentaries, we will write about Biostar and then go on to discuss other biological databases. We opened a new section on our blog for these information-related websites, whereas this space will continue to be used for writing about algorithms. Biostar will be covered in four parts, because it is intriguing in many respects.