How to Download Abstracts of All Biology-related Papers for Text-mining
This morning, we scoured the web for the best ways to download all abstracts in bulk from PubMed. This commentary covers what we have found so far, since others may want to do the same task.
The simplest method is to go to the NCBI FTP site and download everything. Please note that the abstracts are open access, but full articles may not always be. You will need to sign a copyright form with NCBI to do any text-mining not covered by US copyright law. Here you can read an interesting discussion on what is and is not allowed under copyright law.
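Once a file has been downloaded, the abstracts still have to be pulled out of it. Below is a minimal sketch in Python, assuming the bulk files are gzip-compressed XML in the standard PubMed layout (`PubmedArticle`, `MedlineCitation/PMID`, `Abstract/AbstractText`), as served at the time of writing from the `/pubmed/baseline/` directory of `ftp.ncbi.nlm.nih.gov`. The location, the function names and the sample record are our own assumptions for illustration, so check them against the current NCBI layout before relying on them.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def iter_abstracts(xml_stream):
    """Yield (pmid, abstract) pairs from a binary PubMed XML stream."""
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag != "PubmedArticle":
            continue
        pmid = elem.findtext("MedlineCitation/PMID")
        # An abstract may be split into several labelled AbstractText parts,
        # and each part may contain inline markup such as <i> or <sub>.
        parts = [
            "".join(node.itertext()).strip()
            for node in elem.findall(
                "MedlineCitation/Article/Abstract/AbstractText"
            )
        ]
        abstract = " ".join(p for p in parts if p)
        if pmid and abstract:
            yield pmid, abstract
        elem.clear()  # drop the parsed record; the bulk files are large


def iter_abstracts_from_gz(path_or_file):
    """Same as iter_abstracts, but reads a gzip-compressed file."""
    with gzip.open(path_or_file, "rb") as handle:
        yield from iter_abstracts(handle)


# A tiny made-up record set, only to show the expected input shape.
SAMPLE = b"""<?xml version="1.0"?>
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID Version="1">101</PMID>
      <Article>
        <Abstract>
          <AbstractText Label="BACKGROUND">First part.</AbstractText>
          <AbstractText Label="RESULTS">Second <i>part</i>.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID Version="1">102</PMID>
      <Article><ArticleTitle>No abstract here</ArticleTitle></Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""

if __name__ == "__main__":
    for pmid, abstract in iter_abstracts(io.BytesIO(SAMPLE)):
        print(pmid, abstract)
```

Records without an abstract are skipped, and the parser streams through the file instead of loading it whole, which matters when working through hundreds of bulk files.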
Those who want to do more sophisticated analysis can use either the NCBI E-utilities or a tool called Textpresso. We also came across another tool called refnavigator, but we do not know how good it is.
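For readers who have not used the E-utilities before, here is a rough sketch in Python of the usual two-step pattern: `esearch` to get PubMed IDs for a query, then `efetch` to retrieve the abstracts as plain text. The function names and the example query are ours, not part of any NCBI library. As far as we know, NCBI asks users without an API key to stay at or below about three requests per second, so this is suited to targeted queries, not to downloading everything.

```python
import json
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, retmax=20):
    """Build an esearch URL that returns matching PubMed IDs as JSON."""
    query = urllib.parse.urlencode(
        {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    )
    return f"{EUTILS}/esearch.fcgi?{query}"


def efetch_url(pmids):
    """Build an efetch URL that returns abstracts as plain text."""
    query = urllib.parse.urlencode(
        {
            "db": "pubmed",
            "id": ",".join(pmids),
            "rettype": "abstract",
            "retmode": "text",
        }
    )
    return f"{EUTILS}/efetch.fcgi?{query}"


def parse_pmids(esearch_json):
    """Extract the list of PubMed IDs from an esearch JSON response."""
    return json.loads(esearch_json)["esearchresult"]["idlist"]


def fetch_abstracts(term, retmax=20):
    """Search PubMed and return the matching abstracts as one text blob.

    This function makes two network requests.
    """
    with urllib.request.urlopen(esearch_url(term, retmax)) as resp:
        pmids = parse_pmids(resp.read().decode("utf-8"))
    if not pmids:
        return ""
    with urllib.request.urlopen(efetch_url(pmids)) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical query, chosen only for illustration.
    print(fetch_abstracts("zebrafish development", retmax=3))
```

Keeping the URL building and the response parsing in separate functions makes it easy to check them without touching the network.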
We found a Biostars discussion thread very helpful for learning how various experts have tried to solve the same problem as ours.
Finally, here is a cool way to do the above task using R.