A Bird's Eye View of NCBI GEO Database - II

We presented the structure of the NCBI GEO database in our earlier commentary, A Bird’s Eye View of NCBI GEO Database. Today we will inspect the contents of the GEO database more closely.

As we explained earlier, GEO data sets are organized in terms of both GPLs (platforms, i.e. array designs) and GSEs (series, i.e. collections of many measurements on one or more array designs). As an example, GPL570 is a human genome array designed by Affymetrix. At the NCBI GEO database, all experiments using that array can be downloaded together from the GPL570 link. A GSE ID, on the other hand, typically represents all data from a researcher related to one publication. A GSE may include any number of platforms (GPLs), depending on how the experiment was designed.

In the following chart, we show the most popular GPLs, i.e. the ones used by the highest number of GSEs. Please click on the chart to see a larger version. GPL570 is clearly the winner, closely followed by GPL1261 (Affymetrix mouse array). Each of those arrays was used by over 1,000 publications. GEO also assigns a single GPL ID per organism to all Illumina short-read submissions. Those sets (GPL9052, GPL9058, etc.) are catching up fast given their short history.
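The ranking described above amounts to counting, for each GPL, how many GSEs list it. Below is a minimal Python sketch of that count. The series names and their platform lists are made-up placeholders, not real GEO records, and the function name `rank_platforms` is our own; this illustrates the idea and is not the code behind the chart.

```python
from collections import Counter

# Hypothetical toy records: each series (GSE) lists the platforms (GPLs) it uses.
# The series names are placeholders, not real GEO accessions.
series_platforms = {
    "GSE_A": ["GPL570"],
    "GSE_B": ["GPL570", "GPL96"],
    "GSE_C": ["GPL1261"],
    "GSE_D": ["GPL570"],
    "GSE_E": ["GPL1261", "GPL81"],
}


def rank_platforms(series_to_gpls):
    """Return (GPL, number of GSEs using it) pairs, most used first."""
    counts = Counter()
    for gpls in series_to_gpls.values():
        # set() ensures a series that lists a platform twice is counted once.
        counts.update(set(gpls))
    # Sort by descending count, then by GPL ID so ties have a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_platforms(series_platforms)
for gpl, n_series in ranking:
    print(gpl, n_series)
```

Note that a GSE spanning several platforms contributes one count to each of them, so the per-GPL counts can add up to more than the total number of GSEs.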

The data in the above chart are from April, but we have added a section to our website that will keep showing these charts with the latest data. You can get there by clicking on the ‘Trends’ link at the top of our page. In the coming days, we will add many other trend charts for data in the GEO and SRA databases.

We also wanted to find out the historical trend in the use of the popular GPLs. You can click here to access the charts, or go to the trends section and check the second link. You will find that GPL570 and GPL1261 have gained strength over the years, whereas older arrays such as GPL96 and GPL81 were used mostly in the earlier years. GPL6629 (Affymetrix fly tiling array) is interesting because it reached the top place in usage with only three years of history. The 2011 data in all charts run only through April. We will rectify that this week by loading the most recent data, and will then continue to update almost daily.
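A historical trend of this kind can be built by grouping the same GSE counts by year. The sketch below assumes each record carries a series name, a year, and a list of GPLs. Those records are invented for illustration and `yearly_usage` is a hypothetical helper, not part of any GEO tool.

```python
from collections import defaultdict

# Hypothetical (series, year, platforms) records; not real GEO data.
records = [
    ("GSE_A", 2003, ["GPL96"]),
    ("GSE_B", 2004, ["GPL96", "GPL570"]),
    ("GSE_C", 2009, ["GPL570"]),
    ("GSE_D", 2010, ["GPL570", "GPL1261"]),
    ("GSE_E", 2010, ["GPL570"]),
]


def yearly_usage(records):
    """Map each GPL to {year: number of GSEs using it that year}."""
    usage = defaultdict(lambda: defaultdict(int))
    for _series, year, gpls in records:
        # set() ensures a series counts once per platform.
        for gpl in set(gpls):
            usage[gpl][year] += 1
    # Convert to plain dicts with years in ascending order.
    return {gpl: dict(sorted(years.items())) for gpl, years in usage.items()}


trend = yearly_usage(records)
for gpl in sorted(trend):
    print(gpl, trend[gpl])
```

When reading such a table, keep in mind that a partial year (like the April cutoff mentioned above) will show lower counts than a full year.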

If you would like to check the usage of a favorite GPL that is not included above, please click the ‘search’ button on our top menu and type in its GPL ID. For example, you can type GPL9052 to see the trend in human transcriptome sequencing using Illumina technology.

Written by M.