The most influential PubMed articles of the past 5 years (December 2016 Edition)

James☃️Pirruccello
Dec 31, 2016


Computing over the entire PubMed dataset, here is an assessment of the most influential journal articles published in the past 5 years.

How the list was produced

To produce this list, I took all of the citations in PubMed (a citation being a reference from one article to another). I then traversed this citation network with random walks, tallying the number of times each article was encountered. This approach integrates both the number of citations an article has and the influence of the articles that cite it. (An influential article will be reached by more walks, and therefore the articles that it cites will become more influential as well.) After a minor normalization step, we get a weight factor that reflects the article’s influence on subsequent literature. The top paper, with a weight factor of 54.12, can be thought of as being 54 times as influential as the average journal article.

Why not just 2016?

It is tempting to produce a list of the top papers from 2016, but I think that the scientific community moves more slowly than that, at least in a way that is observable through a citation network. So I’ve arbitrarily selected the past 5 years to focus on topics that are relevant to the modern scientist, without focusing on articles that are so recent that their influence scores fluctuate mostly because of noise.

Top 10 articles from the past 5 years (2012–2016)

  1. Cancer statistics, 2012. CA Cancer J Clin 2012 54.12
  2. Cancer statistics, 2013. CA Cancer J Clin 2013 48.28
  3. NIH Image to ImageJ: 25 years of image analysis. Nat Methods 2012 43.93
  4. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol Biol Evol 2013 35.58
  5. Fast gapped-read alignment with Bowtie 2. Nat Methods 2012 35.58
  6. An integrated encyclopedia of DNA elements in the human genome. Nature 2012 34.80
  7. An integrated map of genetic variation from 1,092 human genomes. Nature 2012 34.64
  8. GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research — an update. Bioinformatics 2012 32.55
  9. Cancer statistics, 2014. CA Cancer J Clin 2014 31.97
  10. Global and regional mortality from 235 causes of death for 20 age groups in 1990 and 2010: a systematic analysis for the Global Burden of Disease Study 2010. Lancet 2012 31.26

Conclusions

This is a simple survey, so to conclude I’ll call out the broad-strokes patterns that I see:

  • Planned summaries, reviews, and guidelines
  • Methods papers (ImageJ, MEGA6, Bowtie2, GenAlEx)
  • Resources available to the community (1,000 Genomes project, ENCODE)
  • Clinical trials (therapeutic hypothermia after cardiac arrest, HIV prophylaxis, checkpoint blockade & immunotherapy)
  • Generally-recognized-as-hot research topics (CRISPR/Cas, Ebola)

Addendum: gaming the system?

There seems to be another group of papers, especially from Behav Brain Sci and Phys Life Rev, that almost seem designed to “game” ranking lists such as this one. These are highly ranked articles on popular topics that have numerous citations from within the same journal and few from other journals. This is reminiscent of link farms in the early days of Google, and it seems to be an area ripe for further analysis.
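One simple starting point for that analysis would be to measure, for each article, what share of its incoming citations come from its own journal. The sketch below assumes a hypothetical `journal_of` mapping from article ID to journal name; neither that mapping nor the function name comes from the analysis above.

```python
from collections import defaultdict


def same_journal_share(citations, journal_of):
    """Return, per cited article, the fraction of citations from its own journal.

    citations: iterable of (citing_id, cited_id) pairs.
    journal_of: dict mapping article ID to journal name (assumed input).
    """
    incoming = defaultdict(int)
    internal = defaultdict(int)
    for citing, cited in citations:
        # Skip pairs where either journal is unknown.
        if citing not in journal_of or cited not in journal_of:
            continue
        incoming[cited] += 1
        if journal_of[citing] == journal_of[cited]:
            internal[cited] += 1
    return {article: internal[article] / incoming[article] for article in incoming}


# Toy data: "target" is cited twice from its own journal and once from outside.
toy_journals = {"a1": "J1", "a2": "J1", "target": "J1", "b1": "J2"}
toy_pairs = [("a1", "target"), ("a2", "target"), ("b1", "target")]
toy_share = same_journal_share(toy_pairs, toy_journals)
```

A high share paired with a high influence weight would flag an article for a closer look; on its own it proves nothing, since specialist journals naturally cite themselves often.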


James☃️Pirruccello

Founded @mychances with @pirruccello (acq by @parchment) | MGH Cardiology Fellow | Interests: computation & genomics