Why citation counts don’t matter

Allow me to pull back the curtain. Scientist #1 is writing a paper and wants to add a reference in the introduction. They do a quick PubMed search and, without reading the paper, select one from a prestigious journal that looks applicable. Or, even worse, they do a Google Scholar search and just take the top hit (citation counts weigh heavily in how hits are ranked).

Scientist #2 comes along and reads the paper by Scientist #1. They see the reference Scientist #1 used to support a fact and decide to use that same reference for the paper they are writing without reading the reference. Now two scientists have cited a paper without reading it.


Most citations occur in the introduction/background of a paper and are only tangentially related to the work of the paper. Most work that gets cited is not read and a certain fraction of citations are incorrect — a citation might actually provide evidence to the contrary of the result it is being referenced to support!

Why and how does this happen?

The explanation is pretty simple. There is no reason not to cite as many papers as possible, and there are some possible benefits. Although your work may really only build on a few papers, it would look “unprofessional” to only have a few references. Bigger is better, and the number of references a paper cites is clearly indicative of how much work was put into it. In addition, when you cite someone’s work maybe they’ll notice and return the favor. You scratch their back, maybe they’ll scratch yours.

If this were true then why don’t papers have hundreds of references?

Perhaps people tried. Journals have a cap on the number of references you can have. For example, Nature allows up to 50 references. I’m not sure what the average reference number is, but in the life sciences it is probably between 30 and 40.

But references are the currency of science and citations help identify papers which scientists believe are true.

Yes…and no. A high citation count simply means the paper was interesting. Maybe interesting isn’t the correct word, provocative might be better…basically just something that grabs people’s attention. Whether it is vaccines that cause autism, water that has a memory, magical stem cells, arsenic DNA, or non-Mendelian genetics, it is easy to find examples of papers with very high numbers of citations that should have been printed in the National Enquirer rather than a scientific journal.

But high citation counts show which papers are the most important in their field and made the largest advances.

Maybe. This gets back to the issue of what should be cited in a publication. Almost every biomedical lab uses PCR, which is essential for their research, but do they cite PCR? Of course not; PCR would have a million citations by now if they did. Do you know who cites PCR? People writing a paper about PCR.


How about we take a look at a practical example of a paper which is technically sound and completely useless, yet grabbed people’s attention and might receive a good number of citations. I have no problem throwing shade at this paper because I think the author himself regrets the misconceptions that resulted from all the media attention. Personally I would have just posted it on my blog instead of sending it for publication.

The paper in question was published in Genome Biology. Genome Biology and Genome Research are basically the Kim Kardashians of computational biology: they have the highest impact factors because they publish a lot of stupid shit that gets people’s attention. Genome Biology even published a paper about something called a “Kardashian index”. No, that’s not a joke, that really happened.

Anyways, back to the paper I was going to criticize (I got sidetracked with another paper that would have also worked just fine). These researchers looked at the supplemental Excel files of some publications and noticed that some gene names had been converted to dates. This is a known problem. Anyone who pastes data into Excel knows that Excel will fuck it up, and in fact someone published a paper about this exact problem all the way back in 2004.

Despite this the paper has an Altmetric score of 1534, with 23 news outlets, 15 blogs, 1561 tweeters, and 22 Facebook pages.

Why so much attention? Because people drew the conclusion that scientists were studying the wrong genes in their publications and it seemed to occur more often at prestigious journals.

But scientists are not studying strange new genes that have a date as a name (or at least not most of the time). What is happening is that when a researcher submits their paper to a journal they’ll include some extra information as supplemental Excel files. To do so they’ll just copy text from a tab-separated file and paste it into Excel. And then Excel will fuck up some of the text and cause the resulting “errors”.

These errors do not affect the publication and in fact are not even in the main text of the publication. Scientists did not study the wrong genes because of this error. I don’t know if you can even call this an error, because along with a HUGO gene symbol scientists will often include an Entrez ID or UniProt ID or transcript ID, all of which can be used to identify the gene in question. So there was a bunch of ado about nothing.
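To make the point concrete, here is a minimal Python sketch of how trivially these “errors” can be spotted and worked around. It assumes the mangled symbols look like `2-Sep` or `1-Mar` (the form Excel typically gives a symbol such as SEPT2 or MARCH1), and that each row carries a second, stable identifier. The function name, the table, and the IDs are all made up for illustration; this is not the method the paper used.

```python
import re

# Matches strings such as "2-Sep" or "1-Mar", the shape Excel typically
# gives a gene symbol it has mistaken for a date (assumed format).
DATE_LIKE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$",
    re.IGNORECASE,
)


def find_mangled_symbols(rows):
    """Return (row_index, symbol, stable_id) for every date-like symbol.

    rows is a list of (gene_symbol, stable_id) pairs; stable_id stands in
    for an Entrez, UniProt, or transcript ID shipped alongside the symbol.
    """
    return [
        (i, symbol, stable_id)
        for i, (symbol, stable_id) in enumerate(rows)
        if DATE_LIKE.match(symbol.strip())
    ]


# Made-up supplemental table; the IDs are placeholders, not real accessions.
rows = [
    ("TP53", "ID_0001"),
    ("2-Sep", "ID_0002"),  # originally SEPT2
    ("1-Mar", "ID_0003"),  # originally MARCH1
    ("BRCA1", "ID_0004"),
]

for index, symbol, stable_id in find_mangled_symbols(rows):
    print(f"row {index}: '{symbol}' looks like a date; look it up via {stable_id}")
```

The symbol column is damaged, but the gene is still fully identifiable from the second column, which is why nobody ended up studying the wrong gene.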

Wait, this pointless paper really got that much attention?

Yes, people will talk about what they want to talk about, and in this case they wanted to believe the literature was filled with expensive studies felled by a common Microsoft product. Many scientists are really no better than high school gossipers, and just as the popular kids weren’t exemplary students neither are many highly discussed and cited papers.

So what’s the solution?

If we are going to use citations as a metric for how trustworthy a publication is, people should only cite articles which they have read and which are directly applicable to their work. No more long introductions that are basically miniature review articles.

This will decrease citation counts in general, but most importantly it will dramatically decrease the citation counts and impact factors of prestigious journals, eliminating their prestige and all the problems they cause.

But how will we know which are the most important papers in a field?

To determine which publications are most important for a field, perhaps we should rely only on citations made by review articles. There would then be two different types of citations.

There will be citations from research articles where the authors are putting their scientific reputation on the line in their support of the reference. If you read a paper about water memory and then cite it as something essential for your research you will suffer the consequences. And then there will be citations from review articles where identified experts in the field discuss the papers which they found to be the most influential.
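As a rough sketch of what keeping the two tallies separate could look like, assume every citation is recorded along with the type of article that made it. The function name, the `"research"`/`"review"` labels, and the sample data below are hypothetical; no existing citation database is implied to work this way.

```python
from collections import Counter


def split_citation_counts(citations):
    """Tally citations separately by the type of the citing article.

    citations is an iterable of (cited_paper, citing_type) pairs, where
    citing_type is "research" or "review" (labels assumed for this sketch).
    Returns (endorsements, influence) as two Counters.
    """
    endorsements = Counter()  # research articles vouching for a result
    influence = Counter()     # review articles naming influential work
    for cited_paper, citing_type in citations:
        if citing_type == "research":
            endorsements[cited_paper] += 1
        elif citing_type == "review":
            influence[cited_paper] += 1
        else:
            raise ValueError(f"unknown citing article type: {citing_type!r}")
    return endorsements, influence


# Made-up citation records for illustration only.
sample = [
    ("paper_A", "research"),
    ("paper_A", "review"),
    ("paper_B", "research"),
    ("paper_B", "research"),
    ("paper_C", "review"),
]

endorsements, influence = split_citation_counts(sample)
print("endorsements:", dict(endorsements))
print("influence:", dict(influence))
```

The point of two counters rather than one is that a paper can be heavily relied upon without being considered a landmark, and vice versa, and a single number hides that difference.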

Let’s try to eliminate the Kim Kardashians from science.