The problem with journal-level metrics


“Okay, now let’s look at the next candidate.”

Documents appear on the screen, and all the members of the tenure commission start to look at them.

“Seems like he has been pretty successful. He has published several articles,” mentions the first member.

“He has supervised a PhD student and, from what I’ve heard, he has also been a good teacher,” confirms a second member.

“Hmm… I’m not so sure,” says a third. “Look at the journals where he has published his articles. None are top journals.”

“But isn’t Journal of Materials from Nature a top journal?” asks the first.

“Nope.” She types quickly on her keyboard and shows the others. “This journal has an impact factor of only 0.46.”

“Ah, the word ‘nature’ was in the name. I guess that’s why I thought it was a good journal,” says the first.

The chair of the commission speaks up. “Have we already made a decision or would you like to continue discussing?”

“No, I think it’s pretty clear. He has tried hard, but we have to find top scientists,” answers the first.

“Let’s look at the next candidate then,” says the chairperson.

One member starts to protest a bit, but it is too late. New documents appear on the screen and the discussion moves on.

Tenure decisions are probably not made this way, but maybe we imagine something like this when we start worrying that our list of publications is not so outstanding. And, it is understandable that we worry from time to time (or even a lot) because we have to compete for funding and positions. It seems that our results are measured and compared more and more to find those so-called “top scientists”.

Some sort of comparison is necessary because we have to distribute a limited amount of research funding, but I think we all recognize that something is wrong when a scientist is evaluated based on journals and not on his or her own work.

Even though Garfield initially proposed the impact factor (IF) mainly for making decisions about which journals a library should subscribe to, it has somehow still found wide use in evaluating scientists. We all have probably noticed to some extent that the impact factor is not a good metric for this, but let’s take a closer look at those problems:

  • Impact factor is the mean, but the distribution is skewed
  • Impact factor depends on the field
  • Impact factor is manipulated

Impact factor is the mean, but the distribution is skewed

This is an elementary statistics error. An average should describe the overall sample, but if the distribution is skewed, then another measure, such as the median, should be used instead of the mean. If the distribution is strongly skewed, then the mean mostly describes those few exceptional articles that have received a lot of citations. Seglen showed almost 30 years ago that the distribution of citations among articles is skewed, and others have confirmed this. Some journals, such as the PLOS journals, the Journal of Materials Science, and Nature Chemistry, have also released their citation distributions (see also Lariviere et al. 2016). All are skewed. The impact factor is a mean; therefore, using it to describe the articles in a journal is incorrect.

[Figure: citation distributions of several journals, 2015 data. Source: Lariviere et al. 2016.]
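
To see how far apart the mean and the median can be for a skewed distribution, here is a minimal sketch with entirely made-up citation counts (illustrative only, not data from any real journal):

```python
# Minimal sketch with invented citation counts: a couple of highly cited
# papers pull the mean far above what a typical article receives.
import statistics

# Hypothetical citation counts for 20 articles published by one journal.
citations = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 6, 8, 12, 40, 95]

print(statistics.mean(citations))    # 9.5 -- what an impact-factor-style average reports
print(statistics.median(citations))  # 2.5 -- what a typical article actually receives
```

An impact-factor-style mean of 9.5 says very little about the typical article here, which sits around 2–3 citations.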

Impact factor depends on the field

This is probably the most well-known problem. If we don’t happen to be in the field of medicine or nanomaterials, then we usually feel that we have to say, “but 3 is actually a pretty good impact factor in our field”. The impact factor also depends on the type of article: for instance, that same data from the Journal of Materials Science indicates that review articles receive more citations. The impact factor does not account for these differences, so it is difficult to see how it could be used broadly for comparison and evaluation.

Impact factor is manipulated

Because the impact factor is a concrete number, it gives the impression of something measured objectively. Maybe that’s why it surprises many that the impact factor can be manipulated relatively easily, and that this is indeed done.

As a reminder, let’s look briefly at the impact factor formula:

IF (for year Y) = (citations received in year Y by items published in years Y-1 and Y-2) / (number of citable items published in years Y-1 and Y-2)

In other words, the impact factor is calculated by taking all the citations in a given year to articles published in the preceding two years and dividing that sum by the number of citable articles published in those two years. Citable is the key word. For the numerator, all citations to the journal are counted, regardless of whether the cited item is “citable” or not. In the denominator, however, only the items that Clarivate Analytics considers “citable” are counted (Clarivate Analytics calculates and publishes the impact factor).
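
To make that asymmetry concrete, here is a minimal sketch with made-up numbers; the point is only that the numerator is fixed by the citations, while the denominator moves whenever items are reclassified as “non-citable”:

```python
# Toy impact factor calculation (all numbers invented). Reclassifying
# editorials, news items or meeting summaries as "non-citable" shrinks the
# denominator and inflates the result without a single additional citation.

def impact_factor(citations_this_year: int, citable_items: int) -> float:
    """Citations received this year to items from the two preceding years,
    divided by the number of 'citable' items published in those two years."""
    return citations_this_year / citable_items

citations = 1000              # hypothetical citations to the journal this year
all_items = 500               # everything the journal actually published
after_reclassification = 300  # after arguing that 200 items are not "citable"

print(impact_factor(citations, all_items))               # 2.0
print(impact_factor(citations, after_reclassification))  # ~3.3
```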

One of the main ways to artificially increase a journal’s impact factor is to convince Clarivate Analytics that the journal doesn’t actually contain that many citable articles. The journal PLOS Medicine described how the process worked when it received its first impact factor in 2005. In its conversations with Clarivate (Thomson Scientific at the time), impact factors ranging from 3 to 11 were discussed, depending on just how many articles were left out of the denominator (in the end, it seems, they were given 3.8). One infamous example is FASEB Journal: in 1988 it was decided that the meeting summaries in the journal were no longer citable, and the next year its impact factor rose from 0.24 to 18!

Some publishers actively try to decrease the number of citable articles in their journals. For example, in 2002 Cell Press (a part of Elsevier) acquired Current Biology. Articles published in 2001 counted toward both the 2002 and the 2003 impact factors. In the 2002 calculation (before the acquisition), Clarivate’s data listed 528 citable articles; a year later, the same publication year suddenly contained only 300 citable articles. The year 2001 was long over, so both calculations were looking at exactly the same set of articles. It is simply that easy to play with the denominator of the impact factor. Current Biology’s impact factor had been 7, but after the acquisition (and apparently after some negotiations between Elsevier and Clarivate) it jumped to 11.9.

And in general, since neither the Clarivate nor the Scopus database is openly available, it is difficult to know who is playing with the numbers and how equally (or unequally) different journals are treated. Rockefeller University Press tried to investigate this. They even purchased data from Clarivate because they suspected that their journals were not being treated equally. Yet even with Clarivate’s data they were not able to reproduce the impact factor that Clarivate had assigned. When they raised this with Clarivate, Clarivate apologized and explained that several different databases are used in the calculations; they had accidentally sent the wrong one and would send a new one. Rockefeller University Press calculated again, but the numbers still didn’t match.

It is clear that the impact factor is not a reliable metric, and many other journal-level metrics have therefore been proposed that attempt to correct its flaws. But this raises the key question: is it even possible to use a journal’s reputation to evaluate individual articles or scientists? In other words, can top articles be reliably distinguished from others based on the journal of publication?

The simple answer is no. Rousseeuw investigated what the likely proportion of top articles is in a top journal. He left out all the problems that interfere with journal-level metrics in the real world (such as those mentioned above). There was only one factor in his model: how well the editors and reviewers are able to distinguish top articles from other articles. If in this ideal situation top articles cannot be determined based on the journal of publication, then in the real world there is no hope of doing so.

Suppose a journal wants to select only articles that are in the top 20%. Let’s assume that 70% of the time editors and reviewers are able to correctly identify those top articles (in real life it is unlikely that anyone is so skilled). Rousseeuw showed that only 37% of the articles in this journal would be top articles!

[Figure based on Rousseeuw’s article.]
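
That 37% follows from a simple conditional-probability argument. Here is a quick check of the arithmetic, assuming (my reading of the setup above) that the 70% accuracy applies in both directions: a top article is accepted with probability 0.7, and a lesser article still slips through with probability 0.3:

```python
# Quick check of the 37% figure via Bayes' rule, assuming reviewers are
# right 70% of the time in both directions.
p_top = 0.20           # share of submissions that are genuinely top-20% articles
p_accept_top = 0.70    # a top article is recognized and accepted
p_accept_other = 0.30  # a lesser article is mistakenly accepted

p_accepted = p_accept_top * p_top + p_accept_other * (1 - p_top)
p_top_given_accepted = p_accept_top * p_top / p_accepted

print(round(p_top_given_accepted, 2))  # 0.37 -- most accepted articles are not top articles
```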

Starbuck’s model showed similar behavior, and real data have confirmed it. The exact percentage varies depending on how a top article is defined, but in general less than half of the articles in top journals are top articles. Perhaps, on average, articles in top journals are still somewhat better, but if you evaluate an article or a scientist based on the journal, you are more likely to give a good evaluation to what is actually mediocre or bad work. And data also suggest that so-called top journals attract more flawed articles.

Certainly the impact factor can lead a commission to undervalue our scientific work, but we usually don’t notice that it robs us at the other end as well. Because of the impact factor, millions are lost that could otherwise be used to fund scientific research.

Because many journals nowadays offer open access publishing for a fee, it is possible to compare what it costs to publish in different journals. Van Noorden gave a good overview of the data and showed that the price for more prestigious journals is generally higher; for some journals it exceeds 5000 euros. The price also depends on the type of publisher. A nonprofit publisher may not charge that much, even if it runs a top journal. But the big publishing companies make about a 40% profit, and they usually push prices as high as they can. If we evaluate based on the journal, then demand to publish in precisely those top journals rises, and publishers are able to charge more.

And research institutions pay that price, regardless of whether they publish in open access or subscription journals. Research libraries pay billions to ensure that scientists have access to the literature. At the same time, there are publishers able to publish at a significantly lower price. For example, the Scielo system in South America has published more than 700,000 scientific articles, and publishing in its journals costs on average only 150 euros. There are journals that publish for free, and their operating costs are probably quite small. The mathematics journal Discrete Analysis is a good example: articles are published for free and are open access. Cambridge University and the Stanhill Foundation support it monetarily, but the operating cost of the journal is small — about 10–20 euros per article. Why don’t we publish more often in these kinds of journals? Usually the excuse is that their impact factor is too low. If we evaluate based on the journal, then we also have to pay more for publishing — billions more.

That’s why many have emphasized that the impact factor and other journal-level metrics should not be used for evaluating articles or scientists. For example, in its 2015 report a research group from the Higher Education Funding Council for England came to the conclusion that “Journal-level metrics, such as the JIF, should not be used.” In 2012 DORA (the “Declaration on Research Assessment”) was presented, and its main recommendation is that scientific work should be evaluated directly and not based on the journal. Currently more than 2000 organizations and 16000 people have signed the declaration to show their support for that idea.

So if you have seen these problems with journal-level metrics, then you are not alone. I suspect that many of us have realized that evaluations based on journal-level metrics are pointless. And yet, maybe we still play the impact factor game because we think it is important to others. At least, in my conversations with other scientists I have never heard anyone defend the impact factor. However, I have heard the sentence, “Yeah, the impact factor is flawed, but we have to look at it because others evaluate us based on it.” Maybe we all know that the emperor is naked, but we aren’t brave enough to say so.

What can we do? There are many options, and many of them are small actions; choose for yourself. For example, when you are choosing a journal to publish in, don’t look at the impact factor. If we have the opportunity to evaluate fellow scientists, we can look directly at the impact of their scientific work and not be swayed by the name of a journal. If a colleague compares scientists based on the impact factor, we can suggest another metric or method of evaluation. RAND Europe and the AAMC have proposed 100 different ways to evaluate scientific work (and not one of them is based on the journal of publication). And we can also voice our opinion by adding our signatures to DORA.

If enough of us show that we don’t evaluate based on the journal, then our research culture will change for the better. So hopefully, the next time you are evaluated, it will be based on your scientific work and not on the journal you published it in.

