Understanding the current state of retractions and what you can do as an author to avoid citing retracted papers

What a massive database of retracted papers reveals about science publishing’s ‘death penalty’

Back in 2018, a Science magazine article using the Retraction Watch Database found that Singapore had the third-highest retraction rate in the world!

This made me sit up and take notice. I started reading some of the abundant literature on retractions, and alarmingly, it seems that a fairly large number of authors have been citing papers that were retracted long ago, presumably unaware that they are doing so.

For example, this paper suggests “retraction of articles has no impact on citations in the long term, since the retracted articles continue to be cited, thus circumventing their retraction.”

Are Discovery vendors at fault?

Some have speculated that part of the reason is that, until recently, retractions were either not always indicated in Discovery platforms like Scopus and Web of Science or were inconsistently displayed or marked, making them easier to miss.

Scopus in particular did very poorly in a 2018 study, Retracted Publications in Mental Health Literature: Discovery across Bibliographic Platforms.

The study examined 145 retracted papers and found 140 of them indexed in Scopus. Of these 140, an astonishingly low 6 (4.5%) were correctly marked as retracted! Web of Science did better, though not perfectly, with 71% marked as retracted.

Retracted Publications in Mental Health Literature: Discovery across Bibliographic Platforms

To be fair, the sample of retracted papers was drawn from the Retraction Watch Database, which at the time neither Scopus nor Web of Science had access to for checking (chances are they used the less complete Crossref data; more on this later).

Still, I guess Discovery vendors can claim that even if they do not mark retracted papers properly or consistently, when researchers click through and land on the journal sites, the journals should display the retraction notices clearly, so it is not fully on them either. (But do we suspect some authors cite without reading? That’s an even worse problem.)

That said, they have generally improved their accuracy and display standards since then. For example, since 2019 Web of Science has had a “Retracted Publication” type, on top of editing the title to include the words RETRACTED article.

Web of Science changes the title to include the words RETRACTED to draw your attention + Pub type for filtering

How often are papers citing retracted works?

And yet various studies, such as this one, show that this might not be enough.

In The Schön case: Analyzing in-text citations to papers before and after retraction, we can see that in raw numbers there are still quite a lot of citations to the papers after they were retracted.

The Schön case: Analyzing in-text citations to papers before and after retraction

Of course, someone might cite a retracted paper knowingly and note that it is retracted, so the study also looked at the in-text citations and manually classified each cite as “Neutral”, “Positive”, “Negative”, or “Retracted” if it noted the paper was retracted.

It is quite alarming that, of the cites to the retracted papers in the post-retraction period, the most common type (in all sections) was a “Neutral” cite (130, or 90.9%), and only 6 (4.2%) specifically noted that the paper was retracted. (Note: a sensitivity test changing the post-retraction period to 2013 or 2015 found little difference in the results.)

At the extreme, this is true even for cites to what I suspect is one of science’s most infamous pieces of retracted research, “Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children” by Wakefield et al. (1998) in The Lancet.

Not familiar with it? It is the paper that claimed a link between autism and the MMR vaccine.

For context, this paper was first published in 1998, partly retracted in 2004 (“the retraction of an interpretation”), and then fully retracted by the journal in 2010.

Yet this paper, which studied citations to Wakefield (1998) up to 2019, found that between March 2004 (the date of the partial retraction) and 2019 (the end date of the study), the paper had 881 citations, of which only 493 (56.0%) documented either retraction.

The results look better if you account for publication delays (papers already in the publication pipeline when the partial retraction occurred) and count only papers published after the full retraction, but even between 2011 and 2019, only 360 of 502 (71.7%) mentioned the retractions. In other words, 28.3% did not.

That said, to be fair, most papers that cited Wakefield (1998) were negative citations, though you still see some “perfunctory” and even “affirmation” cites after 2010.

Figure 1. Characteristics of References to the Article by Wakefield et al by Year of Publication

There are many more studies on retractions, but it is fair to say that quite a lot of authors are unknowingly citing retracted papers. Why?

Why finding out what is retracted isn’t (wasn’t?) easy

It was also in 2018 that I first heard about the Retraction Watch Database.

Retraction Watch Database
JROST Flashtalk: Retraction Watch Database (2018)

While I had always been a fan of the Retraction Watch blog, at the time I only vaguely knew that Crossref’s Crossmark program existed.

Short introduction to Crossref’s Crossmark service.

It’s basically a button/badge that publishers display on their site:

“When a reader clicks on the button, a pop-up box appears that shows the current status of the content (up-to-date, updates available, or retracted), a persistent link to the publisher-maintained copy, and any additional information.” — Crossref Crossmark

Crossmark button on retracted paper
Clicking on the Crossmark button

Systems like Scopus and Web of Science could also query the Crossref API for free to get information on articles, including information on retractions.
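To make this a little more concrete, here is a minimal sketch of how such a lookup might work. It assumes, as I understand the Crossref REST API, that a retraction notice deposited through Crossmark carries an "update-to" list whose entries have "DOI" and "type" keys. The function names and the sample record (with its made-up DOIs) are my own illustration and not how any particular vendor does it.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_crossref_record(doi: str) -> dict:
    """Fetch one work record from the public Crossref REST API (needs network)."""
    url = CROSSREF_WORKS + urllib.parse.quote(doi)
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)["message"]


def retractions_declared_by(record: dict) -> list[str]:
    """Return the DOIs that this record says it retracts.

    Assumption: update notices expose an "update-to" list whose entries
    have "DOI" and "type" keys.
    """
    targets = []
    for update in record.get("update-to", []):
        if update.get("type", "").lower() == "retraction":
            targets.append(update.get("DOI", "").lower())
    return targets


# Hypothetical record shaped like a retraction notice (DOIs are made up).
sample_notice = {
    "DOI": "10.5555/notice.1",
    "update-to": [
        {"DOI": "10.5555/Original.1", "type": "retraction", "label": "Retraction"}
    ],
}
print(retractions_declared_by(sample_notice))
```

Note that this only finds retractions the publisher has actually deposited, which is exactly the limitation discussed next.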

So with Crossref’s Crossmark service, why does the Retraction Watch Database even exist?

Leaving aside the issue that not all journal publishers were minting DOIs via Crossref, not all Crossref members were even Crossmark subscribers, since it was a separate, optional, paid-for service at the time (it only became free in April 2020).

Also, of course, even if a publisher participated in Crossmark, it did not mean they would properly mark retractions in Crossref. There may be various reasons why there is no incentive to do so. When I asked in 2018, Retraction Watch said they had a lot of retractions that were missing from Crossref.

The other nice thing about the Retraction Watch Database is its taxonomy of reasons for retraction. Not all retractions are the same.

Taxonomy of reasons for retraction in Retraction Watch Database

Things are improving

It seems to me that around 2018, the problems of retractions began to be taken more seriously (perhaps thanks to the Science piece), as more and more people started to talk about the issues around them.

In particular, earlier this year in April 2020, Crossref announced they would stop charging Crossref members for using Crossmark!

So now that the direct financial cost of using the service is no longer a factor, the ball is in the publishers’ court to use the service properly (which would still cost money in terms of labour).

But as Jodi Schneider suggests, publishers are already doing a lot of manual labour editing and validating citation style, so why not check the submitted manuscript for citations of retracted work? Perhaps even using an automated or semi-automated system?

And I could be wrong, but today (2020) many manuscript submission systems are indeed starting to run automatic checks for various things, such as conflicts of interest, finding suitable reviewers and perhaps cites of retracted work. For example, scite integrates with Manuscript Manager to provide context for the references in manuscripts, including retractions. More manuscript systems such as ScholarOne (very popular) will start to do such checking as well.

But how do these systems work? Obviously you will need a database of retractions to check against. What sources do we have available?
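I do not know how any of these vendors implement this internally, but at its simplest the idea might look like the sketch below: normalize the DOIs found in a manuscript's reference list and look them up in a set of known retracted DOIs. The function names and the example DOIs are hypothetical, and a real system would also need to handle references that have no DOI at all.

```python
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Lowercase a DOI and strip common URL or 'doi:' prefixes."""
    doi = raw.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.strip()


def flag_retracted(reference_dois, retracted_dois):
    """Return the references (as originally written) that match a retracted DOI."""
    known = {normalize_doi(d) for d in retracted_dois}
    return [ref for ref in reference_dois if normalize_doi(ref) in known]


# Made-up DOIs, purely for illustration.
references = ["https://doi.org/10.5555/Example.A", "10.5555/example.c"]
retracted = {"doi:10.5555/example.a", "10.5555/example.b"}
print(flag_retracted(references, retracted))
```

The quality of the result depends entirely on how complete the retraction set is, which is why the choice of source matters so much.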

Sources for checking retractions

Somewhat less relevant to this article is, of course, the ability of the system either to accept references in a structured format or to parse and extract citations from submitted manuscripts.

Crossref is the obvious free source to check, but we have already talked about its weaknesses.

Another source is PubMed, and I also came across Open Retractions, which combines the two sources.

The other obvious source, already mentioned, is the Retraction Watch Database, and it seems they make

“the entire Retraction Watch Database (RWDB) available as a CSV file to scholars, journalists and others who plan to publish their findings, publishing the entire dataset is prohibited, as is scraping the site”

though commercial entities will have to pay to access it.
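If you have obtained the CSV under those terms, building a lookup set from it might look like the sketch below. I am assuming the file has a column holding the DOI of the original (retracted) paper, and the column name "OriginalPaperDOI" is my assumption, so check it against the header of the file you actually receive. The sample rows are made up.

```python
import csv
import io


def load_retracted_dois(csv_file, doi_column="OriginalPaperDOI"):
    """Collect lowercased DOIs from one column of a retractions CSV.

    doi_column is an assumed column name; values that do not look like
    a DOI (for example blanks or placeholders) are skipped.
    """
    dois = set()
    for row in csv.DictReader(csv_file):
        value = (row.get(doi_column) or "").strip().lower()
        if value.startswith("10."):
            dois.add(value)
    return dois


# Made-up sample standing in for the real file.
sample = io.StringIO(
    "Title,OriginalPaperDOI,Reason\n"
    "Example paper A,10.5555/Example.A,Hypothetical reason\n"
    "Example paper B,unavailable,Hypothetical reason\n"
)
retracted_dois = load_retracted_dois(sample)
print(retracted_dois)
```

With a real file you would pass an open file handle instead, for example open(path, newline="", encoding="utf-8").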

Another possible source is Wikidata, which added the “Is retracted by” property in 2018.

But which systems currently use these sources to check for retractions?

Flagging Retractions — how to improve

One of the earlier mentioned papers made the following policy recommendations.

Firstly, it notes that bibliographic databases have different policies on indexing retractions and not all methods are equal.

For example, while Web of Science alters the titles of papers to include the word “retracted”, PubMed does not. As such, when researchers import these papers into their reference manager from PubMed, they may miss that the paper is retracted (even though the metadata indicates that it is).

Secondly, publishers have different policies on how they show that articles are retracted. They recommend
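Since the metadata does carry the signal, a reference manager or import script could surface it in the title itself. The sketch below assumes an imported record is a dictionary with "title" and "publication_types" keys (both key names are hypothetical) and relies on PubMed tagging such records with the publication type “Retracted Publication”.

```python
RETRACTED_TYPE = "Retracted Publication"


def display_title(record):
    """Prefix the title with 'RETRACTED: ' when the metadata flags a retraction.

    The "title" and "publication_types" keys are assumed names for this sketch.
    """
    title = record.get("title", "")
    types = record.get("publication_types", [])
    if RETRACTED_TYPE in types and not title.upper().startswith("RETRACTED"):
        return "RETRACTED: " + title
    return title


# Made-up record for illustration.
imported = {
    "title": "An example study",
    "publication_types": ["Journal Article", "Retracted Publication"],
}
print(display_title(imported))
```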

that publishers add a label, such as “retracted,” to the title of articles that have been issued notices of retraction.

Thirdly, they recommend that citation styles provide a standardised way to cite retractions. For example, APA 6th (which was the edition available at the time of the paper in 2019) did not have a standard way of citing retractions, something that was fixed in the APA 7th Edition.

Lastly, it stresses the importance of a bibliographic (reference) manager that can auto-check citations for retractions. I believe this is in fact the most important recommendation, at least for researchers. Fortunately, this is indeed happening…


As journals are increasingly screening citations for retractions, it might be a good idea as a researcher to do your own screening before you submit your manuscript, to avoid embarrassment. It is one thing to cite a retracted work on purpose, another to be unaware of what you are doing.

And there are indeed free systems you can use to flag these citations, which I will cover in depth in the next blog post. But if you are curious, these are some of the systems you can consider:

Pre-submission manuscript checks + Reference managers

  1. scite’s reference checker (uses PubMed, Crossref, Retraction Watch Database)
  2. Scholarcy preprint healthcheck API
  3. Zotero built-in retracted items notification check (uses Retraction Watch Database)
  4. Zotero + scite plugin (for “smart citations”)
  5. Zotero + PubPeer (for flagging papers with a lot of comments, which can be an interesting signal)

Edit April 2021: More systems are starting to support retraction checking

  1. Manuscript systems like ScholarOne, Karger Publishing
  2. Browser extensions: Lean Library partners with scite, Third Iron LibKey with Retraction Watch Database




Thoughts on open access, focusing on discovery, delivery from the academic librarian’s point of view.

Aaron Tay

A Librarian from Singapore Management University. Into social media, bibliometrics, library technology and above all libraries.

