What science reporters should know about meta-analyses before covering them
As science journalists who take our job seriously, we’ve learned a couple of rules by heart: never present correlation as causation, always check whether a sample is representative, and never rely on a single study. As the expression goes: one swallow doesn’t make a summer.
These are all good starting points. But they are far from making us unimpeachable in our reporting.
As a result of the third principle, we tend to rely on review studies. More specifically: systematic reviews and meta-analyses. A systematic review is, simply put, a review of the scientific literature on a particular research question, performed in a systematic way to reduce bias. Sometimes a reviewer will include only randomized clinical trials, sometimes other studies as well. A meta-analysis is a statistical method for combining the results of several studies into a single result and conclusion. It is often the final piece of a systematic review.
But there is something strange going on here. While we at least try to scrutinize the methods and limitations of all those single studies, we rarely do the same with systematic reviews and meta-analyses. Since they are regarded as the gold standard of empirical science, on top of the ‘pyramid of evidence’, we take the results and conclusions for granted and regard them as objective debate enders.
Unfortunately, anyone who takes a deep dive into the many scientific debates where meta-analyses are involved, or simply into how meta-analyses are conducted, will find that they are nothing of the kind.
The issue of ‘metawars’
That’s what I did for an investigative piece I published last year in Science Magazine. My investigation started after I stumbled upon several examples of what we later called ‘metawars’: scientific battles in which conflicting meta-analyses are published that only stir up the debate. Examples are the debates about the effect of “positive parenting”, the relation between antidepressants and suicide, and the health benefits of organic produce.
A well-known example of a metawar has become known as the ‘worm wars’, in which experts from different camps battled continuously over the question of whether mass deworming of children in parasite-endemic countries is actually cost-effective and worth the effort.
The example that got a central position in my story was the controversy over whether playing violent video games or watching violent movies makes teenagers more aggressive. In these instances, it’s very attractive to focus on the conflict (it makes great stories!) and zoom in on what’s at stake for the respective authors and what their potential conflicts of interest and biases are.
But preferably, science journalists should have the ability to combine this with something else: to critically read the meta-analyses and judge for ourselves. And this ability is, for many of us, completely lacking.
This skill is not only important when conflicting reviews appear. The number of systematic reviews has increased rapidly over the last couple of years: in 1990, only a couple of hundred were published; by 2015 this had risen to over 15,000, and by 2018 to over 23,000. So we’d better be able to separate the wheat from the chaff.
This is why I organized a session (full audio here) at the World Conference of Science Journalists, that took place earlier this month in Lausanne, Switzerland. The eventual goal of the session was to give science journalists tools to (more easily) assess and interpret these reviews in order to do our jobs as critical reporters.
It all starts with a question
I invited three speakers. First of all, the recently appointed editor-in-chief of the Cochrane Library, Karla Soares-Weiser, emphasized the importance of the actual question for the outcome of a systematic review. This is tip 1 for anyone trying to make sense of a systematic review: check what exact question the authors are trying to address. Soares-Weiser mentioned reviews of the first rotavirus vaccine, Rotashield, where there were concerns about a rare but severe adverse event, intussusception (a folding of the intestine), that would be too rare to be picked up by randomized clinical trials. The Cochrane reviewers rephrased the question to include larger, observational studies that could pick up that signal. Rotashield has since been taken off the market, and the updated Cochrane review recommends continued surveillance of all licensed vaccines for severe adverse events.
After the research question has been formulated, the review authors define the search criteria they will be applying to find and select the studies to be incorporated. This is followed by a quality assessment of the identified studies and extraction of the data from these studies that's relevant to the research question. Next comes the step in which the reviewers determine whether the included studies seem to be very similar or rather different, which could impact the result and its confidence interval. The reviewers decide whether a meta-analysis is possible, if so, conduct it and interpret the resulting outcome.
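The pooling step can be made concrete with a small sketch. Below is a minimal fixed-effect, inverse-variance meta-analysis in Python, with made-up numbers, purely illustrative and not the method of any particular review:

```python
import math

def fixed_effect_pool(effects, ses):
    """Fixed-effect (inverse-variance) pooling of study results.

    effects: per-study effect estimates (e.g. mean differences)
    ses:     their standard errors
    Returns the pooled effect and its 95% confidence interval.
    """
    weights = [1 / se ** 2 for se in ses]  # precise studies get more weight
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)

# Three hypothetical trials: the large, precise trial dominates the result
effects = [0.30, 0.10, 0.25]
ses = [0.20, 0.05, 0.15]
pooled, (lo, hi) = fixed_effect_pool(effects, ses)
print(f"pooled effect = {pooled:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Note how the study with the smallest standard error pulls the pooled estimate toward its own result: this is exactly how a meta-analysis can end up leaning heavily on one or two big trials.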
The quality of systematic reviews varies widely and in the great majority of cases is very poor, said the second speaker, Jos Kleijnen, professor of systematic reviews in healthcare at Maastricht University in the Netherlands and director of Kleijnen Systematic Reviews. A random sample of reviews his company analyzed showed that of those published in 2016, 80% had a high risk of bias and just 13% had a low risk of bias. Even among Cochrane reviews, only 87% had a low risk of bias, and 8% had a high risk of bias.
Several factors together determine the quality and utility of a systematic review. First of all, it matters what the reviewers put into their analysis. When a systematic review is based on poor quality studies, its conclusions are very limited.
On the other hand, a systematic review based on very rigorous studies can still be sloppy or biased. As in designing experiments or trials, the investigator needs to make several decisions and judgment calls. They can include or exclude certain study types, limit the time period, include only English-language publications or peer-reviewed papers, and apply strict or loose study quality criteria, for instance. All these steps have a certain degree of subjectivity, Kleijnen told us. ‘Anyone who wants to manipulate has endless possibilities.’
Some checklists to help you
Kleijnen showed several ways to check the quality of a systematic review. In addition to carefully reading the methodology, it’s smart to use a checklist. The two prominent ones are AMSTAR 2 (an updated version of AMSTAR 1) and ROBIS (which stands for Risk of Bias In Systematic reviews).
There is some overlap between the two. AMSTAR gives an estimate of overall quality and only works for reviews of healthcare interventions. ROBIS focuses specifically on the risk of bias introduced by the conduct of the review, and is suitable for any type of systematic review. It would be too technical to go into the details of these checklists, but for instance, ROBIS scores whether a study has clear study eligibility criteria and well-defined and good methods for the identification and selection of studies.
It can be quite time-consuming to perform such a ‘critical appraisal’. At Kleijnen’s company KSR Ltd, a team of experts might take days to assess a single review. Luckily, they have collected all the critical appraisals they have performed in a database: KSR Evidence. It’s not open access, but if you send them a request they might be so kind as to provide the report on the review you’re interested in. Another database that contains some critical appraisals, and is open access, is Epistemonikos.
Here it’s important to remember: just as conducting systematic reviews is a human endeavor, so is critical appraisal — nothing is completely objective and nothing is perfect. And even when a publication checks all boxes for a ‘well conducted’ review, it can be flawed.
This is where the talk of Hilda Bastian, long-time patient advocate, self-taught systematic review expert and Cochrane co-founder, came in. On her popular PLOS blog Absolutely Maybe, Bastian writes clear, witty, cartoon-enriched pieces about (among other things) meta-analyses.
In her talk, she focused on mistakes journalists make when they cover meta-analyses. The most obvious and widespread ones are overestimation of the effect and the weight of the evidence. When we write about a meta-analysis, we usually portray it as the powerhouse of science: we emphasize how rigorous it was, how many studies and patients were involved and as a result how robust the outcome is.
In some cases, this might be true, but often it isn’t. Many reviews are based on a small number of studies, or lean heavily on one or two big trials or on some very weak studies. An important lesson, therefore, is to look at a meta-analysis as critically as we do at a single study: only when it is indeed extremely well conducted and based on very strong evidence should we present it that way. “If only two of the twenty included studies measured an actual outcome, then the impression should not be given that the evidence comes from twenty studies,” said Bastian.
Bastian gave some insight into the graphical ways the results of meta-analyses are presented, with one main take-home message: look closely at the variation among those results and the breadth of the confidence intervals. That will give you a quick impression of the consistency and reliability of the underlying data.
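That between-study variation can also be put into a number. The I² statistic, derived from Cochran's Q, estimates the share of the scatter in a forest plot that reflects real inconsistency rather than chance. The sketch below (illustrative Python with invented numbers, not code from the session) shows the idea:

```python
def heterogeneity(effects, ses):
    """Cochran's Q and the I-squared statistic for a set of study results.

    effects: per-study effect estimates; ses: their standard errors.
    I-squared near 0% means the studies broadly agree; a high value
    means the forest plot will show widely scattered estimates.
    """
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

# Hypothetical data: consistent versus conflicting study results
print(heterogeneity([0.20, 0.21, 0.19], [0.1, 0.1, 0.1]))   # low I-squared
print(heterogeneity([0.50, -0.30, 0.60], [0.1, 0.1, 0.1]))  # high I-squared
```

A high I² doesn't say who is right; it says the studies disagree more than chance alone would explain, which is exactly when a single pooled number deserves extra suspicion.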
What can a rushing reporter do?
After the talks I asked the speakers a question I found very relevant myself: when a reporter has little time and wants to assess a systematic review, what should he or she be looking at? First of all, look at how old the review is: not just the year of publication, but when the authors stopped collecting studies. Second: check whether they report the quality of the included studies; if they don’t, that is a very bad sign. A ‘knockout criterion’, as Bastian called it.
The speakers agreed that, when it is available (in the case of a Cochrane review it always is), you should zoom in on the ‘summary of findings’. The great thing about this section is that it states the strength of the results, which is higher when many large randomized clinical trials have been included and lower when the included evidence is weaker.
What questions should we ask the authors of a review? Kleijnen suggests exploring whether the conclusions the authors draw are congruent with the data and results presented in the review. ‘It is a rather common flaw that the conclusions are more optimistic than the data would actually justify.’
And what can we do when two rather well conducted reviews on the same topic come to different conclusions? Kleijnen says he would look at the research questions of both. ‘If, on the surface, it looks like there are two reviews on the same topic, very often it turns out there are actually two similar reviews on two slightly different topics.’
This seems to be the case in the worm wars, where the team that was involved in the field work framed its question differently than the other team, which mainly consisted of methodologists and statisticians not involved in the field. The solution for you as a reporter would be to decide which of these questions makes the most sense.
Bastian added that one reason multiple reviews may be published shortly after one another is that somebody disagreed with one that was already published. In that case, we should look at the same things we would look at in other cases, Bastian said. ‘Is there some kind of conflict of interest, not just financially but intellectually as well?’
Because of all these ‘manipulation opportunities’, Soares-Weiser emphasizes, the protocol of a systematic review should always be published before the review is conducted. Although she admits that for someone who knows the literature well, it will still be possible to rig the review, even unconsciously, by adjusting the question and criteria to include, exclude or downplay the weight of certain studies. ‘That is why a good declaration of conflicts of interest is so important’, she said.
Speaking of conflicts of interest, one person asked whether it is easy to find out who initiated or funded a systematic review, as in ‘Ocean Spray funding a review on cranberry juice’. This has improved, said Bastian; at the very least, you should check whether the authors discuss in their paper who funded the included studies. ‘That is the work they should have done for you.’
What about publication bias?
Another relevant question that an attendee raised concerned a phenomenon called ‘publication bias’: how can a systematic review answer a question when many studies addressing it ended up in a file drawer?
Kleijnen remarked that good meta-analyses always try to include unpublished studies, such as thesis chapters and registered but unpublished trials. Of course one never knows if there are other studies missing. There are statistical methods being applied to correct for publication bias, but unsurprisingly none of these is flawless either.
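One common screen, rather than a correction, is an Egger-style regression for funnel-plot asymmetry: if small, imprecise studies systematically report larger effects than big ones, the regression intercept drifts away from zero. The sketch below is a rough illustration with invented numbers; a real analysis would also test whether the intercept differs significantly from zero:

```python
def egger_intercept(effects, ses):
    """Egger-style asymmetry check: regress z = effect/SE on precision = 1/SE.

    An intercept well away from zero hints at funnel-plot asymmetry,
    one possible signature of publication bias. It is a rough screen,
    not proof that studies are missing.
    """
    z = [e / s for e, s in zip(effects, ses)]
    prec = [1 / s for s in ses]
    n = len(z)
    mean_p, mean_z = sum(prec) / n, sum(z) / n
    slope = (sum((p - mean_p) * (zi - mean_z) for p, zi in zip(prec, z))
             / sum((p - mean_p) ** 2 for p in prec))
    return mean_z - slope * mean_p  # ordinary least-squares intercept

# Hypothetical pattern: the smaller (larger-SE) studies report bigger effects,
# so the intercept lands well above zero
print(egger_intercept([0.10, 0.30, 0.60], [0.05, 0.20, 0.50]))
```

Like any such test, it has low power with only a handful of studies, which is one reason none of these methods is flawless.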
The debate about media violence and aggression is a good example of one distorted by publication bias: a large part of the reported effect, and perhaps all of it, seems to have been caused by the fact that most small or null results were never published.
Approaching the end of the session, one member of the audience (actually, it was the editor of my Science feature, Martin Enserink) stood up and asked: ‘How are we going to tell our audience that even the meta-analysis, supposedly the gold standard of evidence, is not so clear-cut? And won’t this make the public lose trust in science even more?’
‘Well, it’s still better than politics’, Kleijnen remarked. I added that indeed there do seem to be differences in quality, that a good meta-analysis is still better than no meta-analysis and in most cases still better than a single study. ‘But we have to take time to actually look at those papers and their methods, and not just say that it is the gold standard or it is just rubbish. We have to do our jobs I guess.’
Bastian: ‘It really is no different than anything else in science journalism. The one thing that is different is that many people had the feeling they didn’t need to understand the methods and techniques used in systematic reviews and meta-analyses. But if you want to cover them you probably do. And it is not as horrible as it might seem.’
Red flags (for not taking the result of a systematic review for granted)
- Team of authors without statistician
- Team without clinical expert
- Different systematic review published with different outcome
- Financial conflict of interest
- Intellectual conflict of interest
- Protocol wasn’t published beforehand
- Quality of included studies not discussed
- COI in included studies not discussed
- COI of review authors not declared
- Not including all relevant studies, e.g. only English language, or only studies published in peer reviewed journals
Further reading (by Hilda Bastian):
Listen to the audio recording of the crash course @ WCSJ 2019.