Humanitarian practitioners shouldn’t aim to copy evidence-based medicine

Most people who work on improving the use of evidence in any given field want to be more like the medical field, whether they want to admit it or not. Overall, evidence-based medicine is a huge and laudable success, and is one of the longest-running experiments in evidence-based practice.

Because it’s such a clear example of success, it makes sense to try and take as many insights from evidence-based medicine as possible and adapt them to humanitarian contexts. A common starting point, for example, is to focus on a set of goals for improving the use of evidence, such as the following:

  1. Improve the rigor of information used for decision-making
  2. Make this information as accessible as possible
  3. Leave some room for context and practitioner knowledge
Spoiler alert: It’s very hard

Unfortunately, there are a few features of humanitarian work that make it difficult, and even misleading, to copy the lessons of evidence-based medicine. This primarily has to do with the specific setting in which evidence-based medicine was developed: one where evidence is prolific and context is perceived as relatively unimportant. This allowed evidence-based medicine to rely on a handful of simplifications that cannot be as easily made in humanitarian contexts.

We lack evidence in humanitarian contexts

On its own, this is a solvable issue, but it’s worth mentioning anyway since it has larger implications. In evidence-based medicine, practitioners facing an issue can more often than not find relevant studies: the Cochrane Library alone has records of over 30,000 high-quality reviews and over 700,000 clinical trials. Compare this to stable development contexts, where the most comprehensive database of high-quality impact evaluations contains barely over 4,500 records, and the estimated number of high-quality reviews is in the hundreds. Focus on interventions in humanitarian contexts, and the number gets much, much smaller: right now, there are only about 120 completed high-quality impact evaluations focused on conflict-affected or refugee settings, according to the most comprehensive search for this type of research (3ie’s IER database). The tools of medical research might not apply readily to a smaller, and in many ways different, evidence base.

As an aside (a.k.a. a shameless plug), it’s worth noting that the International Rescue Committee is one of the largest single contributors to the impact evidence base for conflict and refugee settings: out of the 120 studies mentioned above, the IRC has contributed to or directly conducted 20, and we’re already in the process of running 17 more as of August 2017.

Definitions of rigor from medical research were built around a large and consistent body of research

Because this large body of evidence is often of variable quality, many of the tools of evidence-based medicine focus on ‘cutting through the noise’ of the evidence base, primarily by helping practitioners determine and improve the quality of research. In particular, randomized controlled trials (RCTs) and systematic review methodologies have been lauded as ‘gold standards’ for separating ‘fact’ from ‘opinion’.

It is very clear that humanitarian assistance also needs methods and standards to help us differentiate ‘fact’ from ‘opinion’ when it comes to the question of which approach to a problem works best. As such, over the past few decades the development and humanitarian fields have both attempted to apply ideas from the medical field, leading to a (relatively controversial) proliferation of impact evaluations and systematic reviews of interventions in recent years.

As the evidence base for humanitarian and development assistance grew, an underlying problem became apparent. Methodologies directly adapted from the medical field (e.g. RCTs) evolved to estimate simple causal chains as effectively as possible. Causal chains in development and humanitarian work, however, are often complex and fairly idiosyncratic — treating these complex causal chains as if they are simple and linear may cause us to miss important insights. In particular, the importance of context in determining the effectiveness of programs means that the tools of the ‘short causal chain approach’ fall short.

Context needs to play a bigger role in research

Unsurprisingly, context matters a lot for the effectiveness of interventions — not only do mediating factors change the effectiveness of interventions, but certain situations can completely break the causal chains that interventions rely on. Methodologies borrowed from medicine are not, by default, well suited for dealing with contextual differences.

An underlying assumption of much of medical research is that, in the ether somewhere, there is a ‘true’ average effect size for a given intervention, and that it’s the job of research to estimate that true effect size. This is obviously a simplifying assumption with some notable issues even in medical research, but it very clearly breaks down in development and humanitarian settings. These techniques can, with enough resources, be adapted to mitigate this problem (e.g. ways of including moderators in meta-analysis, sub-group analysis in impact evaluation, etc.). With that in mind, we need to ask ourselves: are we adapting methodologies in a way that adequately addresses problems of context, and are there more efficient ways to approach this problem?
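To make the distinction concrete, here is a minimal sketch, using purely hypothetical numbers, of how pooling every study into a single ‘average effect’ can wash out exactly the contextual differences practitioners care about, compared with a simple sub-group (moderator) analysis. The study results and the ‘camp’ vs. ‘urban’ context tags below are illustrative assumptions, not real data.

```python
# Minimal sketch with hypothetical effect sizes: a single pooled 'average
# effect' vs. a sub-group (moderator) analysis by context.
import numpy as np

# Illustrative (made-up) study results: standardized effect sizes, standard
# errors, and a context tag for each study (camp vs. urban setting).
effects = np.array([0.40, 0.35, 0.45, -0.05, 0.00])
ses = np.array([0.10, 0.12, 0.11, 0.10, 0.09])
context = np.array(["camp", "camp", "camp", "urban", "urban"])

weights = 1.0 / ses**2  # inverse-variance weights (fixed-effect pooling)

# One 'true average' across all contexts hides the split below.
pooled = np.sum(weights * effects) / np.sum(weights)
print(f"Pooled effect, all contexts: {pooled:.2f}")

# Sub-group analysis: pool separately within each context.
for c in np.unique(context):
    m = context == c
    sub = np.sum(weights[m] * effects[m]) / np.sum(weights[m])
    print(f"Pooled effect, {c} settings: {sub:.2f}")
```

Even in this toy example, the pooled number answers “what is the average impact across contexts” while obscuring the fact that the intervention appears to work in one setting and not the other, which is the question practitioners actually need answered.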

“We need to be evaluating ideas, not programs”

This call to action from Chris Blattman perfectly encapsulates one approach to moving beyond the confining ideas of medical research. We need to learn to develop and adapt methodologies that better answer the questions that are important to practitioners: things like “what works for who, when, and why”, rather than “what is the average impact across contexts”.

Unfortunately, the tools that have been built to assess rigor (including those that conflate ‘quality of research’ with ‘experimental or quasi-experimental design’) have been adapted primarily from the medical field, and as such are built around the ability to estimate short causal chains. We lack tools that help us quickly assess the quality of research that attempts to ‘evaluate ideas’. At an even more fundamental level, we lack simple tools that help us understand how research applies across contexts.

No simple tools exist to help individual practitioners check if research applies to their context

Let’s say that you have some low-rigor research from the right context, and a bunch of high-rigor research from the wrong context. How do you work in that situation? This is where we have to rely on ‘critical appraisal’, which is the art of determining how much you should trust a given research finding to apply to your context. There are many tools to help people do this, but most of them are absurdly complex — as they should be — because applying research across contexts is absurdly complex.

At this point, a lot of budget-constrained people working on this problem will try to take shortcuts, like building a universal ‘evidence quality scale’ based on the methodology of the research, or having a central body critically appraise research in bulk and assign a ‘quality score’. However, efforts to centralize critical appraisal in development and humanitarian contexts often turn out to be less helpful than assumed.

Existing ‘quality scores’ don’t answer practitioners’ most important questions

Even if we leave aside the previously mentioned problem that no simple tools exist for complex questions, we also need to face the possibility that even for simple questions, like ‘what is the average impact of X on Y’, the tools adapted from evidence-based medicine may be misleading. For example, many groups in development and humanitarian practice have focused on building evidence repositories and tools that critically assess the quality, effectiveness, and sometimes cost-effectiveness of different programs. These often differ from the “knowledge portals” of the past in that they attempt to directly help the user understand how much weight to give the findings themselves.

In particular, many of the above use some method for assigning a ‘quality score’ to a finding, often based on scales developed from medical research. Assigning a quality score can simplify research findings, but it sometimes simplifies them in a way that is meaningless for practitioners, because what would count as ‘good evidence’ in one context can be completely misleading in another. This is why the IRC is fairly reluctant to do things like assign stoplight colors to research results.

How do we move forward under these constraints?

There are no simple answers, unfortunately. At the IRC, we’ve increasingly focused on finding new frameworks for thinking about evidence use, and have actively tried to identify and avoid common mistakes. Above all, we’ve found that working directly with practitioners to co-create evidence syntheses and evidence products, rather than attempting to spread evidence from an ivory tower, has been the single most important factor in successful evidence use.

I mention this because approaches developed from evidence-based medicine lend themselves to ‘ivory tower’ thinking, in which ‘experts’ develop evidence and try to get as many people as possible to use it. At the end of the day, this approach is doomed to fail. Only by embedding evidence synthesizers directly in decision-making and implementation processes, and by blurring the lines between ‘producers’ and ‘users’ of evidence, can we improve the meaningful use of evidence in humanitarian practice. For that to happen, we need to abandon some of our preconceived (i.e. borrowed from medicine) notions of what ‘evidence-based practice’ looks like.