Direct observation of causal links: evaluators do it every day

Summary: Cause-effect relationships are as basic to our ordinary perception as are perceptions of colour and shape. All kinds of perception are subject to biases and illusions. This matters to evaluators because establishing the best possible causal theory of how a project works is the central task in evaluating it. Many authors have claimed that making causal judgements on the basis of mere observation is somehow terribly difficult. But newer approaches treat causal judgements as no more or less problematic than other kinds of judgement. Our causal judgements, whether as evaluators or in everyday life, are not perfect. They are subject to persistent biases, illusions and just plain mistakes. Evaluators are familiar with these biases and illusions, and there are a whole host of ways of countering them.

If you can indeed reduce perception to a limited set of basic kinds of perception, then causation is one of them.

Illusions in the perception of causation are not so very different from any other kind of illusion, like the Mueller-Lyer optical illusion (Kahneman, 2011, p. 29).

Direct judgements of causal relationships depend on a context just like any other direct judgement, e.g. of the ambient air temperature.

You might say: the temperature on the thermometer — that just is the ambient air temperature. Ambient air temperature might even be defined like that.

But of course it is only within a context. You have to check that, for example, someone isn’t directing a hairdryer at the thermometer and that you aren’t, say, in outer space. Take an example of direct observation of a causal relationship. (They are all around us.)

The same variable can be different things at the same time. Imagine a high water mark on the bank of a river, and imagine looking at the level of the water. You can read that level as the height of the river right now, but also, looking at the gap between the level and the high water mark, as the amount by which the river is still below its highest level ever.

In a similar way, imagine the police are introducing a neighbourhood watch campaign to reduce the number of break-ins in a neighbourhood[1]. They have put up a big display in a public place which marks the “high water mark” of break-ins per week in the year before the campaign started, and which now also shows a lower mark for the current weekly number of break-ins. Someone points to the gap between the two lines and says: wow, look at the difference the campaign has made already! This is a direct causal perception, and it is fine just as it is, even though we know it is a flawed approximation. It is flawed, for example, because break-ins had actually been increasing in previous years, so the counterfactual (the level break-ins might have reached without the campaign) might even have been higher than the high water mark, which would mean the gap between the two lines actually underestimates the effect of the campaign. A better presentation might use a graph showing the baseline trend rather than just a single line. But every judgement we ever make can be criticised from some perspective or other.
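To make that arithmetic concrete, here is a minimal sketch in Python, with entirely invented numbers, of why a rising baseline matters: extrapolating the pre-campaign trend gives a counterfactual above the high water mark, so the gap on the display understates the effect.

```python
# Illustrative only: all numbers here are invented for the sake of the example.

# Weekly break-ins in the three years before the campaign (an upward trend).
pre_campaign = [14, 17, 20]          # average break-ins per week, year by year
high_water_mark = max(pre_campaign)  # the upper line on the public display: 20

current = 15                         # break-ins per week now, with the campaign running

# Naive reading of the display: the effect is the gap between the two lines.
naive_effect = high_water_mark - current  # 20 - 15 = 5

# Trend-based counterfactual: extend the pre-campaign trend one more year.
yearly_rise = (pre_campaign[-1] - pre_campaign[0]) / (len(pre_campaign) - 1)  # +3 per year
counterfactual = pre_campaign[-1] + yearly_rise  # 20 + 3 = 23

trend_adjusted_effect = counterfactual - current  # 23 - 15 = 8

print(f"Gap on the display:     {naive_effect:.0f} break-ins per week")
print(f"Trend-adjusted effect:  {trend_adjusted_effect:.0f} break-ins per week")
```

The point is not that one number is right and the other wrong, but that the gap on the display is already a causal estimate, and a better causal theory (here, a trend) revises it.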

Why does this matter to evaluators? It matters because establishing the best possible causal theory of how a project works is the central task in evaluating it. Although project evaluation has other tasks too, it is this task which defines it, which makes it evaluation. So we are very interested in how to establish causal links. Many authors have claimed that making causal judgements on the basis of mere observation is somehow terribly difficult, and the mainstream view in Western philosophy has long agreed with David Hume that it may even be impossible. There has for decades been an argument within evaluation about whether randomised controlled trials and related approaches might, or might not, be the only way or categorically the best way to provide the causal information we need.

Kahneman (2011) tells the story of how the Belgian psychologist Michotte countered that we perceive causation as directly as we perceive colour. We humans in fact make causal judgements on the basis of observational data all the time. Our causal judgements are never perfectly accurate, but they are good enough, and often no worse than any of our other judgements. What is more, artificial intelligence procedures are now quite good at making such judgements unaided. Judea Pearl has led the way, both in building the actual AI tools and in pointing out the philosophical importance of their success.
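As a rough illustration of the kind of reasoning Pearl's framework makes routine (this is not Pearl's own software, and the variable names and numbers are invented), the sketch below estimates a causal effect from purely observational rates by adjusting for a single confounder, the textbook "backdoor adjustment":

```python
# Illustrative only: invented numbers, plus the assumption that "area type"
# is the only confounder, which is what justifies the backdoor adjustment.

# Observed outcome rates from purely observational data, broken down by
# treatment status and by the confounder Z (area type):
# rates[(treatment, z)] = P(break-in | treatment, z)
rates = {
    ("watch scheme",    "affluent"): 0.10,
    ("no watch scheme", "affluent"): 0.20,
    ("watch scheme",    "deprived"): 0.30,
    ("no watch scheme", "deprived"): 0.50,
}

# Distribution of the confounder across the whole population of areas.
p_z = {"affluent": 0.6, "deprived": 0.4}

def adjusted_rate(treatment):
    """Estimate P(outcome | do(treatment)) by averaging the stratum-specific
    rates over the population distribution of the confounder Z."""
    return sum(rates[(treatment, z)] * p_z[z] for z in p_z)

effect = adjusted_rate("watch scheme") - adjusted_rate("no watch scheme")
print(f"Estimated causal effect of the scheme: {effect:+.2f}")  # -0.14 here
```

Whether such an adjustment is believable depends entirely on whether the assumed causal structure is right, which is exactly the kind of judgement evaluators are making anyway.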

Yes, our causal judgements, whether as evaluators or in everyday life, are not perfect. They are subject to persistent biases, illusions and just plain mistakes. Evaluators are familiar with these biases and illusions and there are a whole host of ways of countering them. To take a really obvious example: I can visit a performance at the end of a series of theatre workshops for refugee children and judge their excitement and energy to be a good sign that the workshops have been effective. But at the back of my mind I will have a lot of caveats which need checking out — perhaps the implementing NGO is really hoping for repeat funding and so has persuaded the children to be especially enthusiastic and promised them some reward. If I have never seen these or similar children in other circumstances, perhaps they are just always so vivacious, workshops or no workshops. Perhaps the kids are mainly so excited because foreigners are visiting and perhaps filming their performance.

If I am any good as an evaluator I will think up some ways of checking out and hopefully eliminating these possibilities. The point here is only that my first judgement (“the workshop has really livened up these kids”) was already genuinely causal, though imperfect. I can present a video of them jumping about as part of the evidence for the effectiveness of the workshops — and I will hopefully be able to back this up with additional evidence I gathered to eliminate some key illusions and biases. Just as a randomised controlled trial says: here is the effect, and here is some additional evidence we gathered to eliminate some key biases.

An experienced evaluator might even stop seeing the enthusiastic kids as evidence of good workshops. But as evaluators I think we need to cultivate, not try to suppress, our instinctive, gut responses and perceptions. Like a good psychotherapist, we need to resonate even more, not less, keenly with all the signals around us, but at the same time be able to keep our distance and be aware of possible biases, deceptions and illusions.

[1] A classic example from Pawson & Tilley (1997).
