What Counts As Evidence in Changing Practice?

(This is one of my longer posts, as it necessarily deals with quite a lot of detail. I’ve tried to keep it shorter by linking to further reading.)

A few weeks ago I was embroiled in a heated (though ultimately futile) argument with a teacher from Australia who claimed that I was training teachers in a methodology (Project-Based Learning) for which there was no positive evidence.

Even as I was presenting the evidence, I knew it would make little difference to his view: for all the supposed impartiality of educational researchers, personal bias and the cherry-picking of supportive evidence still mark most of the increasingly acrimonious social media ‘debates’. And, yes, I’m as guilty of choosing my evidence as anyone else.

In my defence, I feel I can be excused on one count: I don’t go along with the current obsession that educators should care only about ‘evidence-based practice’. And the past few weeks have given me a chance to think about what teachers look for when they decide to innovate in the classroom, not least because I’ve been running workshops in the science of improvement (more of which later) in Australia, India and (soon) the UK.

Some education researchers hold a dismissive view of teachers: that they work mainly on gut instinct, introducing only the innovations that press their political, cultural and emotional buttons. Teachers are accused of failing their students because they don’t keep abreast of the empirical evidence emerging in academic journals. As if that weren’t bad enough, even more scorn is reserved for those who use evidence to inform their teaching and learning strategies but use the wrong type of evidence. Twitter is full of these arcane, ‘angels-on-pinheads’ back-and-forth slanging matches. For now, let’s ignore the question of why so much academic research is written for fellow academics rather than teachers (but please read this really cogent critique of the failure of research to affect practice in education).


The Gold Standard?

Now, I want to be clear: there is undoubtedly value in randomised controlled trials (RCTs) when evaluating classroom innovation. But there are also deep flaws that many teachers rarely get to hear about. I want to touch on them here, but I don’t want to get into the nit-picking detail, as this post is aimed at time-starved teachers. People who get excited about that stuff can investigate the integrated links.

Concerns about using RCTs in Evidence-Based Education emerged soon after the first RCTs were conducted. The key issues could be summarised as:

  • RCTs exist to identify causality and predictability. But complexity theory suggests that conclusions drawn from RCTs ignore context and, as we know, context is everything. A paper written 15 years ago supports W.E. Deming’s view that you can’t understand why something worked unless you look at the system in which it worked: the people, the structures, the motivations behind the implementation… I could go on. Well-respected academics have even suggested that the same teacher, teaching the same class, will see markedly different outcomes from one day to the next. What part, then, does chance play, depending on the day the data are gathered?
  • RCTs attempt to eliminate variables during the implementation period, but the innovation is then scaled up across a wide variety of conditions, where results are invariably less impressive. This explains why we’ve seen a succession of ‘silver bullets’ hailed as transformative, only to be reclassified as duds in subsequent years. The elimination of variables doesn’t just plague education: healthcare research is similarly torn between attempts to eliminate all variables and a desire to test interventions in ‘real world’ conditions;
  • RCTs are also criticised for understating the value of other data sources, and for failing to investigate what Yong Zhao has labelled ‘side effects’ — unintended consequences and impacts further down the chain;
  • There are serious ethical considerations behind RCTs, particularly in trials of promising cancer treatments (I speak from personal experience). Many people feel not only that it’s ethically wrong to deny stage four cancer patients access to promising new drugs (by giving them placebos), but also that those patients will ‘contaminate’ the evidence by finding their own alternative treatments. We don’t have placebos in education, but we do have control groups, against which intervention effects can be compared. But think about the number of variables involved in this group: the learning kids do at home; the impact upon learning that their non-participating teachers have; the oft-reported higher levels of time and enthusiasm allocated to the novel experience, compared with the control. (Anecdote: a while ago I delivered two days of Project-Based Learning training at a school that then asked how they could further extend their professional implementation of PBL. I suggested they join the trial group of a national RCT into the impact projects can have on literacy and numeracy. They then told me they were already in the control group! “But you’re contaminating the evidence,” I said. The senior leader replied: “We’ve been told we can’t touch PBL as a methodology for the next three years. Our kids can’t wait that long!”) I would argue that it’s not only futile to try to eliminate all variables in RCTs, it’s ethically questionable;
  • RCTs may be seen as the gold standard, but you only have to look at the literature surrounding them to see that there are ‘good’ and ‘bad’ RCTs (in design, execution and analysis). Equally, any given classroom innovation can be implemented well or appallingly. I have seen excellent Project-Based Learning practice, for example, and really, really awful implementation; the impacts upon learning would be poles apart. RCTs attempt to lay down strict guidelines for implementing the innovation being evaluated, but so long as humans are delivering these innovations, there’ll always be significant differences.

(Please note: this is intentionally a broad overview of problems with RCTs. For a forensic examination of flaws in the US What Works Clearinghouse evaluations of a single subject (maths), I’d urge you to read Smith and Ginsburg’s excellent paper, Do Randomized Controlled Trials Meet the “Gold Standard”?, in which they identify a whole host of specific flaws in studies that were influencing policy-making. Also see Nick Hassey’s piece on RCT limitations.)

In summary, RCTs will inevitably confirm that almost anything can work for some schools, under some conditions and in some contexts. But they also show that no single innovation can work for everyone, everywhere. Yet policy-forming politicians routinely use them as if a positive result proved that an innovation will work everywhere.

Ah, but what about meta-analyses? Surely, when you gather enough data, the bumps in the road listed above can be smoothed out, and absolute conclusions drawn? This, after all, is how John Hattie’s ‘Visible Learning’ came to be seen as ‘the holy grail’ for teachers.

He’s not the messiah…


Would that it were so. Since the publication of ‘Visible Learning’, there has been a string of criticisms: of the methodology used (averaging effect sizes is misleading), of the maths deployed (Hattie initially reported some probabilities as negative numbers, which, I’m assured, is mathematically impossible), of the idea of ranking interventions and, most problematic of all, of the categorisation of interventions. I understand why, in an attempt to make sense of 1200 meta-analyses, you have to have some general classifications, but ranking them for effectiveness unleashed a licence to cherry-pick, and the inevitable dumbing-down of complex data spawned a whole host of dumb statements by education ministers globally. What, for example, constitutes a ‘Creativity programme’? And is it significantly more impactful than homework? The label ‘direct instruction’, like ‘problem-based learning’, covers a multitude of sins, but, hey, it ranks as more effective, so around the world we see creative, problem-seeking teaching strategies being abandoned in favour of discredited ‘explicit instruction’ pedagogies that Hattie never had in mind to begin with. And we’re now seeing a whole industry springing up to support teachers in making learning ‘visible’.

It seems that, although it was done for the right motives, the end-result of the Visible Learning phenomenon has been to further de-professionalise teachers, and impoverish the debate around educational reform, not enhance it.

(And you can’t read Todd Rose’s excellent ‘The End of Average’ without seeing any process that relies on averaging in a new light; more on that in a future review.)

Does it count?


Ultimately, defining what works will always be a matter of teacher judgement, not simply data, and the disproportionate weight given to RCTs and meta-analyses leads teachers to question their own judgement rather than complementing it.

So what else counts when considering whether to introduce an innovation? Here are (at least) seven factors:

  1. See for yourself — I write books, pamphlets and articles. I give talks, I run training events. But the single most persuasive factor in convincing teachers is for them to see it in action. They compare contexts and organising systems. Most importantly, they see the effect upon students, and determine how their own students might respond. There’s a reason why the High Tech High schools in San Diego (global exemplars of project-based learning and much more) receive over 5000 visitors a year;
  2. Speak to peers — social media makes this a far easier proposition than it used to be. When I was first advocating the pedagogies behind the Musical Futures programme, teachers were openly sceptical. And rightly so. But once they’d talked to a teacher who’d been there and done it, they were more willing to try it out;
  3. Have a problem to which the given innovation might represent a potential solution — there are simply too many silver bullets out there, so a disciplined innovator is much more likely to look at those that might meet their ‘moment of need’;
  4. Undertake training — if our training is indicative, this is often the final step before introducing an innovation. And it’s surprising how often this stage is skipped, contributing to less than impressive results, compared to the original pilot;
  5. Examine student outcomes — and not necessarily test scores, though of course these matter. When we had sufficient data from the Musical Futures evaluation to show a 30%+ increase in students electing to take music as a result of the innovation, the scale-up really began to snowball;
  6. Determine the match with the school’s culture, design and philosophy — an innovation can only attain a sustainable improvement if there’s coherence with the school development plan, and the support structures. Put simply, without senior leadership support, any innovation is dead in the water. As Peter Drucker famously observed ‘culture eats strategy for breakfast’, and nowhere more so than the school staffroom;
  7. Parental response — woe betide the senior leader who attempts to introduce iPads, or BYOD, or circadian rhythm-determined start times, without testing the waters with parents first.

So, these are just some of the forms of evidence that are important to educators. To repeat, there’s clearly an important place for RCT data and meta-analyses, but they’re no more important than teachers exercising their professional judgement through more practice-based sources.

I believe that the only sustainable future for professional learning and innovation in schools is one that is driven by teachers, not externally imposed. One that sees innovation as constant, not coercive or ad hoc. This is why I believe so much in transferring the science of improvement from the healthcare, aviation and automotive sectors into education, and I’ll examine this in more detail in my next post.

Teachers need to work with academics to design their own innovations and then determine what evidence works for them, not for others. What if, instead of urging evidence-based practice, we called for the reverse, what Tony Bryk called ‘practice-based evidence’? How might the pace and ownership of innovation be transformed?

Originally published at Engaged Learning.