Is it worth publishing a contextualised Progress 8 measure?

A contextualised P8 measure would be fairer to schools but it would not solve all the ills of hyper-accountability, writes Dave Thomson.

Published in .datalab, Apr 25, 2019

Both Schools Week and TES have reported this week that the Co-op Academies Trust plans to publish a contextualised value added measure for its schools, to counter the unfairness of the Progress 8 measure for schools serving disadvantaged cohorts.

We’ve long said that a contextualised Progress 8 measure should be published alongside Progress 8. We’ve included a version in FFT Aspire for a number of years, and researchers from Bristol University have created their own.

But this will not be a panacea that will solve all the ills of hyper-accountability. Below are some reasons why.

1. It all depends on whether Attainment 8 is a suitable indicator of a broad secondary education

P8, and therefore any contextualised P8 measure, is based on the Attainment 8 indicator: the sum of a pupil’s grades in English, maths, three of the EBacc subjects and three other subjects.
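As a rough illustration of that sum, here is a minimal Python sketch. It follows only the description above: the function name, the argument names and the choice of taking the best three grades in each group are assumptions made for illustration, and the published methodology contains further rules that are not modelled here.

```python
def attainment8_simplified(english, maths, ebacc_grades, other_grades):
    """Simplified Attainment 8: English + maths + three EBacc + three other grades.

    Assumption: where a pupil has more than three grades in a group,
    the three highest are counted. Empty slots simply contribute nothing.
    """
    best_ebacc = sorted(ebacc_grades, reverse=True)[:3]
    best_other = sorted(other_grades, reverse=True)[:3]
    return english + maths + sum(best_ebacc) + sum(best_other)


# Hypothetical pupil: four EBacc grades (best three count), two other grades.
example_score = attainment8_simplified(
    english=5, maths=6, ebacc_grades=[7, 4, 5, 3], other_grades=[6, 2]
)
# 5 + 6 + (7 + 5 + 4) + (6 + 2) = 35
```

Note how the second "other" slot being filled with a low grade, and the third being empty, both pull the total down. That is one route by which qualification choices, rather than teaching quality, can move the indicator.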

The idea is that this represents a broad level of attainment at the end of compulsory schooling.

Performance tables and accountability are supposed to work by encouraging schools to improve teaching and learning. So if teaching and learning improves at a particular school more than at the average school, then (ceteris paribus) its Attainment 8 scores should go up.

If this happens, great — the incentives are working as intended.

But Attainment 8 can be influenced by other factors beyond the quality of teaching and learning.

We know that the introduction of P8 led some schools to change the qualifications they offered to their pupils, with significant rises in entries in some of the EBacc subjects (science and humanities). Some have warned about the resulting impact on pupil motivation and behaviour.

We also know from previous work that there are issues with the scoring of some qualifications, with some being graded more (or less) leniently than others. There are technical solutions to this but they come at the cost of reduced transparency.

The problems of relying on a single indicator, contextualised or not, can be overcome by calculating a range of measures. Several contextualised measures are included in FFT Aspire.

2. Even when adjusting for context, it’s still not a measure of school effectiveness

P8 is not a measure of school effectiveness. It is basically a school’s average Attainment 8 score adjusted for prior attainment at KS2. It would only be a measure of school effectiveness if the only things that affected students’ KS4 grades were prior attainment and school effectiveness.

But we know other factors affect attainment: ethnicity, disadvantage, effects of peers and so forth. These can also be adjusted for in CVA indicators.
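To make the difference between the two kinds of adjustment concrete, here is a minimal Python sketch. It compares each pupil’s Attainment 8 score with the average for pupils who share the same values of the chosen factors, then averages the differences by school. It is a deliberate simplification using group averages and is not claimed to match how any published measure is calculated; the function name, field names and all of the data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def value_added(pupils, adjust_for):
    """Average, per school, of (pupil score - mean score of similar pupils).

    'Similar' means sharing the same values for every field in adjust_for.
    """
    cells = defaultdict(list)
    for p in pupils:
        cells[tuple(p[f] for f in adjust_for)].append(p["a8"])
    expected = {key: mean(scores) for key, scores in cells.items()}

    by_school = defaultdict(list)
    for p in pupils:
        key = tuple(p[f] for f in adjust_for)
        by_school[p["school"]].append(p["a8"] - expected[key])
    return {school: mean(diffs) for school, diffs in by_school.items()}


# Hypothetical pupils, all in the same KS2 prior-attainment band.
pupils = [
    {"school": "A", "ks2_band": "mid", "disadvantaged": True, "a8": 40.0},
    {"school": "A", "ks2_band": "mid", "disadvantaged": True, "a8": 44.0},
    {"school": "B", "ks2_band": "mid", "disadvantaged": False, "a8": 52.0},
    {"school": "B", "ks2_band": "mid", "disadvantaged": False, "a8": 56.0},
    {"school": "C", "ks2_band": "mid", "disadvantaged": True, "a8": 36.0},
    {"school": "C", "ks2_band": "mid", "disadvantaged": False, "a8": 54.0},
]

prior_only = value_added(pupils, adjust_for=["ks2_band"])
contextual = value_added(pupils, adjust_for=["ks2_band", "disadvantaged"])
# prior_only: A = -5, B = +7, C = -2
# contextual: A = +2, B =  0, C = -2
```

In this toy data, school A moves from the bottom to the top once disadvantage is taken into account, while school C does not move at all. Anything not included in `adjust_for` still ends up in the score.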

Yet this wouldn’t give us a measure of school effectiveness either.

The resulting CVA scores would be a cocktail of additional factors that had not been taken into account, for which we would often have no data. This would include things like parental support, tutoring, and measurement error (in both KS2 and KS4 results), as well as the impact of school effectiveness.

For that reason, any value added measure is always likely to be “flawed” as a measure of school effectiveness.

3. Differences between schools aren’t that big

As I have written previously, P8 scores aren’t that different for the vast majority of schools. This would be even more the case with a contextualised P8 measure.

Any school rankings would be uncertain and the scores may contain more noise than signal. This isn’t necessarily a problem. If schools don’t differ that much then we don’t need particularly precise measures of their performance.
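One way to see how much noise sits around a school’s score is to put an approximate confidence interval around its average. The Python sketch below is illustrative only: it assumes a simple normal approximation, assumes that zero represents the national average, and uses invented pupil scores. Real cohorts are far larger than these.

```python
from math import sqrt
from statistics import mean, stdev


def confidence_interval(pupil_scores, z=1.96):
    """Approximate 95% interval for a school's mean score (normal approximation)."""
    centre = mean(pupil_scores)
    half_width = z * stdev(pupil_scores) / sqrt(len(pupil_scores))
    return centre - half_width, centre + half_width


def differs_from_average(pupil_scores):
    """True only if the whole interval lies above or below zero."""
    low, high = confidence_interval(pupil_scores)
    return low > 0 or high < 0


# Hypothetical pupil-level scores for two schools.
typical_school = [-1.0, 0.5, 1.0, 0.0, -0.5, 0.6]  # mean 0.1, wide spread
unusual_school = [1.0, 1.2, 0.8, 1.1, 0.9]         # mean 1.0, narrow spread

typical_flag = differs_from_average(typical_school)  # False
unusual_flag = differs_from_average(unusual_school)  # True
```

Only schools whose interval excludes zero would be flagged as unusually high or low. Ranking the remainder against each other would mostly be ranking noise.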

Perhaps the best we can hope for is to identify a group of schools with unusually high or low scores that might warrant further investigation. Either way, contextualised P8 scores would not offer a firm basis for making definitive judgments about school performance.

4. There are perverse incentives to off-roll

All published measures in secondary school performance tables are based on pupils who reach the end of Year 11.

This introduces a perverse incentive to lose pupils who might achieve very low P8, or contextualised P8, scores.

Again, there are technical solutions to this. We’ve previously suggested reweighting based on all pupils ever on roll.
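As a sketch of what such reweighting could look like, the Python below counts every pupil who was ever on roll and weights each one by the number of terms spent at the school. The weighting scheme, the function names and the figures are all assumptions for illustration; they are one possible reading of the idea, not a description of a published method.

```python
def end_of_year11_score(pupils):
    """Plain average over pupils still on roll at the end of Year 11."""
    stayers = [score for score, _terms, on_roll_at_end in pupils if on_roll_at_end]
    return sum(stayers) / len(stayers)


def reweighted_score(pupils):
    """Average over all pupils ever on roll, weighted by terms at the school."""
    total_terms = sum(terms for _score, terms, _on_roll in pupils)
    return sum(score * terms for score, terms, _on_roll in pupils) / total_terms


# Hypothetical pupils: (score, terms on roll, still on roll at end of Year 11)
pupils = [
    (0.5, 15, True),
    (0.0, 15, True),
    (-2.0, 10, False),  # left the school after 10 of 15 terms
]

headline = end_of_year11_score(pupils)  # 0.25
reweighted = reweighted_score(pupils)   # (7.5 + 0 - 20) / 40 = -0.3125
```

Under this kind of scheme, moving a pupil on shortly before the end of Year 11 removes very little of their weight from the school’s score, which weakens the incentive to do so.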

5. Some pupils can have a disproportionate effect on a school’s P8 score

We have written previously about the small number of pupils who achieve very low P8 scores as a result of illness, refusal or some other adverse event. These pupils then have a disproportionate effect on a school’s P8 score. Hence there is a perverse incentive to move them on before they do so.

In 2018, the Department for Education introduced a method of reducing this effect but it made very little difference.

A contextualised P8 measure would still have this problem. It could be alleviated by providing better information on the spread of scores within a school.
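A minimal Python sketch of the kind of spread information that could sit alongside the headline average is given below. The pupil scores, the function name and the threshold used to count very low scores are all hypothetical.

```python
from statistics import mean, median


def spread_summary(pupil_scores, very_low=-2.0):
    """Headline mean plus a few figures describing the spread of pupil scores."""
    return {
        "mean": mean(pupil_scores),
        "median": median(pupil_scores),
        "lowest": min(pupil_scores),
        "n_very_low": sum(1 for s in pupil_scores if s < very_low),
    }


# Hypothetical school: five pupils close to average and one extreme score.
scores = [0.2, 0.4, -0.1, 0.3, 0.1, -4.5]

summary = spread_summary(scores)
mean_without_outlier = mean(s for s in scores if s >= -2.0)
# The mean is about -0.6 with the extreme pupil and about 0.18 without,
# while the median (about 0.15) is positive either way.
```

Reporting the median and the number of very low scores next to the mean would show a reader that this school’s negative headline figure is driven by a single pupil.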

6. There will still be losers

There would still be schools at the bottom of the rankings. These might well include some that have decent P8 scores. Some of these might consider the contextualised measure flawed and biased, and campaign against it.

7. Poor inferences may still be made

Finally, and most importantly of all, there is nothing to guard against poor inferences being made from the data, which in turn lead to poor decisions.

If some are losing their jobs as a result of P8 being misinterpreted as a measure of school effectiveness, then there is nothing to prevent the same thing happening with contextualised P8 scores.

In summary

None of these are reasons not to produce a contextualised P8 measure. It is only fair to make best use of the available data and adjust for demographic differences.

For some schools, the two sets of scores will be quite different; for many they will be very similar. Both scores would be ‘wrong’ but both would be useful, to adapt a phrase from the famous statistician George Box.

A contextualised P8 measure would not be a true measure of school effectiveness. And there would be nothing to prevent it being misinterpreted, just as P8 is misinterpreted. As always, the data should be the starting point, and not the conclusion.

Dave Thomson is chief statistician at FFT Education Datalab.
