Assessment practice that is wide of the mark

Pun intended

In the drive to improve standards, assessment often comes under scrutiny. This can lead to some really positive changes, and on Twitter I often see new marking/feedback/assessment policies that are real improvements. However, I also see practices adopted with the best of intentions that miss the point. I don’t profess to have all the answers, or that my day-to-day assessment is perfect, but hopefully critiques like this will help me improve.

Doing summative assessment and claiming it’s formative.

For the purpose of this point, by ‘summative assessment’ I mean an assessment that is designed for a summative purpose and completed by students under the conditions it was designed for. A classic example might be students sitting an end-of-topic test on Electromagnetism made up of past paper questions, taken in silence in a lesson at the end of the topic.

Teachers who then try to use this formatively might not give any marks, or get students to work out their ‘strongest’ and ‘weakest’ topics based on the number of marks gained out of those available.

Figure 1: A typical exam question

Let’s say a student completes the test and achieves 3/6 on the last question. I have seen QLA (question level analysis) spreadsheets that will mark this as Amber or Red, and then a student might set a target of ‘Revise the force on a current-carrying wire’.

The issue is that a student’s score on this electromagnetism question will be affected by their knowledge and understanding of:

  • The force on a current carrying wire
  • Conversion of units
  • Calculation of moments
  • The principle of moments
  • The unit of magnetic flux density

So if a student achieves 3 marks, without seeing the script we would have no idea why. This question was designed to sit within a set of exams whose purpose is to discriminate between students based on their ability in Physics, not to identify strengths and weaknesses.

Secondly, a question can be thought of as having three dimensions: the science, the question type and the difficulty. I may construct a test where the Moments question is an easy calculation at the start of the paper, or it may be a more difficult conceptual explanation question at the end. Achieving half marks on the first is not the same as achieving half marks on the second.

Thirdly, a test will not include everything from a topic. AQA, for example, do not test the whole course every year; they only guarantee to cover the whole course over five years. So if a student gets full marks on a magnets question, it doesn’t mean it is their ‘best’ topic, for the reasons above, but also because they may not have been asked about Newton’s 2nd Law, which they may be ‘better’ at.

Lastly, even if we were to test the whole course with every permutation of question to cover the science/type/difficulty dimensions, the reliability of any judgement of ability in a subtopic based on a few questions would still be low. Because the subtopics are so closely correlated, the total score is a better indicator of performance on any given subtopic than that subtopic’s own score.
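
To make that last claim concrete, here is a minimal simulation sketch of my own (not from any exam board’s analysis): it assumes each subtopic ability mostly reflects a single general ability, gives each subtopic only a couple of noisily-marked questions, and then compares how well the subtopic score and the whole-paper total each track a student’s true ability in one subtopic. All the numbers are made-up assumptions chosen purely for illustration.

```python
# A rough simulation (illustrative assumptions only): subtopic abilities are
# modelled as mostly reflecting one general ability, and each subtopic is
# assessed by just a couple of noisily-marked questions.
import numpy as np

rng = np.random.default_rng(0)

n_students = 10_000
n_subtopics = 8
questions_per_subtopic = 2   # only a few marks per subtopic, as on a real paper
ability_correlation = 0.9    # how strongly each subtopic reflects general ability
question_noise_sd = 1.5      # marking/measurement noise on each question

# True abilities: a general factor plus a small subtopic-specific part.
general = rng.normal(size=(n_students, 1))
specific = rng.normal(size=(n_students, n_subtopics))
true_ability = (ability_correlation * general
                + np.sqrt(1 - ability_correlation**2) * specific)

# Observed marks: each question reflects the true subtopic ability plus noise.
noise = question_noise_sd * rng.normal(
    size=(n_students, n_subtopics, questions_per_subtopic))
marks = true_ability[:, :, None] + noise

subtopic_score = marks.sum(axis=2)        # e.g. the "electromagnetism" marks
total_score = subtopic_score.sum(axis=1)  # the whole-paper total

# How well does each score track the student's true ability in subtopic 0?
r_subtopic = np.corrcoef(subtopic_score[:, 0], true_ability[:, 0])[0, 1]
r_total = np.corrcoef(total_score, true_ability[:, 0])[0, 1]
print(f"subtopic score vs true ability: r = {r_subtopic:.2f}")
print(f"total score vs true ability:    r = {r_total:.2f}")
```

With these made-up settings, the whole-paper total correlates more strongly with a student’s true ability in the subtopic than the handful of marks on that subtopic’s own questions does, which is exactly why a 3/6 on one question is such a shaky basis for a revision target.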

If you want to identify strengths and weaknesses, and adapt your teaching as a result, you would have been better off asking questions that assessed each skill separately. Some excuse the types of analysis above by arguing that they want to get as much formative use out of end-of-topic test papers as they can, but the clue is in the name: ‘end-of-topic’. If you’ve finished the teaching and you want to assess what students know, then that’s absolutely fine. I would mark the test, have students fix any errors and move on. There’s nothing wrong with summative assessment!

It’s also worth noting that I think past paper questions can be used formatively in teaching, just not as a silent test. QLA can also be useful if you have questions well designed for that purpose.