Pragmatic Assessment

Last week I gave a talk at #cogscisci ‘Meeting of Minds’ looking at how I use assessment and my developing view of reporting. This post is a summary of my talk, minus the tangents, rambles and the extraneous detail.

Validity, Reliability and Dependability

Two major qualities of assessments are validity and reliability. Validity is a measure of how well an assessment reflects what you want to assess, or how well it supports the inferences we draw from the results. I might validly say that a student who does well on a physics test is good at physics, but it would probably be invalid for me to infer they would also be good at dancing. Reliability is a measure of how consistent the results of an assessment are. If a student were to take the same test twice, would they achieve the same result? If two markers were to mark the same piece of work, would they come to the same mark?

These two concepts are often in competition with one another. If I try to make an assessment more reliable, by introducing lots of multiple choice questions, I’d be reducing the validity. If I were to use an open investigation as a form of assessing physics ability it would arguably be more valid, but it would be very unreliable. The trade-off between reliability and validity is known as dependability. How dependable your assessment needs to be, and therefore how it should be structured and designed, depends on the purpose of the assessment.

Purposes of assessment

There are numerous classifications of assessment purposes. Paul Newton identifies 23 different purposes, ranging from diagnostic to system monitoring, from accountability to screening. He acknowledges that this list isn’t exhaustive and also points out that sometimes it isn’t helpful to classify at this granular level. Day to day, I am happy with a simple:

  • Assessment as learning
  • Assessment for learning
  • Assessment of learning

In the following three sections I look at why I want to do these assessments or what I want to get from them, and then what this actually looks like in practice.

Assessment as Learning

Why — Through the act of retrieval, or recalling information to mind, our memory for that information is strengthened and forgetting is less likely to occur. This doesn’t need to be highly reliable but should be valid.

First slide of every lesson

In practice — At the start of every lesson we have a quiz. Rather than have the questions written up beforehand, I like to make them up on the spot. This is partly to reduce my workload, but it also allows me to react to the girls’ responses. If I can see they struggle with a question, I might throw in a second, similar question. Questions can cover any topic learned to date.

Quizlet sets for self study

I also have Quizlet sets covering the whole course, which the girls may use for self-study at home or which we might use in lessons. The Education Endowment Foundation’s recent guidance report on metacognition recommends explicitly teaching pupils metacognitive strategies (which include retrieval practice) and how to organise and effectively manage their learning independently. I think this helps.

Assessment for Learning

Why — To teach effectively I need to know the limit of the students’ understanding, set an appropriate level of challenge and model my own thinking to help pupils develop their metacognitive and cognitive skills. Again, this needs to be valid, but I’m not too concerned about reliability as I can ask questions regularly and repeat if necessary.

In practice — My lessons are littered with multiple choice questions and worked examples. I prefer not to have the solutions prepared as it’s beneficial for the girls to see my thinking live. I might ask them to have a go first on their iPads or I might just ask them to watch the first time around. Leaving the slide blank gives me the room to adapt.

Slides for worked examples and MCQs

Most importantly I spend lots of time in lessons circulating, speaking to the girls, asking them questions and probing their understanding. This is where the vast majority of feedback takes place.

Assessment of Learning

When I think of assessment of learning, I like to think of it separately in my role as a classroom teacher and in the role of someone with pastoral or academic responsibilities wider than a single class.

Why (Classroom teacher) — As a classroom teacher I want to know what the girls can do and what they know. This helps me judge whether they have kept up with the work or are falling behind. I might also use summative assessment to help them close off a topic and motivate them to work a little harder before moving on. This should be a little more reliable than the previous two purposes; however, it is difficult to make a highly reliable test, especially one that fits into 30 minutes. It should be valid in that it should match the summative assessment they will see at the end of the course (GCSE), so we use past GCSE questions to act as preparation.

In practice (Classroom teacher) — I check they can do repetitive tasks, like graph drawing, by using checklists. These are stuck onto their work and allow them to judge their work against set criteria. They then know how they have performed and what they need to do to improve. I also like to use TeacherToolkit’s Yellow Box to indicate where more work or a correction is needed.

Checklists for common tasks
Yellow box marking in practice

After each book mark I also give them the following checklist. This sits well with yellow box marking as it is immediately clear to me when I flick through their books whether or not they have responded to my previous marking.

Book marking checklist

The criticism I can hear coming is that I’m not promoting progress or helping them improve. But as I made clear, the purpose of my summative marking is to check they are working and aren’t falling behind. Feedback, for me, takes place during lessons.

Test feedback

As a department, we still do topic tests to act as a motivator for the girls and for me to check they are keeping up. Once I have marked a test, I go through it looking for areas that need addressing and give the girls 2–4 feedback questions to complete on those specific areas. I don’t like to use item-level spreadsheets for this, as there might be multiple reasons a student performs badly on a question that aren’t made clear by a simple mark.

Why (Wider responsibility) — Someone with wider responsibility may want to identify whether there are any issues for girls in particular subjects, or to assess whether there are trends in performance that can’t be picked up by a single teacher. They may also want to report on attainment to parents and other parties.

In practice (Wider responsibility) — Classically, schools like to report on test percentages or grades. Both are problematic, but this isn’t the post to do that justice. I will, however, be going into this background in my talk at the Hampshire Collegiate Teaching and Learning Conference.

Whole school summative assessment

I prefer to view students’ results graphically in a bee swarm chart. The black swarm represents the cohort and the red cross represents an individual student’s results. This is produced using the more reliable end-of-year exam results. The assumption here is that the cohort as a whole is making progress year on year (our exam results show this is true). The graph allows me to see if a student is moving within their cohort. It is an incredibly blunt tool, and I am just looking for big changes that deserve a conversation. The only things needed to produce these graphs are the results of the test; teachers don’t need to guess at grades, and there’s no attempt here to quantify progress.
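The underlying idea — position within the cohort matters, not the raw mark — can be sketched in a few lines of Python. This is a minimal illustration of my own, not the tooling behind the charts: the function names, data shapes and the 20-percentile threshold are all assumptions for the sake of the example.

```python
# Illustrative sketch: flag students whose position *within the cohort*
# shifts sharply between two end-of-year exams. Raw scores need not be
# comparable across years; only each student's rank in their cohort matters.

def percentile_rank(score, cohort):
    """Percentage of the cohort scoring at or below this score (0-100)."""
    below_or_equal = sum(1 for s in cohort if s <= score)
    return 100.0 * below_or_equal / len(cohort)

def flag_big_movers(year1, year2, threshold=20.0):
    """Return (name, shift) pairs where a student's percentile rank moved
    by more than `threshold` points between the two years.

    year1, year2: dicts mapping student name -> exam score.
    """
    cohort1, cohort2 = list(year1.values()), list(year2.values())
    movers = []
    for name in sorted(year1.keys() & year2.keys()):
        shift = (percentile_rank(year2[name], cohort2)
                 - percentile_rank(year1[name], cohort1))
        if abs(shift) > threshold:
            movers.append((name, round(shift, 1)))
    return movers

# A student who slips from the top of the swarm to the bottom is flagged
# for a conversation, even if their raw mark barely changed.
print(flag_big_movers({"A": 60, "B": 70, "C": 50},
                      {"A": 40, "B": 72, "C": 55}))
```

Like the chart itself, this is deliberately blunt: it only surfaces big movements worth a conversation, and makes no attempt to quantify progress.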

Although there’s certainly nothing groundbreaking in this post, I hope it is of some use to someone. I am open to comment/critique/criticism, so please get in touch via the comments or on Twitter.