Ed Tech, A.I., Feedback Loops & All of the above

First, I should explain how we got into this mess…

Tassomai is an adaptive learning program designed to boost students’ performance in science exams. It’s proving highly effective and has been adopted by a large number of UK schools over the past two years. The results coming out of the software’s use have been fantastic, but we aim to keep improving on them each year.

When developing the content for Tassomai, I felt it was crucial that we had questions spanning a range of difficulties, and that students would start off with easy material and work their way up to tougher content.

Accessibility is key to what we’re trying to achieve, and a large part of that is encouraging students who have no confidence in their science abilities to have a go, showing them that they can do well and build knowledge.

It works: students starting Tassomai tend to score highly over their first few days and keep at it (often discovering a newfound confidence in the subject along the way). Meanwhile, we can profile their abilities from the data they generate and map out their route through the rest of the content.

Part of our algorithm rates the difficulty of each question and updates that rating every time the question is answered. Questions start with a neutral weighting (somewhere around 0.5) and then get pulled towards an easier or harder rating as students attempt them. This data-driven approach to rating our material’s difficulty allows us to serve genuinely easy introductory questions.
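
For the curious, the flavour of that update looks something like the sketch below. The 0-to-1 scale matches the neutral 0.5 starting point mentioned above, but the update rule and smoothing factor are illustrative stand-ins, not our production algorithm:

```python
# Illustrative sketch of a per-answer difficulty update.
# Difficulty lives on a 0-to-1 scale (0 = easy, 1 = hard) and starts
# near the neutral 0.5; ALPHA is a made-up smoothing factor.
ALPHA = 0.05

def update_difficulty(difficulty: float, answered_correctly: bool) -> float:
    """Nudge a question's rating after each attempt: a correct answer
    pulls it towards 'easy' (0.0), a wrong answer towards 'hard' (1.0)."""
    target = 0.0 if answered_correctly else 1.0
    return difficulty + ALPHA * (target - difficulty)

# A question that keeps being answered correctly drifts steadily easier.
d = 0.5
for _ in range(20):
    d = update_difficulty(d, answered_correctly=True)
print(round(d, 3))  # -> 0.179 after 20 straight correct answers
```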

It also means that, in any given topic, we can ramp up the difficulty of questions for any given student based on their recent work in that area, and do so with questions we know will provide a certain level of challenge. A strong student will quickly gain access to trickier questions; a weaker student will spend more time on elementary, more explanatory material, with the difficulty increasing gently as they master the basics.
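
Likewise, the ramping can be pictured as picking from a difficulty window keyed to a student’s recent accuracy in a topic. This is a simplified sketch, not our real scheduler; the window size and the mapping from accuracy to target difficulty are invented for illustration:

```python
import random

def pick_question(questions, recent_accuracy, window=0.15):
    """Pick a question near a target difficulty keyed to recent form.

    questions: list of (question_id, difficulty) pairs, difficulty in [0, 1].
    recent_accuracy: fraction of the student's recent answers in this
    topic that were correct. Strong recent form means a harder target;
    a struggling student gets easier material.
    """
    target = recent_accuracy          # crude mapping: 90% accuracy -> 0.9 target
    in_band = [q for q in questions if abs(q[1] - target) <= window]
    return random.choice(in_band or questions)  # fall back if the band is empty

# A student on a hot streak is offered the trickier items.
bank = [("q1", 0.10), ("q2", 0.45), ("q3", 0.80), ("q4", 0.92)]
print(pick_question(bank, recent_accuracy=0.9))  # -> ("q3", 0.8) or ("q4", 0.92)
```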

As well as designing an algorithm that could help any student reach a good grade in science, we had to write content that would actually serve the users. The challenge here was to write material that could present itself as a question while really being a means of encouraging students to read and engage. Sure, some of the content is tough and tests a student’s detailed knowledge, but lots of it is elementary: think of a question that lists several true statements as its answer options, with “ALL of these” as the final, correct choice.

It’s merely a means of getting a bit of information in front of the student in a fairly non-threatening way. Clearly, the student should pick “ALL of these” and move on. Now, we would also ask the question the other way round, where “ALL of these” appears as an option but is wrong and the student has to pick the single correct answer… we don’t want students just clicking “ALL of these” every time they see it without reading the other options.

But herein lies the problem: when a question includes the option “ALL of these”, the chance of it being the correct answer is about 50%, so students pretty quickly cotton on to the fact that it’s usually worth a punt, especially if they’re not sure. This has an interesting effect: if their hunch is right, they get the question right and the question’s difficulty gets downgraded, so this now-“easier” question becomes more likely to appear to new students. If, however, they’re wrong and “ALL of these” is not the answer, the difficulty is increased and the question is put out of reach of the next student. It’s a feedback loop that pushes the two types of “ALL of these” questions apart and keeps them there.
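
A toy simulation makes the effect easy to see. Every number in it is invented (the smoothing factor, how often a student genuinely knows the answer), but the push-apart behaviour it produces is exactly what we’re describing:

```python
import random

ALPHA = 0.05   # illustrative smoothing factor, as in the sketch above
P_KNOW = 0.3   # invented: how often a student genuinely knows the answer

def simulate(n_questions=1000, n_attempts=200):
    """Toy model of the 'ALL of these' feedback loop.

    Half the questions have 'ALL of these' as the correct answer.
    When a student doesn't know, they punt on 'ALL of these', so those
    questions get answered correctly almost every time and drift easy,
    while their mirror-image twins drift hard.
    """
    # [is_all_correct, difficulty]; everything starts neutral at 0.5
    questions = [[i % 2 == 0, 0.5] for i in range(n_questions)]
    for _ in range(n_attempts):
        for q in questions:
            all_correct, diff = q
            knows = random.random() < P_KNOW
            correct = True if knows else all_correct  # punt when unsure
            target = 0.0 if correct else 1.0
            q[1] = diff + ALPHA * (target - diff)
    return questions

qs = simulate()
easy = [d for a, d in qs if a]       # "ALL of these" is the right answer
hard = [d for a, d in qs if not a]   # "ALL of these" is a wrong answer
print(f"mean difficulty when ALL is right: {sum(easy)/len(easy):.2f}")  # ~0.00
print(f"mean difficulty when ALL is wrong: {sum(hard)/len(hard):.2f}")  # ~0.70
```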

We did some analysis of the two types of “ALL of these” questions and plotted a simple frequency graph of their difficulty ratings: questions where “ALL of these” is correct form a positively skewed distribution clustered at the easy end, while questions where it is wrong form a negatively skewed distribution at the hard end. The results confirmed our suspicions; if anything, the difference was starker than expected.

We’d always thought this wasn’t the end of the world; it was a problem we could explain away. We told teachers to pass on to students that “ALL of these” is usually the right answer at first (all part of the gentle “on-boarding” process), but beware! Soon enough, it will catch you out…

But when students in the early stages are confined to questions rated between 0 and 15 on this difficulty scale, the probability of a new question having “ALL of these” as the correct answer is suddenly around 95%, because the feedback loop has herded nearly all of the questions where it is the right answer into that easy band.

The concern teachers raised with us was that students might start to feel that “ALL of the above” would also be the correct answer in an exam. If there was a chance that our algorithm’s feedback loop could cause a student to make an error in an exam, clearly we had to rectify it.

So we’ve made two crucial changes. The first is in content development: as we transition from the legacy specification to the new content for next year, the proportion of questions where “ALL of these” is correct has moved from 50% to 25%.

“ALL of these” is now no more or less likely to be correct than any other answer in a question.

Secondly, we’ve broken these questions free of the algorithm’s difficulty-rating feedback loop. From now on, any question offering “ALL of these” (or a similar variant) as a possible answer will have its difficulty randomly redistributed daily, so it can never be pushed or pulled to either end of the question difficulty spectrum.

Questions where “ALL of these” is the right answer are no more likely to appear at the start of a course.
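
For concreteness, here’s a minimal sketch of what that daily job might look like, assuming the illustrative 0-to-1 scale from earlier and a simple uniform redraw; the variant list and data layout are hypothetical:

```python
import random

ALL_VARIANTS = ("ALL of these", "NONE of these")  # hypothetical variant list

def is_exempt(question) -> bool:
    """True if any answer option is an 'ALL of these'-style variant."""
    return any(opt in ALL_VARIANTS for opt in question["options"])

def redistribute_daily(question_bank) -> None:
    """Daily job: give every exempt question a fresh random difficulty.

    Because the rating is redrawn each day rather than updated from
    answers, these questions can't be pushed or pulled to either end
    of the difficulty spectrum by the feedback loop.
    """
    for q in question_bank:
        if is_exempt(q):
            q["difficulty"] = random.uniform(0.0, 1.0)

bank = [
    {"id": 1, "options": ["A", "B", "ALL of these"], "difficulty": 0.03},
    {"id": 2, "options": ["A", "B", "C"], "difficulty": 0.90},
]
redistribute_daily(bank)  # question 1 gets a new rating; question 2 keeps its own
```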

It’s been really interesting for us to see what happens when tens of millions of students’ quiz answers are fed into this sort of feedback loop — and it’s a salutary warning of how powerful such an effect can be.

Where we know we are having huge success is in the way our system is increasing engagement in science, giving teachers incredibly valuable feedback data, and boosting results nationally. We’re glad to have had the feedback from teachers and the opportunity to fix this anomaly — and we expect that little changes like this, and the constructive relationships we have with teachers around the country, will help us continue to learn and improve.
