## Education 2.0

# Fixing Exams

## On Bias, Variance, and Fixing Them

A statistician would describe an exam as an “estimator”, which is a process for calculating the value of something based on observed data.

In this case, the “observed data” is the candidate’s performance in the exam: for example, a student’s marks for Mathematics, Physics and Chemistry at the 2022 G.C.E. Advanced Level exam.

The “value of something” could be many things, such as whether the student should be admitted to study Computer Science at Ruhuna University, or whether Commercial Bank should hire her as a management trainee.

Upon closer inspection, the statistician would also say that exams are a “**biased** estimator” and an estimator with significant **variance**.

Bias and Variance are (in statistics-speak) the problem with exams. Let’s see what they mean in practice.

# Bias

Consider Amali and Binod. Both have Z-Scores of 1.9 for the same subjects: Mathematics, Physics and Chemistry. Both want to study Computer Engineering at Moratuwa University. But their similarities end there.

Amali attended an obscure village school in Ampara, the only one in its Divisional Secretariat that offers “Science” subjects for the A. Levels. Binod, in contrast, attended a Supra-Grade school in Colombo’s Cinnamon Gardens.

Binod’s father is an alumnus of Moratuwa University, as are his two older brothers, who have since gone on to PhDs in the U.S.A. Amali, in contrast, is the first individual in her family to pass (let alone excel) at A. Levels, and she did so even though her teachers badgered her to do “Bio-Science” because “Maths is for boys”.

Now, hypothetically, let’s suppose we need to choose between Amali and Binod for one slot to study Computer Engineering at Moratuwa University. Who do we choose?

On the one hand, if we interpret the Z-Scores directly, we have to conclude that both are equally suitable, and we might need to toss a coin to pick one.

On the other hand, consider the following. Binod attended a top school in a privileged location with much better facilities than Amali’s. Binod also probably benefitted from his educated family. Finally, Binod likely never faced the various prejudices that Amali faced. These factors would have given Binod’s score a boost and been a drag on Amali’s.

In this way, exams are influenced by factors beyond pure suitability for university. Factors like location, family background and gender should not influence university admission, yet they do. In other words, the results of exams are influenced or “moved” (up or down) by factors about which we don’t (or shouldn’t) care.

Statisticians refer to this “moving” phenomenon as “Bias.”
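To make the idea concrete, here is a minimal Python sketch. It assumes an observed Z-Score is roughly "true suitability plus a bias term". The bias values of +0.2 and -0.2 are hypothetical numbers chosen only for illustration, and the function name `estimate_suitability` is my own; nothing here is an actual estimate of anyone's bias.

```python
def estimate_suitability(observed_z, assumed_bias):
    """Remove an assumed systematic shift from an observed Z-Score."""
    return round(observed_z - assumed_bias, 2)

# Hypothetical bias values, chosen only to illustrate the idea.
ASSUMED_BIAS = {"Amali": -0.2, "Binod": +0.2}
OBSERVED_Z = {"Amali": 1.9, "Binod": 1.9}

for name, z in OBSERVED_Z.items():
    print(name, estimate_suitability(z, ASSUMED_BIAS[name]))
```

Under these assumed values, two identical observed scores of 1.9 correspond to different underlying estimates (2.1 for Amali and 1.7 for Binod). The hard part in practice is, of course, knowing the size of the bias.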

# Variance

Consider this hypothetical, science fiction-inspired thought experiment. Suppose we use some newfangled machine to split the universe into 1000 parallel universes. And suppose we ask Binod to resit his A. Levels in each; i.e. he sits the same exam 1000 times.

Now, what will Binod’s results look like? Will all 1000 be the same? Or will they be different?

On the one hand, the results are likely to cluster around 1.9. On the other hand, they are likely to vary. For example, in one universe, Binod might be tired or unwell, or might pick the “hard” question, and hence his score might be lower, say 1.4. In another, his Chemistry paper might reach a lenient marker, and as a result, he ends up with a Z-Score of 2.1.

Statisticians call this phenomenon “variance”.

Variance doesn’t lead to outright injustices the way Bias does. Its problem is “false precision”. For example, university admissions might consider a student with 2.1 significantly better than one with 1.4. But as we saw, the same candidate could have achieved either of these wildly different scores.
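The parallel-universe experiment can be imitated with a short simulation. This is only a sketch: it assumes each sitting is Binod's expected score of 1.9 plus independent, normally distributed noise, and the noise level of 0.25 is a hypothetical value, not something measured from real exam data.

```python
import random
import statistics

def simulate_sittings(expected_score, noise_sd, n_sittings, seed=0):
    """Simulate repeated sittings as expected score plus random noise."""
    rng = random.Random(seed)
    return [rng.gauss(expected_score, noise_sd) for _ in range(n_sittings)]

# noise_sd=0.25 is an assumed value for illustration only.
scores = simulate_sittings(expected_score=1.9, noise_sd=0.25, n_sittings=1000)

print(f"mean  = {statistics.mean(scores):.2f}")
print(f"stdev = {statistics.stdev(scores):.2f}")
print(f"range = {min(scores):.2f} to {max(scores):.2f}")
```

The mean stays close to 1.9, but individual sittings spread well above and below it, which is exactly the "false precision" problem when one sitting is treated as the final word.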

# Fixing Bias

The ideal fix for Bias in exams is to remove its sources from the process. For example, attending a school in remote Ampara would not be a comparative problem if all schools island-wide had the same teaching quality and facilities. And girls would not have to put up with prejudices if we had better-educated teachers and parents.

Hopefully, these ideal fixes will happen in the longer term. In the shorter term, there are some statistical fixes that we can perform.

For example, we already statistically correct regional inequalities by having district quotas for universities. The quota effectively gives underprivileged regions a boost. For example, Colombo and Ampara might have cut-off marks of 2.1 and 1.8, respectively, giving Ampara a “boost” of 0.3 points.
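The arithmetic of this "boost" is simple enough to sketch in Python. The cut-off marks below are the illustrative figures from the paragraph above, not official values, and the helper names `boost` and `is_admitted` are hypothetical.

```python
# Illustrative cut-off marks from the text, not official figures.
CUTOFFS = {"Colombo": 2.1, "Ampara": 1.8}

def boost(district, reference, cutoffs=CUTOFFS):
    """Effective boost a district receives relative to a reference district."""
    return round(cutoffs[reference] - cutoffs[district], 2)

def is_admitted(z_score, district, cutoffs=CUTOFFS):
    """A candidate is admitted if their Z-Score meets their district's cut-off."""
    return z_score >= cutoffs[district]

print(boost("Ampara", reference="Colombo"))   # 0.3
print(is_admitted(1.9, "Ampara"))             # True
print(is_admitted(1.9, "Colombo"))            # False
```

With these numbers, the same Z-Score of 1.9 clears the cut-off in Ampara but not in Colombo, which is the quota acting as a statistical correction.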

We should take this methodology further, giving “boosts” based on other criteria, like gender and economic status.

“Districts” are also too much of a “blunt instrument”. For example, many schools in Colombo (some very close to Cinnamon Gardens) are in far worse shape than the better schools in Ampara. We need to correct at a more granular level.

# Fixing Variance

The standard statistical fix for Variance is to take more samples. For example, in our hypothetical, parallel-universe thought experiment, the average of Binod’s 1000 Z-Scores has a much lower variance than the Z-Score from any single sitting.
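A rough sketch of why averaging helps: if the sittings are independent and each has the same noise, the spread of their average shrinks with the square root of the number of sittings. The Python below compares that textbook formula with a simulation. The noise level of 0.25 and the expected score of 1.9 are assumed values for illustration, and independence between sittings is itself an assumption.

```python
import math
import random
import statistics

def standard_error(noise_sd, n_sittings):
    """Theoretical spread of the average of n independent sittings."""
    return noise_sd / math.sqrt(n_sittings)

def simulated_spread(noise_sd, n_sittings, trials=2000, seed=0):
    """Empirical spread of the average, over many simulated repetitions."""
    rng = random.Random(seed)
    averages = [
        statistics.fmean(rng.gauss(1.9, noise_sd) for _ in range(n_sittings))
        for _ in range(trials)
    ]
    return statistics.stdev(averages)

# Columns: number of sittings, theoretical spread, simulated spread.
for n in (1, 4, 16):
    print(n, round(standard_error(0.25, n), 3), round(simulated_spread(0.25, n), 3))
```

Note that averaging only reduces Variance. Any Bias in the individual sittings carries over unchanged into the average.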

Hence, a somewhat draconian solution to our variance problem is to get students to sit the A. Level exam multiple times. And since we don’t have the option of parallel universes, it will have to be different papers with different questions.

However, consider again the crux of the fix: “taking more samples”.

Now, consider Binod’s school. Being quite a large school, it probably had many students taking the A. Levels in the same subjects as Binod. Now, suppose we took all the Z-Scores of these students and averaged them. For a start, this combined Z-Score would have a much lower variance than Binod’s own Z-Score. But what would this combined statistic estimate?

Instead of estimating Binod’s abilities, it would estimate the abilities of all the students as a whole. Or, more simply, it would estimate the school. It measures something else, but it measures that something else much better.

Hence, what if we moved from using competitive exams to evaluate students to using them to evaluate schools?

The obvious follow-up question is: How will we decide which students to admit to university? We can do this fairly elegantly.

First, based on the school’s overall Z-Score and the preferences of its students, each school can be assigned a quota of slots in each relevant university program.

Second, we can pick students from within each school to fill this quota probabilistically in proportion to their individual Z-Scores.
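The second step can be sketched as weighted sampling without replacement. This is one possible reading, not a worked-out proposal: the function `pick_students` and the student labels are hypothetical, the quota is taken as given, and I assume that students with a Z-Score of zero or below get no weight at all, since a literal "proportional to Z-Score" rule is undefined for negative scores.

```python
import random

def pick_students(z_scores, quota, seed=None):
    """Draw up to `quota` distinct students, each draw weighted by Z-Score."""
    rng = random.Random(seed)
    # Assumption: only positive Z-Scores carry weight in the draw.
    pool = {name: z for name, z in z_scores.items() if z > 0}
    chosen = []
    while pool and len(chosen) < quota:
        names = list(pool)
        weights = [pool[name] for name in names]
        winner = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(winner)
        del pool[winner]  # each student can fill at most one slot
    return chosen

# Hypothetical school with four candidates and a quota of two slots.
school = {"student_a": 2.0, "student_b": 1.5, "student_c": 0.5, "student_d": -0.3}
print(pick_students(school, quota=2, seed=1))
```

A higher Z-Score raises a student's odds without guaranteeing a slot, which is what softens the effect of Variance in any single result.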

Consider some of the beneficial consequences:

- Inter-Student competition is replaced with cooperation. Students can increase their odds of getting into university, not just by doing better themselves, but also by helping their peers.
- The reduction of individual competition would also reduce dependence on private tuition. There might be a new trend in “private tuition” for teachers and school officials. Or, more generally, new mechanisms for teacher and school development.
- Students will have more freedom to explore extra-curricular activities, as well as curricular activities that have less impact on exams but are nonetheless useful in the long term.
- Inter-Student competition would be replaced by Inter-School competition. There would be more pressure on schools to improve their facilities and teaching. A school quota would provide a lever to correct Bias across schools.

# Next Steps

This article is meant more as a qualitative introduction to ideas.

Hence, I have omitted many “quantitative” details (e.g., how schools are assigned quotas). However, the quantitative details are vital, as they help us better understand the consequences. I will follow up with these details.

In the meantime, please share your “qualitative” thoughts, concerns, questions, and comments.