Statistics for all: What the heck is confidence?
I doubt there is a person in the Western world over the age of 4 who hasn’t taken a psychological or educational test. Yet very few of us know one of the most important facts about these tests — their scores are always imprecise.
When you measure the height of a child, you can be pretty confident that the measurement you make is correct within a fraction of an inch on either side. And if you check the time on your mobile phone, you can be pretty certain that it is accurate within a tiny fraction of a minute on either side. Rulers and clocks are well-calibrated measures that we can use with great confidence if we use them correctly. The same is true of measures of temperature, speed, frequency, and weight.
But even measurements made with these metrics are more or less precise. They are correct within a range. These ranges are called confidence intervals. The confidence interval around the measurement of a child’s height would be expressed as something like “82 centimeters plus or minus 1/2 of a centimeter.” Statisticians would say that the child’s true height is likely to be somewhere in this range.
Scores on educational and psychological tests have confidence intervals too. But there’s a difference between these confidence intervals and those for physical measurements. The confidence intervals around scores on psychological and educational tests are larger than the confidence intervals around measurements in the physical world. How much larger? Let’s look at some examples.
The red scale on the left represents a cheap, inaccurate thermometer with a 100-point scale, showing a temperature of 75º and a confidence interval of plus or minus 3º.
The teal scale in the middle represents the SAT, an educational test made by ETS for use in college admissions. For high-stakes tests like the SAT (tests used to make decisions like who gets to go to which college), test developers set the highest standard. In the example, this standard would allow us to claim that we’re confident that a person’s true score is 75, give or take 9 points (out of 100).
The third scale, on the right, represents the best-performing scale on a popular 360 assessment. Its confidence interval is more than twice as large as the confidence interval around SAT scores. In this case we would be able to claim that a person’s score is 75 points, give or take 20 points (out of 100).
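To make the comparison concrete, here is a minimal Python sketch that turns the three "give or take" figures above into ranges. The scores and margins come straight from the examples; the names `examples`, `interval`, and `width_ratio` are made up for illustration and are not part of any testing library.

```python
# Scores and margins quoted in the three examples (all on a 100-point scale).
examples = {
    "cheap thermometer": (75, 3),
    "SAT": (75, 9),
    "360 assessment scale": (75, 20),
}


def interval(score, margin):
    """Return the (low, high) bounds of 'score plus or minus margin'."""
    return score - margin, score + margin


def width_ratio(margin_a, margin_b):
    """Return how many times wider interval A is than interval B."""
    return margin_a / margin_b


for name, (score, margin) in examples.items():
    low, high = interval(score, margin)
    print(f"{name}: {low} to {high} (a span of {high - low} points)")

# Compare the 360 assessment scale with the SAT.
ratio = width_ratio(examples["360 assessment scale"][1], examples["SAT"][1])
print(f"The 360 interval is {ratio:.1f} times as wide as the SAT interval.")
```

On these numbers, a reported 75 could be anywhere from 72 to 78 on the thermometer, from 66 to 84 on the SAT, and from 55 to 95 on the 360 assessment scale.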
Measurement precision matters. In some cases, even a relatively small confidence interval can make the difference between life and death.
The figure on the left is the SAT example used above. Imagine that you’re using a thermometer with the same precision as the SAT. Your daughter is ill, so you take her temperature. It’s 98.6 give-or-take 9 degrees. That’s pretty much the range between hypothermia and severe delirium. Clearly, this thermometer is not precise enough to be relied upon when taking someone’s temperature.
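The arithmetic behind that thermometer example can be sketched in a few lines of Python. The reading and margin are the ones quoted above; the function names `reading_range` and `fahrenheit_to_celsius` are invented for this illustration, and the Celsius conversion is added only for readers who don't think in Fahrenheit.

```python
def reading_range(reading, margin):
    """Return the (low, high) bounds of 'reading plus or minus margin'."""
    return reading - margin, reading + margin


def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


# A 98.6-degree reading from a thermometer that is only good to within 9 degrees.
low_f, high_f = reading_range(98.6, 9)
low_c = fahrenheit_to_celsius(low_f)
high_c = fahrenheit_to_celsius(high_f)

print(f"Fahrenheit: {low_f:.1f} to {high_f:.1f}")
print(f"Celsius:    {low_c:.1f} to {high_c:.1f}")
```

The true temperature could be anywhere from about 89.6º to 107.6º Fahrenheit, or roughly 32º to 42º Celsius.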
The point I want to make here is not that measurements in the physical world are more precise than measurements in psychological and educational assessment (though this is true) or that our psychological and educational assessments are useless (though this is too often true). My point is that we should not be using psychological or educational assessments without being aware of their level of precision. The bigger the confidence intervals around their scores, the more careful we need to be about the kinds of decisions we make with them.
When it comes to educational and psychological assessment, I think we’re far too careless. Too many people who buy and use assessments don’t know enough about statistics to make well-informed assessment decisions.
Fortunately, I believe we can remedy this! And it seems to me that the best place to begin is with confidence. So in the next article in this series, I’m going to share a super-easy way to figure out how much confidence you can have in any test’s scores.
My organization, Lectica, Inc., is a 501(c)(3) nonprofit corporation. Part of our mission is to share what we learn with the world. One of the things we’ve learned is that many assessment buyers don’t seem to know enough about statistics to make the best choices. The Statistics for all series is designed to provide assessment buyers with the knowledge they need most to become better assessment shoppers.