Could more tests be the solution to imperfect tests?
Most of us will, at some point in our lives, be judged by an institution on the basis of a test score, whether it is a driving test, a standardized academic test like the SAT or the TOEFL, or a neuropsychological test in a clinical setting. These tests have generally been shown to give reliable results, that is to say, results that agree with the true capacity or quality being assessed in the person taking the test. However, they also raise a number of ethical issues, since their results are often biased by the identity of the test taker, as well as by aspects of the test taker’s personality, such as their response to stressful situations.
In other words, tests are everywhere, and they are mostly useful, but do they really represent the test taker in the eyes of the institution demanding results? Are they really enough to capture the test taker’s rank within a population or on a scale? Do the results represent us well enough for these tests to be more useful than they are detrimental or discriminatory toward atypical personalities?
Many before me have tried to answer this question by presenting tests as mainly detrimental. I would like to take the opposite approach. I believe that tests are essential: in very little time, they provide valuable information about the particularities of each person encountered in a specific setting, and they allow people to act in accordance with the results. This is, in a sense, the definition of society in general: behaving toward one another according to social rules that hold between any two people.
It has been shown that one review, or one score, given by an experienced reviewer is usually extremely similar to the combined reviews or scores given by a multitude of inexperienced reviewers, among whom the outliers cancel each other out. Going back to the main question: could this be the solution to test bias? Could more and more tests be the way to achieve the best accuracy and to capture the many facets of each person’s specificities?
No test is perfect, and misevaluations can occur. Test results may reflect inequalities and discriminatory bias. But could tests be compared to multiple imperfect reviewers? Tests are our only way of assessing some important human characteristics, and one of the best ways to compare us to each other, but no single test is infallible at the task. Tests are sometimes expensive, especially paper-based and one-on-one tests, but with the near-universal use of computers, they could remain inexpensive even if multiplied! An extremely smart student with an anxiety disorder may get a lower SAT score than a less educated classmate. The two scores are not really comparable in that case, yet universities still use them to compare the two students. But what if 10 tests were taken, some under stress and some not, perhaps even with the best and worst scores dropped for each test taker? What if some of the SAT tests were oral and others written, some interactive and some anonymous? Each test is a powerful tool and cannot be dismissed for its imperfections, but a group of multifaceted tests is truly more representative of the test takers, who are themselves multifaceted as well.
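The idea of taking several tests and dropping the best and worst scores is close to what statisticians call a trimmed mean. Here is a minimal Python sketch of that idea, under stated assumptions: the function name `trimmed_mean` and the ten scores (on a made-up 0 to 100 scale) are purely illustrative and are not real test data.

```python
def trimmed_mean(scores):
    """Average the scores after dropping one lowest and one highest value."""
    if len(scores) < 3:
        raise ValueError("need at least three scores to drop the best and the worst")
    ordered = sorted(scores)
    kept = ordered[1:-1]  # discard the single worst and single best score
    return sum(kept) / len(kept)


# Hypothetical example: ten sittings by the same student, one of them
# taken on a very bad day (the score of 40).
sittings = [82, 85, 40, 84, 86, 83, 85, 81, 84, 86]

plain_mean = sum(sittings) / len(sittings)

print("single bad sitting:", min(sittings))
print("plain mean of ten sittings:", plain_mean)
print("mean without best and worst:", trimmed_mean(sittings))
```

In this invented example, the one bad sitting would badly misrepresent the student if it were the only score available. Averaging all ten sittings softens its effect, and dropping the extremes removes it entirely. Whether real test scores behave this way is, of course, an empirical question.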