If we really want to get the most from algorithms, we need to rethink how we assess human intelligence

Enrique Dans · Apr 30, 2023 · 3 min read

IMAGE: A drawing of a person using a computer with a tree diagram on the screen and icons of several industries outside (Yasmin Dwiputri / Data Hazards Project / Better Images of AI, CC BY)

An interesting article in VentureBeat, “Why exams intended for humans might not be good benchmarks for LLMs like GPT-4”, touches on an issue I think about every time I read a headline along the lines of “ChatGPT passes the exam for such and such”, a claim that is now the subject of much ill-informed discussion.

We’re surprised, and perhaps fearful, when an algorithm breezes through an exam we would need to spend a lot of time preparing for. That’s understandable. But if we look at how an algorithm learns, it’s not so surprising. First of all, we are talking about algorithms trained on an enormous amount of information, practically everything that’s online, with a few obvious exceptions. Their developers make a special effort to keep the data used for training separate from the data used for subsequent testing. Nevertheless, the amount of data used in training is so enormous that it is very difficult to ensure that the examples later used to evaluate the model are not somehow included in the training data. This creates a problem commonly known as training data contamination: since the algorithm’s memory is, in principle, vast and perfect (digital), any question whose answer appeared in its training data is one the algorithm will always answer well, although it would be a mistake to expect the same from other data that are not…
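To make the contamination problem concrete, here is a minimal sketch in Python of the kind of naive overlap check one might run between benchmark questions and a training corpus. This is my own illustrative example, not anything from the VentureBeat article or from any LLM developer's actual pipeline; real systems work at a vastly larger scale and with fuzzier matching, which is precisely why guaranteeing a clean separation is so hard.

```python
# Illustrative sketch: flag benchmark questions whose word n-grams also appear
# verbatim in the training corpus. Real decontamination pipelines operate over
# billions of documents and use approximate matching; this only shows the idea.

def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """Return the set of word n-grams in a lowercased text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_contaminated(question: str, training_docs: list[str], n: int = 8) -> bool:
    """True if any n-gram of the question appears verbatim in a training document."""
    q_grams = ngrams(question, n)
    return any(q_grams & ngrams(doc, n) for doc in training_docs)

# Toy usage: the first benchmark item overlaps the corpus, the second does not.
corpus = ["The mitochondria is the powerhouse of the cell and produces ATP for energy."]
print(is_contaminated(
    "Which organelle is the powerhouse of the cell and produces ATP for energy?",
    corpus, n=6))  # True: six consecutive words also appear in the corpus
print(is_contaminated(
    "Name the treaty that ended the Thirty Years' War.",
    corpus, n=6))  # False: no overlap
```

Even a crude check like this becomes impractical to run exhaustively when the "corpus" is essentially the public internet, which is why contaminated benchmark items keep slipping through.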


Enrique Dans

Professor of Innovation at IE Business School and blogger (in English here and in Spanish at enriquedans.com)