So, in my last post I defined expertise as the set of acquired factors that contribute to success in a domain, and waved my hands at refining that (to, for instance, something about “mental” factors). Let’s dig in a little bit on the latter part — “success in a domain.”
What counts as “success in a domain”? Some researchers (Fernand Gobet, for instance) look at expertise as getting “results that are vastly superior to those obtained by the majority of the population,” but that raises the question of which results we care about.
Chess is the most-studied domain in expertise research, and in that research the accepted practice is to use Elo ratings as measurements of expertise — which implies that Elo ratings are tracking the results we care about.
The Elo rating system is a statistical model for rating chess players based on the outcomes of individual games. If I’m playing the Pope in a game and he wins, then he’ll gain rating points and I’ll lose points. If his Elo rating is 200 points higher than mine, the model predicts that he’ll score about 0.76 points per game against me (counting a draw as half a point), which is roughly 3 out of every 4 games we play. (Note that there are a lot of intricacies to implementing the system; if you’re interested, the Wikipedia article is a decent place to start.)
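For the curious, here’s a minimal sketch of the textbook Elo arithmetic behind that “3 out of 4” figure. It isn’t any federation’s actual implementation: the function names, the ratings of 1500 and 1700, and the K-factor of 20 are all assumptions I’ve picked for illustration.

```python
def expected_score(rating_a, rating_b):
    """Expected score for player A against player B under the standard
    logistic Elo formula (1 = win, 0.5 = draw, 0 = loss)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a, rating_b, score_a, k=20):
    """Return both players' new ratings after one game.

    score_a is A's actual result (1, 0.5, or 0). The K-factor k controls
    how far ratings move per game; 20 is an illustrative assumption, and
    real rating bodies use their own values and extra rules.
    """
    expected_a = expected_score(rating_a, rating_b)
    change = k * (score_a - expected_a)
    # Whatever A gains, B loses, so the total number of points is conserved.
    return rating_a + change, rating_b - change


# Hypothetical ratings: the Pope is 200 points above me.
pope, me = 1700, 1500
print(round(expected_score(pope, me), 2))   # the Pope's expected score per game
print(update_ratings(pope, me, score_a=1))  # ratings after the Pope wins
print(update_ratings(pope, me, score_a=0))  # ratings after I pull off the upset
```

The only thing the update knows about the game itself is the score. An expected win moves the ratings a little and an upset moves them a lot, but nothing in the formula records how the result came about.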
So, there’s no problem, then, right? If the goal of chess is to win the game, and Elo ratings are based on games won and lost, we’re good to go.
Or maybe not. Imagine that game between the Pope and me once more. We’re sitting, staring across the board at one another — but add in a really, really bad smell. Maybe I’ve smuggled in a live, blooming corpse flower without anyone noticing it, and it’s so extreme that the Pope is unavoidably distracted, misses an obvious move, and I win. And in fact, I do this in every game I play, and my Elo rating starts to soar as I defeat all comers.
Is it fair to say that I’ve acquired more expertise merely by the addition of a smelly flower to my arsenal? It seems not — but I’m definitely winning more, so maybe “winning” alone isn’t the key, and Elo isn’t the perfect expertise measure that we’d hoped for?
For a more realistic example: with the rise of computer chess in the last couple of decades, a number of chess players have cheated by using software to decide their moves during tournament games. If they hadn’t been caught (and I’m sure in some cases they weren’t), their Elo ratings would have improved without an improvement in their expertise.
So, if Elo ratings track wins and losses alone, without regard to how those results were achieved, they aren’t necessarily tracking expertise. We need a further condition: the results have to be achieved by techniques and actions that are in some sense “allowable” in the domain.
What precisely this means will vary from domain to domain, and even within domains over time and across locations, so I don’t think it necessary to dig in at this point. It is, however, something to keep in mind as we continue to think about quantifying expertise — what makes for superior results, and what constraints do we put on how those results are achieved?