I recently devoured “Thinking Fast and Slow” by Daniel Kahneman.
It describes two systems of thought that drive our behaviour — the impetuous, intuitive System 1 and the rational, lazy and structured System 2.
As a very short and inadequate précis of Kahneman’s work: the book demonstrates that while we believe we are governed by our rational mind, our intuitive mind often dominates our decision making — and it’s often wrong.
I love the book.
Everyone seeking to understand decision making, the human mind, economics, society or the human condition should read this book and digest its lessons.
One of my favourite anecdotes comes from Kahneman’s time working with pilot instructors for the Israeli Air Force. He told them that positive feedback works much better than negative feedback, an observation hotly disputed by the senior instructors. “When we yell at pilots, they get better,” they said.
Kahneman realised that they were mistaking correlation for causation, and that they didn’t understand the statistical inevitability of what is called ‘regression to the mean’. Pilots will have good days and bad days and will oscillate between the two, regardless of the instructor’s input. Yell at a pilot on a bad day and the chances are the next day will be better, reinforcing the instructor’s confirmation bias about their own effectiveness.
To illustrate his point, Kahneman asked the instructors to play a game. He had them stand with their backs to a target on the floor and toss two coins over their shoulder at the target. They measured the distance to the target on each throw and compared the results. It was immediately obvious that those who did poorly on their first throw improved on their second, and those who did well deteriorated on the second attempt: regression to the mean. No amount of shouting would have changed the outcome.
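The coin-toss demonstration is easy to reproduce in a simulation. The sketch below is my own illustration, not from the book: two independent sessions of a pure-chance task (distance from a target, lower is better), with the ‘worst’ and ‘best’ groups picked from the first session and no feedback of any kind in between.

```python
import random

def simulate(n_pilots=1000, seed=42):
    """Two independent sessions of a pure-chance skill. A pilot's
    result is their distance from the target, so lower is better."""
    rng = random.Random(seed)
    day1 = [rng.gauss(50, 15) for _ in range(n_pilots)]
    day2 = [rng.gauss(50, 15) for _ in range(n_pilots)]

    # Group pilots by their day-1 result alone, mimicking the
    # instructors' choice of who to yell at and who to praise.
    worst = [i for i, d in enumerate(day1) if d > 65]  # "yelled at"
    best = [i for i, d in enumerate(day1) if d < 35]   # "praised"

    def avg(indices, day):
        return sum(day[i] for i in indices) / len(indices)

    return {
        "worst_day1": avg(worst, day1), "worst_day2": avg(worst, day2),
        "best_day1": avg(best, day1), "best_day2": avg(best, day2),
    }

r = simulate()
# The "yelled at" group improves and the "praised" group deteriorates,
# even though nothing happened between the sessions: both groups simply
# drift back towards the mean of 50.
print(r)
```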
And despite being a largely theoretical text, the book offers a number of practical suggestions.
One highlight of the book for me was Kahneman’s recommendation around the use of models or algorithms in decision making. Because our intuition is subject to bias and error, Kahneman suggests using algorithms in critical decision making.
The example he gives is the “Apgar score”, used to quickly assess the health of an infant upon birth. It was developed and tested in 1952 by Virginia Apgar, an anesthesiologist at Sloane Hospital in New York.
The Apgar score is used to quickly assess the health of a newborn and determine if they need immediate medical care. It uses a simple three-point scale (0, 1 or 2) across five categories to produce an overall score out of 10 which determines the outcome.
- Complexion: blue or pale all over=0; blue at extremities=1; body & extremities pink=2
- Pulse rate: absent=0; below 100 bpm=1; 100 bpm or above=2
- Reflex irritability: no response=0; grimace/feeble cry=1; cry or pull away=2
- Activity: none=0; some flexion=1; flexed arms and legs=2
- Respiratory Effort: absent=0; weak/irregular=1; strong, lusty cry=2
A score of 7 or above is normal, 4–6 is low and 3 or under is considered critically low, requiring immediate medical attention. Before the Apgar score, nurses and doctors were left to their own judgement and experience to make flawed and intuitive decisions.
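The scoring rule is simple enough to write down directly. A minimal sketch in Python, my own illustration, taking 7 and above as the normal band:

```python
def apgar_score(complexion, pulse, reflex, activity, respiration):
    """Each argument is the 0-2 score for its category, as per the
    table above. Returns the 0-10 total and its interpretation."""
    scores = {"complexion": complexion, "pulse": pulse, "reflex": reflex,
              "activity": activity, "respiration": respiration}
    for name, value in scores.items():
        if value not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1 or 2")
    total = sum(scores.values())
    if total >= 7:
        label = "normal"
    elif total >= 4:
        label = "low"
    else:
        label = "critically low"
    return total, label
```

A loudly crying, active newborn with slightly blue extremities would score `apgar_score(1, 2, 2, 2, 2)`: a total of 9, “normal”.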
The Apgar score is still in use today in hospitals the world over; why this style of model is not in wider use is beyond me.
Kahneman observes that there always has been and always will be considerable opposition to anything which is perceived as undermining the authority of skilled and experienced decision makers (see the pilot instructors above).
But the evidence in specific examples is overwhelming: algorithms make consistently better judgements than humans (in the studies Kahneman cites, algorithms won about 60% of the time and drew with expert decision making in the remaining 40%, at lower cost).
Kahneman specifically suggests using algorithms in the final decision.
Personal evaluations are fine as inputs, where some subjective consideration is called for or where something cannot be measured or tested. But because our intuition is flawed, we need to remove it from the point where its weight will have the most conclusive bias: the final decision.
This resonates strongly with me.
Over the years I’ve taught many new managers to interview potential hires for job roles. And time and again I see them making the same mistake.
While we’re in the interview room, a candidate’s inadequacies are painfully obvious in their answers to questions, but once we step outside to review, a mystical and positive lustre descends upon their performance.
My first question to new interviewers is always “What do you think? What does your gut tell you?” and their first answer is (nearly) always “I’m not sure — what do you think?” This is followed by a list of positives and then, at my prompting, a grudging admission of faults. The result is not a carefully weighted analysis of pros-and-cons but a sympathetic whitewash. What I learnt via painful experience and what I tried to teach is that if you leave the interview with anything less than elation at the thought of hiring the individual, you are opening yourself to a world of hurt (read about Hubspot’s hiring process as an example).
Replacing that final decision with an algorithm makes eminent sense.
Further, asking for simple scores on a range of characteristics from each interviewer separately and then tabulating the scores seems like genius. It eliminates (or counteracts) individual bias and allows the interview process to be fine-tuned by evidence: if the categories you pick don’t identify star hires, change the questions.
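As a sketch of what that tabulation might look like; the categories, interviewers and scores here are entirely hypothetical:

```python
from statistics import mean

# Hypothetical scoring categories, each rated 0, 1 or 2.
CATEGORIES = ["technical depth", "communication", "problem solving"]

def tabulate(scores_by_interviewer):
    """scores_by_interviewer maps an interviewer's name to their
    category scores. Each interviewer scores independently; the
    numbers are only combined at the end."""
    per_category = {
        c: mean(s[c] for s in scores_by_interviewer.values())
        for c in CATEGORIES
    }
    return per_category, sum(per_category.values())

scores = {
    "interviewer_a": {"technical depth": 2, "communication": 1, "problem solving": 2},
    "interviewer_b": {"technical depth": 1, "communication": 2, "problem solving": 2},
}
per_category, total = tabulate(scores)
print(per_category, total)  # total out of a possible 6
```

The hiring decision then becomes a comparison of the total against a pre-agreed threshold, rather than a post-interview mood.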
Other possibilities for Apgar-style algorithms suggest themselves.
- Considering investing in a startup or stock? Figure out your criteria and apply the score test to each of them before deciding which to buy. Remove your intuition and emotion from the investment decision.
- Want to methodically assess and improve the progress of your change effort or reorganisations? Score it based on categories like resistance, awareness and risk. Experiment, measure and repeat.
- Want to estimate the effort involved in building something new (like software)? Assess the effort based on categories like complexity, understanding and familiarity. Evaluate the outcomes and refine your model.
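That last step, evaluating outcomes and refining the model, can itself be done mechanically. A crude sketch (all names and data hypothetical): for each category, compare its average score on good outcomes against bad ones; a category that doesn’t separate the two isn’t earning its place.

```python
def category_signal(history):
    """history is a list of (scores, outcome) pairs, where scores maps
    a category name to a 0-2 score and outcome is 1 for a good result,
    0 for a bad one. Returns, per category, the average score among
    good outcomes minus the average among bad ones: a crude measure
    of how well each category discriminates."""
    categories = history[0][0].keys()
    good = [scores for scores, outcome in history if outcome]
    bad = [scores for scores, outcome in history if not outcome]

    def avg(rows, category):
        return sum(r[category] for r in rows) / len(rows)

    return {c: avg(good, c) - avg(bad, c) for c in categories}

# Hypothetical estimation history for the software example above.
history = [
    ({"complexity": 2, "understanding": 0, "familiarity": 0}, 0),
    ({"complexity": 0, "understanding": 2, "familiarity": 2}, 1),
    ({"complexity": 1, "understanding": 2, "familiarity": 1}, 1),
    ({"complexity": 2, "understanding": 1, "familiarity": 0}, 0),
]
print(category_signal(history))
```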
A few suggestions, however, before you start.
Resist the temptation to over-complicate.
Don’t use more categories than necessary (six is more than enough) and don’t over-complicate the range of each variable. The Apgar score uses a simple 0, 1 or 2 for each category to form the overall 0–10 score. Widening the range available in each category gives you an illusion of precision where there is none. In the Apgar score each number is tied to a (reasonably) distinct set of symptoms, making categorisation easy.
Also, avoid weighting the categories unless you have evidence that supports it. By introducing arbitrary weighting you not only reintroduce unnecessary bias, but you also leave the algorithm open to manipulation, as people who know the model can adjust their scores to influence the final outcome.
If you absolutely must have a more complex model then you should find a way to statistically test it, otherwise you risk simply codifying your innate human bias.
And we have enough of that bias already.
Enough to last a lifetime.