
Recognizing the Noise All Around Us

Philip Rogers · Published in A Path Less Taken
16 min read · Apr 10, 2023

From time to time, I start reading a book and find that one of my first reactions is something like “I wish I had discovered this book well before now.” The book Noise: A Flaw in Human Judgment, by Kahneman, Sibony, and Sunstein, has proven to be such a book. And I would be remiss if I did not thank my wife for knowing me better than I know myself, and for picking it off the shelf from among the countless options available to her at the bookstore!

Distinguishing Between Bias and Noise

The analogy that the authors use to differentiate between bias and noise is both simple and effective. They posit that there are four teams (A, B, C, and D), each team composed of five people, where:

· They are all aiming at a target the same distance away

· They each get one attempt at the target

· The target has three rings

· Hitting as close as possible to the center of the target is the goal

The results for the four teams fall into groups along two dimensions, to which I’m assigning the following labels:

Affinity

  • Tight grouping. The attempts are clustered closely together (you could put your fist on the target and cover all of the holes)
  • Scattered grouping. The attempts are broadly dispersed (you could put both hands on the target and still not cover all of the attempts)

Positioning

  • On target. All attempts are in the center ring
  • Off target. Most attempts are outside of the center ring

Based on those definitions, the results for the four teams are as follows:

  • Team A. Tight grouping, on target (all attempts close together; in the center ring)
  • Team B. Tight grouping, off target (all attempts close together; none in the center ring, but instead on the lower left side of the target)
  • Team C. Scattered grouping, broadly distributed, mostly off target (no attempts close together; one attempt near the center, with the rest scattered in all directions across different parts of the target)
  • Team D. Scattered grouping, more narrowly distributed, mostly off target (no attempts close together; one attempt near the center, with the rest scattered, but only on the left side of the target)

Coming back to bias and noise, based on the patterns described above, we can say the following about the results for the four teams:

  • Team A. Accurate — No bias, no noise. All attempts systematically on-target; AND, we can say with confidence where the team’s next attempt will be located.
  • Team B. Biased, no noise. All attempts systematically off-target; AND, we can still say with confidence where the team’s next attempt will be located.
  • Team C. No bias, noisy. All attempts are scattered; AND, we could not predict with any confidence where the team’s next attempt would be located.
  • Team D. Biased, noisy. All attempts are scattered; AND, we could only make a general prediction, that the team’s next attempt would fall somewhere on the left side of the target.

The authors observe the following, based on this data:

“The shooting range is a metaphor for what can go wrong in human judgement, especially in the diverse decisions that people make on behalf of organizations… Some judgements are biased; they are systematically off target. Other judgements are noisy, as people who are expected to agree end up at different points around the target. Many organizations, unfortunately, are affected by both bias and noise.”

To sharpen the distinction between bias and noise, the authors extend this thought exercise by having us flip the targets over. Let’s also assume that the rings of the target are completely invisible from the back, so all we see on each flipped target are holes, with no reference point as to whether an attempt is on or off target. Even so, just based on the holes alone, it’s still possible to deduce that Teams C and D are noisy, while Teams A and B are not. Hence a “… general property of noise is that you can recognize and measure it while knowing nothing about the target or bias.”
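
The flipped-target observation can be made concrete with a small simulation. The sketch below (with made-up shot coordinates, not data from the book) computes bias as the distance from each team’s centroid to the target’s center, and noise as the average scatter of shots around that centroid. Note that the noise calculation never references the center, which is exactly why it still works on the back of the target:

```python
import statistics

# Hypothetical (x, y) shot coordinates for each team; the target's
# center is at (0, 0), but the noise calculation never uses it.
teams = {
    "A": [(0.1, 0.2), (-0.1, 0.1), (0.0, -0.2), (0.2, 0.0), (-0.2, -0.1)],       # tight, on target
    "B": [(-2.1, -1.9), (-1.9, -2.1), (-2.0, -2.0), (-2.2, -1.8), (-1.8, -2.2)],  # tight, off target
    "C": [(0.1, 0.0), (2.5, -1.0), (-2.0, 2.2), (1.8, 2.4), (-1.5, -2.6)],        # scattered
    "D": [(-0.2, 0.1), (-3.0, 1.5), (-2.5, -2.0), (-1.0, 2.8), (-3.5, -0.5)],     # scattered, left side
}

def bias_and_noise(shots):
    """Bias: distance from the shots' centroid to the target center (0, 0).
    Noise: average scatter of the shots around their own centroid --
    computable even if we only see the back of the target."""
    mx = statistics.mean(x for x, _ in shots)
    my = statistics.mean(y for _, y in shots)
    bias = (mx**2 + my**2) ** 0.5
    noise = statistics.mean(((x - mx)**2 + (y - my)**2) ** 0.5 for x, y in shots)
    return bias, noise

for name, shots in teams.items():
    bias, noise = bias_and_noise(shots)
    print(f"Team {name}: bias={bias:.2f}, noise={noise:.2f}")
```

With these coordinates, Team A comes out low on both measures, Team B high bias/low noise, Team C low bias/high noise, and Team D high on both; the noise figures would be identical even if we had no idea where the center ring was.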

Both bias and noise affect the decisions that organizations make. However, as the authors point out, bias gets most of the attention, with scores of books, scientific papers, blog posts, and podcasts related to it. By way of contrast, little is said about noise. This glaring imbalance is particularly significant because when it comes to errors in judgement, both bias and noise are almost always present, and in some instances, noise will make a greater contribution to error than bias does.

Examples of Noisy Decision Domains

The authors go on to describe domains where noise is often particularly high, even though the human and societal costs of inconsistent decision-making can be severe.

  • Health care. Given the same patient, one doctor might often make a different judgement about root cause from another, not to mention any possible next steps that might be appropriate for the patient. And even in areas where it might be tempting to assume that there is minimal noise, such as analysis of X-rays and other scans, noise is often present.
  • Forensic science. Just based on what is portrayed in many television shows, it might appear that it’s an “exact science” when it comes to making forensic decisions, but the reality is considerably more complicated. Not only are differences in interpretation seen from one forensic practitioner to another, but also, it has been observed that the same practitioner might make a different decision when looking at the same data at a later time, for example, a set of fingerprints. And even in some areas where it’s assumed that there is little potential for error, such as DNA analysis, who is doing the analysis, and when they are doing it, makes a difference, not to mention the collection and storage regimen associated with such samples.
  • Child custody. There is considerable variability in case workers’ recommendations: given the same case, one case worker may be considerably more or less likely than a peer to recommend placing a child in foster care.
  • Asylum. To a large extent, the relative likelihood that a particular asylum-seeker will be allowed to stay in the United States depends on which judge hears their case.
  • Bail. In much the same way as we see with asylum decisions, the relative likelihood that a particular defendant will be granted bail varies considerably based on many factors, where one of the most important factors is which judge is making the decision.

Noise is also commonplace in business contexts such as the following:

  • Products. Decisions about whether to start a new product, or enhance or terminate an existing product, are influenced by many factors, and the human element is prominent, where given the same information, one person/group might make a significantly different recommendation from another.
  • Patents. In much the same way that the identity of the judge can have a big impact on an asylum or bail decision, the same has been observed with patent applications. Given the same patent application, one patent examiner might grant a patent where a different examiner might reject it.
  • Personnel. It will likely come as little surprise to anyone who has interviewed for a job that different interviewers may have divergent views about any given candidate, AND, depending on the decision-making process, after the interview is over, interviewers can be and often are influenced by what other interviewers say, often depending on who speaks first. Similar challenges are observed with respect to performance review and promotion decisions.
  • Forecasts. Forecasts of all kinds are subject to error, and often, the “experts” don’t do much better than the average person. And, some disciplines are more prone to error than others when it comes to making forecasts. (I’ll have more to say about this with respect to software development in particular in a later blog post.)

In summary, “all these noisy situations are the tip of a large iceberg. Wherever you look at human judgements, you are likely to find noise. To improve the quality of our judgements, we need to overcome noise as well as bias.”

A Set of Assertions Regarding Human Decision-Making

The following set of assertions can inform much of what we say about human judgement and the possibility for inconsistency and error:

  • In most situations requiring any form of professional judgement, there is variability observed in the decisions that are made
  • The extent to which there is variability is much greater than we might think, and much of that variability is due to noise
  • There is more that we can do to reduce noise than we might think

It’s important to recognize that not only is some variability to be expected, but also that variability in some forms of judgement is helpful. Consider reviews of books or films, where the judgements rendered can and do vary significantly. Many people find it helpful to read the opinions not only of “experts,” but also the opinions of people who they might feel are much more similar to them.

If we turn our attention back to areas where significant differences in judgement are NOT desirable, and can even be harmful — such as the domains described above — one thing we need to recognize is that we tend to operate under what the authors refer to as “assumption of agreement,” and what others might call “naïve realism.”

“In the case of professional judgements, the belief that others see the world much as we do is reinforced every day in multiple ways. First, we share with our colleagues a common language and set of rules about the considerations that should matter in our decisions. We also have the reassuring experience of agreeing with others on the absurdity of judgements that violate these rules. We view the occasional disagreements with colleagues as lapses in judgement on their part. We have little opportunity to notice that our agreed-on rules are vague, sufficient to eliminate some possibilities but not to specify a shared positive response to a particular case. We can live comfortably with colleagues without ever noticing that they actually do not see the world as we do.”

Case Study: The Insurance Industry

The authors give an example from the insurance industry that helps illustrate the nature of the challenge. Underwriters routinely decide what the appropriate insurance premium should be for a given case. The leaders of insurance firms may well believe that while there is some variability between the premium one underwriter would set and the premium another would set, the amount of variability is “reasonable,” that is, within a reasonably narrow range. The underwriters themselves tend to make the same assumption: their judgement might not be identical to that of their colleagues, but it is reasonably similar. However, when looking at the data about such decisions in the aggregate, the amount of variability that exists tends to take people by surprise. The authors refer to this form of analysis as a “noise audit.”
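
One simple way to quantify what a noise audit surfaces is to ask: if we pick two underwriters at random, by what percentage do their quotes for the same case typically differ? The sketch below (with hypothetical premium figures, not the book’s data) averages the relative difference over every pair of judgements:

```python
from itertools import combinations
from statistics import mean

# Hypothetical premium quotes (in $) from five underwriters for the same case.
quotes = [9_800, 13_500, 16_000, 11_200, 21_000]

def noise_index(values):
    """Average relative difference between every pair of judgements:
    |a - b| / mean(a, b), expressed as a percentage."""
    pairs = combinations(values, 2)
    return 100 * mean(abs(a - b) / ((a + b) / 2) for a, b in pairs)

print(f"Noise index: {noise_index(quotes):.0f}%")
```

For these made-up quotes the index comes out around 37 percent, far wider than the “reasonably narrow range” most executives would predict, which is precisely the surprise a noise audit delivers.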

The authors elaborate as follows:

“How could professionals in the same role and in the same office differ so much from one another without becoming aware of it? How could executives fail to make this observation, which they understood to be a threat to the performance and reputation of their company? We came to see that the problem of system noise often goes unrecognized in organizations and that the common inattention to noise is as interesting as its prevalence. The noise audits suggested that respected professionals — and the organizations that employ them — maintained an illusion of agreement while in fact disagreeing in their daily professional judgements.”

A term that came to me as I read the authors’ analysis of the insurance firm is “decision drift.” Consider the example of an underwriter who has been doing the job for a period of years. Like their colleagues, that underwriter is likely to become more confident in their judgement the longer they spend on the job. What tends to occur over time, however, is that any given underwriter is increasingly likely to base their decisions on what decision they personally made for a case they view as similar. Early in their career, they would have been far more likely (and even expected to) confer with a colleague, but the longer they stay on the job, the higher the likelihood that they are basing decisions entirely on their own case history. Let’s say that a firm has 50 underwriters; it’s not difficult to see how the decisions that different underwriters might make could vary considerably, which has implications not only for individual customers, but also for the insurance firm as a whole.

“The psychology of this process is well understood. Confidence is nurtured by the subjective experience of judgements that are made with increasing fluency and ease, in part because they resemble judgements made in similar cases in the past. Over time, as this underwriter learned to agree with her past self, her confidence in her judgement increased. She gave no indication that — after the initial apprenticeship phase — she had learned to agree with others, had checked to see to what extent she did agree with them, or had even tried to prevent her practices from drifting away from those of her colleagues.

For the insurance company, the illusion of agreement was shattered only by the noise audit. How had the leaders of the company remained unaware of their noise problem? There are several possible answers here, but one that seems to play a large role in many settings is simply the discomfort of disagreement. Most organizations prefer consensus and harmony over dissent and conflict. The procedures in place often seem expressly designed to minimize the frequency of exposure to actual disagreements, and, when such disagreements happen, to explain them away.”

Definitions of Terms: Measurements and Judgements

Now seems like a good time to share paraphrased versions of the authors’ definitions of terms that are important to understand within the broader context of decision-making and the noise that exists when evaluating those judgements in the aggregate.

  • Measurement. Using an instrument to assign a value on a scale to an object or event.
  • Judgement. A measurement in which the instrument is the human mind; it informally integrates disparate pieces of information into an overall assessment.
  • Predictive judgement. A form of judgement where the desired output comes close to a true value. (Corollary: given two or more such judgements that differ, at most one can be correct.)
  • Verifiable judgement. A form of judgement that can be scored by an objective observer based on a simple measure of error; it will ultimately become clear whether it was accurate or not.
  • Evaluative judgement. A form of judgement where there is a need to either: a) select from among a range of options (e.g., assigning a premium amount to an insurance policy, assigning an award amount to a grant, or setting the length of a criminal sentence); or b) consider trade-offs when evaluating options (e.g., whether to hire one candidate vs. another, or choosing how to respond to an epidemic or pandemic).
  • Judgement efficacy. How well a judgement process performs when applied to a large number of cases.
  • Bounded disagreement. Matters of judgement live in a grey area, between matters of computation (where no disagreement is allowed, by definition), and matters of taste, where it’s generally accepted that some disagreement will exist.

Notable quotes:

“Focusing on the process of judgement, rather than its outcome, makes it possible to evaluate the quality of judgements that are not verifiable, such as judgement about fictitious problems or long-term forecasts. We may not be able to compare them to a known outcome, but we can still tell whether they have been made incorrectly.”

“The boundary between predictive and evaluative judgements is fuzzy and people who make judgements are often unaware of it. Judges who set sentences or professors who grade essays think hard about their task and strive to find the ‘right’ answer. They develop confidence in their judgements and the justifications they have for them. Professionals feel much the same, act much the same, and speak much the same to justify themselves when their judgements are predictive (‘How well will this new product sell?’) and when they are evaluative (‘How well did my assistant perform this year?’).”

Definitions of Terms: Errors, Bias, and Noise

Here are some additional definitions related to error, bias, and noise:

  • Error. A deviation from the true value. (“We can be sure there is error if judgements vary for no good reason.”)
  • Bias. A skew in the results for judgements of the same problem that follows an observable pattern, where most errors are in the same direction. (It can be seen as the average error, e.g., “when executives are too optimistic about sales, year after year; or when a company keeps reinvesting money in failing projects that it should write off.”)
  • Noise. Variability in results for judgements of the same problem that does not follow an observable pattern, and where the judgements can reasonably be expected to be identical. (It can also be seen as any errors that remain after bias has been removed, e.g., it’s often manifested as system noise (see definition below), where organizations can choose from a pool of professionals to make a decision, “such as physicians in an emergency room, judges imposing criminal penalties, and underwriters in an insurance company.”)
  • Level noise. Variability in results for judgements of the same problem, where the variability may be associated with the values or pre-dispositions of the decision-maker. (That is, “the variability of the average judgements made by different individuals.” Example: A judge who is known for imposing particularly harsh or particularly lenient sentences.)
  • Pattern noise. Variability in results for judgements of the same problem, where the variability may be associated with a particular judgement scenario or with the particulars associated with a particular case. (That is, it’s often manifested as “principles or values that the individuals follow, whether consciously or not.” Example: A judge who imposes a sentence that constitutes an outlier relative to their typical sentence for cases of a similar type, where even if their judgements overall are relatively harsh, they might be lenient with traffic offenders.)
  • System noise. The combination of level noise and pattern noise or, to be more precise: (System Noise)² = (Level Noise)² + (Pattern Noise)².
  • Occasion noise. Variability in results for judgements of the same problem, where the variability may be associated with a person’s personal circumstances at the moment of decision, due to factors such as what mood they’re in, how well-rested they are, what time of day it is, what the weather is like, and whether they are/are not hungry. (It’s a transient component of pattern noise, “most easily recognized when the [person] does not recognize the case as one seen before,” and where they might make a different decision than they did on a different occasion for the same case. Example: A judge has gotten into a car accident on the way to work, had to skip breakfast, and didn’t get much sleep the night before, any one of which can affect their decision-making).
  • Noise audit. A type of experiment in which individuals make separate and independent judgements of the same cases. “We can measure noise without knowing the true value, just as we can see, from the back of the target, the scatter of a set of shots… They may sometimes call attention to deficiencies in skill or training. And they will quantify system noise — for instance, when underwriters in the same team differ in their assessments of risks.”
  • Mean Squared Error (MSE). A long-time standard in scientific measurement, the main features of which are that it: 1) produces the sample mean as an unbiased estimate of the population mean; 2) gives positive and negative errors equal weight; and 3) penalizes large errors. “As measured by MSE, bias and noise are independent and additive sources of error.”
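
The additive relationship between bias and noise can be verified in a few lines. The sketch below uses hypothetical errors (judgement minus true value), not data from the book, and checks the decomposition MSE = bias² + noise², where bias is the mean of the errors and noise is their standard deviation:

```python
from statistics import mean, pstdev

# Hypothetical errors (judgement minus true value) for ten judgements
# of the same case -- illustrative numbers, not data from the book.
errors = [4.0, -1.5, 2.5, 6.0, 0.5, 3.5, -2.0, 5.0, 1.5, 2.0]

bias = mean(errors)               # the average error
noise = pstdev(errors)            # the standard deviation of the errors
mse = mean(e**2 for e in errors)  # mean squared error

# The decomposition holds exactly: MSE = bias^2 + noise^2
assert abs(mse - (bias**2 + noise**2)) < 1e-9
print(f"bias={bias:.2f}, noise={noise:.2f}, MSE={mse:.3f}")
```

The same sum-of-squares logic is behind (System Noise)² = (Level Noise)² + (Pattern Noise)²: independent sources of variability add in squares, not directly.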

Notable quotes:

“Of bias and noise, which is the larger problem? It depends on the situation. The answer might well turn out to be noise. Bias and noise make equal contributions to overall error (MSE) when the mean of the errors (the bias) is equal to the standard deviation of errors (the noise). When the distribution of judgements is normal (the standard bell-shaped curve), the effects of bias and noise are equal when 84% of judgements are above (or below) the true value. This is a substantial bias, which will often be detectable in a professional context. When the bias is smaller than one standard deviation, noise is the bigger source of overall error.”
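
The 84% figure in that quote is not arbitrary. Since MSE = bias² + noise², the two contributions are equal exactly when the bias equals one standard deviation of the noise, and for normally distributed judgements the share falling on one side of the true value is then the standard normal CDF at 1. A quick check with Python’s standard library:

```python
from statistics import NormalDist

# Share of judgements above the true value when bias = 1 standard deviation:
share = NormalDist(mu=0, sigma=1).cdf(1.0)
print(f"{share:.1%}")  # ≈ 84.1%
```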

“There is a limit to the accuracy of our predictions, and this limit is often quite low. Nevertheless, we are generally comfortable with our judgements. What gives us this satisfying confidence is an internal signal, a self-generated reward for fitting the facts and the judgement into a coherent story. Our subjective confidence in our judgements is not necessarily related to their objective accuracy.”

Definitions of Terms: Dynamics in Group Decision-Making

In a group decision-making setting, the following additional definitions apply:

Noise amplification. The appearance of any one of the following factors, which can introduce variability into a group’s judgement of the same problem:

  • Social influence. The extent to which the members of the group are exposed to the judgements of another individual or group, before making their own judgement. (Example: Whether or not the members of a group choosing the best and worst songs see the opinions rendered by another individual or group before making their judgement).
  • Informational cascade. The extent to which the order in which a member of the group gets a chance to offer their opinion affects the opinion that they give. (Example: The first person in a group who gives their opinion about a potential job candidate is very likely to affect the opinion given by those that follow them; conversely, the last person in the group is relatively unlikely to give the same opinion as they would have if they had gone first).
  • Group polarization. The extent to which a pre-existing view moves further in the same direction when hearing similar points of view expressed by others in the group. (Example: The members of a team who need to make a recommendation on the location for a new office, and who start off in general agreement, settle on a particular location as not just “good enough,” but in fact, far better than all of the available alternatives, even though few if any of them started the conversation with that opinion).
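
A toy model can show how an informational cascade locks in early opinions. In the sketch below (my own illustration, not a model from the book), each person announces whatever verdict is favoured by the majority of the announcements they have already heard plus their own private signal, with ties going to the private signal:

```python
def cascade(signals):
    """Each person announces the verdict favoured by the majority of
    earlier public announcements plus their own private signal
    (ties go to the private signal)."""
    public = []
    for s in signals:
        yes = public.count(True) + (1 if s else 0)
        no = public.count(False) + (0 if s else 1)
        public.append(yes > no if yes != no else s)
    return public

# Five of seven private signals point to the right answer (True),
# but the first two speakers happened to receive misleading signals.
signals = [False, False, True, True, True, True, True]
print(cascade(signals))  # every announcement is False: an informational cascade
```

Even though a clear majority of the private signals are correct, two misleading signals at the front of the queue are enough to make every announcement wrong; reverse the speaking order and every announcement comes out True, which is exactly the order effect described above.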

Notable quotes:

“Groups can go in all sorts of directions, depending in part on factors that should be irrelevant. Who speaks first, who speaks last, who speaks with confidence, who is wearing black, who is seated next to whom, who smiles or frowns or gestures at the right moment — all these factors, and many more, affect outcomes. Every day, similar groups make different decisions, whether the question involves hiring, promotion, office closings, communications strategies, environmental regulations, national security, university admissions, or new product launches.”

“In simple estimation tasks — the number of crimes in a city, population increases over specific periods, the length of a border between nations — crowds were indeed wise as long as they registered their views independently. But if they learned the estimates of other people — for example, the average estimate of a group of twelve — the crowd did worse… The irony is that while multiple independent opinions, properly aggregated, can be strikingly accurate, even a little social influence can produce a kind of herding that undermines the wisdom of crowds.”
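
The quoted herding effect can be mimicked with a small deterministic sketch (hypothetical estimates and a made-up influence weight, not the study’s data): independent estimates err in both directions and largely cancel, while letting each person shade their answer toward the running average of earlier answers drags the whole group toward whoever spoke first:

```python
from statistics import mean

TRUE_VALUE = 1000

# Hypothetical independent estimates: individually noisy, errors in both directions.
independent = [720, 1350, 880, 1240, 950, 1060, 690, 1310]

def herd(estimates, weight=0.7):
    """Each person announces a blend of their private estimate and the
    running average of earlier announcements (weight = social influence)."""
    public = []
    for e in estimates:
        if public:
            e = (1 - weight) * e + weight * mean(public)
        public.append(e)
    return public

influenced = herd(independent)
print(abs(mean(independent) - TRUE_VALUE))  # small: independent errors cancel out
print(abs(mean(influenced) - TRUE_VALUE))   # larger: anchored to the first voice
```

With these numbers the independent mean misses the true value of 1,000 by 25, while the influenced mean misses by roughly 120: the individual answers become more similar to one another, but the crowd as a whole gets worse.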

Conclusion

I hope that this overview of foundational concepts, based on the book Noise, proves to be as fascinating for you as it has been for me. It’s my intention to discuss additional points from the book, and also the practical application of these ideas, in one or more subsequent blog posts.
