The Calculus of Decision Making
When making decisions in IT, we must first ask ourselves whether we are asking the correct questions. Then we must ask whether we need to ask questions about the questions. In calculus, there is a concept of first order and second order derivatives. What is the “calculus” of our decision making processes when we ask questions in IT?
I recently heard an interesting story about an admin worker named Rosie at a large organization. For almost 10 years, Rosie scored in the low 3’s out of 5 in her yearly performance evaluations. However, anytime a layoff discussion came up, there was a consistent cry: you can’t get rid of Rosie, or this place would fall apart. This was a paradox, considering that Rosie, by the numbers, was just above average at best, yet she was tacitly invaluable. The employee survey that was used year in and year out was a very simple 1 to 5 ranking between employees, where 5 was the highest ranking. One possible explanation for the Rosie paradox was that she was the victim of a cognitive bias called stereotyping. This particular organization was a pharmaceutical research company. In the aggregate, she was competing with scientists, pharmacists, and doctors. Doctor beats admin every time, right? One year, they decided to change up the employee survey. They decided to apply a level of psychometrics to the survey, with multiple dimensions in the survey questions.
Psychometrics — The branch of psychology that deals with the design, administration, and interpretation of quantitative tests for the measurement of psychological variables such as intelligence, aptitude, and personality traits. Also called psychometry — American Heritage® Stedman’s Medical Dictionary
In simple terms, they asked a few more detailed questions. In this new survey, they added another dimension ranking the importance of certain individual traits. All of a sudden, Rosie’s evaluation that year turned out to be a high 4. When they applied a second order derivative analysis to the survey questions, Rosie’s performance turned out to be statistically more valuable to the company. In calculus, we might consider a car’s velocity as a first order derivative of its position, and then calculate acceleration as a second order derivative. We could say, metaphorically, that Rosie’s original low 3 rating was a first order derivative and how valuable she was to the organization was the second order derivative. It turns out that by adding questions about the questions to the decision making process, they uncovered the hidden truth explaining why everyone already knew that Rosie was invaluable.
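To make the extra survey dimension concrete, here is a minimal sketch of how weighting each rating by the importance of its trait can change a score. It is not the company's actual survey method, which I don't know. The trait names, ratings, and importance weights are all invented for illustration.

```python
# Hypothetical illustration only: the traits, ratings, and weights are invented.

def flat_score(ratings):
    """First order view: a plain average of the 1-to-5 ratings."""
    return sum(ratings.values()) / len(ratings)


def weighted_score(ratings, importance):
    """Second order view: weight each rating by how much its trait matters."""
    total_weight = sum(importance[trait] for trait in ratings)
    weighted_sum = sum(ratings[trait] * importance[trait] for trait in ratings)
    return weighted_sum / total_weight


# Assumed ratings for one employee on a 1-to-5 scale.
ratings = {
    "credentials": 2,
    "research output": 2,
    "keeps the office running": 5,
    "unblocks colleagues": 4,
}

# Assumed answers to the extra question: how important is each trait?
importance = {
    "credentials": 1,
    "research output": 1,
    "keeps the office running": 4,
    "unblocks colleagues": 4,
}

print(flat_score(ratings))
print(weighted_score(ratings, importance))
```

With these made-up numbers the flat average sits in the low 3's, while the importance-weighted score comes out noticeably higher. The ratings themselves never changed. Only the question about the questions did.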
Simon Wardley is an IT industry thought leader who often talks about disruption and innovation. In his 2015 OSCON keynote, Situation Normal, Everything Must Change, he used a brilliant analogy to explain how disruption comes from unforeseen events or changes. He describes a world where chess is played without a chess board. Imagine a futuristic society that finds only a partial chess set, with just the pieces and only parts of the instructions.
This future society learns how to play chess without knowing the board exists. Basically, they keep track of each player’s moves via a system of box and wire diagrams, as seen on the right in the image above. A pawn moves two spaces, then a knight moves, and so on. Eventually, in this board-less game, a king is captured and someone wins the game. Over time, certain winning strategies emerge. The experts write books on top strategies for winning at “board-less” chess. Then, one day, someone finds a complete chess set with the board and the complete instructions. They show up for a chess tournament and wind up winning a match in two moves. This is explained away as a black swan, and it is called the famous “Check Mate in Two” move.
The idea of looking for second order derivatives was an underpinning of Michael Lewis’ Moneyball. Going into the 2002 baseball season, the Oakland A’s had the third lowest salary total in all of baseball. Billy Beane was the A’s general manager at the time. Beane was considered a great talent recruiter who couldn’t retain his talent due to a stingy owner. At the end of the 2001 season, he lost a number of superstar players to large market teams with much higher salary budgets. One in particular was Johnny Damon, who left the A’s for Boston and a 7.2 million salary. In 2002, the A’s salary budget was just under 40 million, and the top 4 teams in baseball were over 100 million, with Boston being the second highest at 108 million. Beane needed to ask different questions about individual baseball player performances in order to be competitive with these large market teams. He needed to find baseball’s missing chess board.
At the time, the tacit knowledge of baseball was that a player’s offensive performance was primarily evaluated by hits, home runs, and batting averages. Beane’s chess board turned out to be something called Sabermetrics. Sabermetrics was the brainchild of a man named Bill James. James was a Korean war veteran with a degree in economics who worked night shift security at a pork and beans cannery. James was also a baseball fanatic. James’ writings were basically ignored by professional baseball for many years because they were thought to be too unusual and unsuitable for normal baseball readers. I guess you couldn’t blame them. One of his derived metrics was called Pythagorean Expectation, a metric for how many games a team should have won based on the number of runs it scored and allowed. James kept all sorts of rich second order statistics, specifically ones not recognized by professional baseball, in a publicly available database. As it turns out, a simple metric that the tacit baseball knowledge evaluators were not tracking was something called On Base Percentage (OBP). OBP is a metric that includes how many times someone walked (four balls vs a hit or 3 strikes). In other words, OBP is a statistic that tells you how often a player actually gets on base. In 2002, the A’s won 103 games with a 39 million salary total, while the New York Yankees won the same number of games with a salary total of 125 million. Beane fielded a team that cost over half a million dollars less per win than the Yankees. Beane took advantage of a number of James’ second order statistics, but OBP stood out as a game changer.
By the end of the 2001 season, Scott Hatteberg had been let go by the Boston Red Sox. Hatteberg was basically through with baseball, primarily due to his knees. He had been a catcher for many years and his time was up. However, Beane noticed that Hatteberg’s OBP five year average was higher than Damon’s by 12 percentage points. Hatteberg actually got on base 12 percent more than Damon from 1997 to 2001. Beane offered Hatteberg a 700k salary as a first baseman in 2002, one tenth of Damon’s 7.2 million with Boston. In a sense, Boston, although they didn’t know it, traded Hatteberg for Damon. By the end of the 2002 season, Hatteberg had gotten on base 18 percent more than Damon did, and the A’s only paid one tenth of the cost. In the Moneyball movie, there is a great quote from the character of Boston Red Sox owner John Henry:
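For readers who want to see these metrics as arithmetic, here is a minimal sketch. The OBP formula is the standard one that also counts walks and hit-by-pitches, and the Pythagorean Expectation uses an exponent of 2, as in James' original form. The sample batter's stat line is invented. The payroll and win totals are the ones quoted in this post.

```python
def batting_average(hits, at_bats):
    """First order metric: hits per official at-bat."""
    return hits / at_bats


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """How often a batter reaches base, counting walks and hit-by-pitches."""
    times_on_base = hits + walks + hit_by_pitch
    opportunities = at_bats + walks + hit_by_pitch + sacrifice_flies
    return times_on_base / opportunities


def pythagorean_expectation(runs_scored, runs_allowed, exponent=2):
    """Expected winning fraction from runs scored and runs allowed."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


def cost_per_win(payroll, wins):
    """Salary dollars spent per game won."""
    return payroll / wins


# Hypothetical batter: a modest average, but a lot of walks.
avg = batting_average(hits=120, at_bats=480)
obp = on_base_percentage(hits=120, walks=80, hit_by_pitch=5,
                         at_bats=480, sacrifice_flies=5)
print(f"AVG {avg:.3f}  OBP {obp:.3f}")

# 2002 figures as quoted in this post: both teams won 103 games.
athletics = cost_per_win(39_000_000, 103)
yankees = cost_per_win(125_000_000, 103)
print(f"A's ${athletics:,.0f} per win, Yankees ${yankees:,.0f} per win")
```

The invented batter looks ordinary by batting average and much better by OBP, which is the kind of gap the traditional evaluators were not looking at. On the payroll numbers above, the A's come out at roughly 0.38 million per win against roughly 1.21 million for the Yankees.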
For forty-one million, you built a playoff team. You lost Damon, Giambi, Isringhausen, Pena and you won more games without them than you did with them. You won the exact same number of games that the Yankees won, but the Yankees spent one point four million per win and you paid two hundred and sixty thousand. I know you’ve taken it in the teeth out there, but the first guy through the wall. It always gets bloody, always. It’s the threat of not just the way of doing business, but in their minds it’s threatening the game. But really what it’s threatening is their livelihoods, it’s threatening their jobs, it’s threatening the way that they do things. And every time that happens, whether it’s the government or a way of doing business or whatever it is, the people are holding the reins, have their hands on the switch. They go bat shit crazy. I mean, anybody who’s not building a team right and rebuilding it using your model, they’re dinosaurs. They’ll be sitting on their ass on the sofa in October, watching the Boston Red Sox win the World Series — IMDB
In fact, in 2003, the Red Sox started playing Moneyball, and in 2004, they won their first World Series in 86 years, breaking the famous “Curse of the Bambino”, a reference to a curse that started the year the Red Sox traded Babe Ruth to the Yankees. Both the A’s and the Red Sox better understood something Daniel Kahneman, in his book “Thinking, Fast and Slow”, summarizes as cognitive bias. Baseball, in general, was guilty of both the Availability Heuristic and the Illusion of Validity. Kahneman’s central thesis was that there are two modes of thought: System 1, which is fast, instinctive, and emotional, and System 2, which is slower, more deliberative, and more logical. Prior to the A’s success, baseball, in general, predicted outcomes primarily based on System 1 thinking, using a strict subset of metrics that were basically mental shortcuts based on prior examples. Another pre-Moneyball baseball challenge was its bias toward the Illusion of Validity. The story of baseball player performance was based on an illusory set of first order derivatives that told a coherent, but false, story.
Another of Michael Lewis’ stories, The Big Short, is also a great example of how false analysis and missed second order derivatives created a windfall for a small few. In short, The Big Short is a story about a small group of people who saw the 2008 housing crisis long before it happened. Michael Burry was a physician turned hedge fund manager. In 2005, Burry started looking at the subprime market differently from the rest of the industry’s investors. Through some of his calculations, he noticed that the bonds on the subprime mortgages, although red hot to everyone else, were showing some false analytics. As it turns out, by looking more deeply into the data with second order analysis, he saw things like the probability of declining real estate values and a growing trend of mortgage defaults. Burry got Goldman Sachs to sell him a CDS (Credit Default Swap), effectively shorting the housing market. In other words, his bet was that if there was a systemic number of mortgage defaults, he could make a ton of money. And in fact, he did make a ton of money. His bet against the market earned his investors over 700 million with returns of 489 percent. He personally profited over 100 million. Notice the trend here: James was a security guard disrupting baseball, and Burry was a physician disrupting financial markets.
In IT, can we find a missing chess board by looking for second order opportunities, like Rosie’s company, Billy Beane, and Michael Burry did? In a previous post, I wrote about Abraham Wald and his discovery of the missing holes on World War II planes that changed the war (https://medium.com/@johnwillis/wheres-wald-a515689632a1). Are we looking for the missing holes in IT? What are we not seeing? Can we notice false analytics? Too often, I visit a large corporation and I see hundreds of video screens in their NOC (Network Operations Center). When I start asking what the data means or, more specifically, what actions they take based on the data, more often than not, I get blank stares. Why do they have all the screens and data? Is the NOC metaphorically just Moneyball’s hits, home runs, and batting averages? What’s their OBP?
I once had an opportunity to hear Werner Vogels, CTO of Amazon, give a presentation about the history of Amazon’s infrastructure. Near the end of his presentation, someone asked him how Amazon monitors its infrastructure. Vogels, as I summarize it, said that they collect data on millions of things; however, he said that only one metric really mattered. It was order-rate. Using statistical data, they knew at any point in time what an acceptable range of upper and lower boundaries was for order-rate. It didn’t really matter if CPU was high or if disk space was low, because as long as the order-rate was statistically under control, things were basically ok.
Dave Zwieback is considered an IT industry expert on the topic of blameless postmortems. In his novel Beyond Blame, he tells the story of a network administrator who gets fired because a large network vendor tells him he has to put a critical fix into the network. It turns out the fix causes a massive outage of his financial institution’s network. The network administrator gets fired before the inquiry into the outage even begins. I won’t give away too much of the story, but they find that they have to keep bringing him back to understand the true cause of the outage. In this case, the financial organization’s equation is: an outage equals fire someone without further inquiry. The moral of the story is that organizations that are slaves to their historical behaviors, without asking second order questions, tend to stay permanently in a vicious cycle of negative events and non-learning.
There is a famous social experiment called the “5 Monkeys and a Ladder”. Five monkeys are put in a cage with a ladder that has bananas on top. When a monkey tries to climb up the ladder, all of the other monkeys in the cage get sprayed with cold water.
After some period of time, all the monkeys in the cage would attack any monkey that tried to climb up the ladder. At some point, no monkey would dare try to climb up the ladder, and the cold water spraying was no longer necessary. Then, the scientists replaced one of the monkeys. Of course, the new monkey tried to climb up the ladder, and, after a number of attacks, the new monkey no longer tried to climb the ladder. This cycle was repeated until none of the original monkeys were in the cage. Then, they added a new monkey to the cage. All of the other monkeys attacked the new monkey when it attempted to climb the ladder, even though none of the monkeys in the cage had ever actually been sprayed with cold water. Adrian Cockcroft, an industry leading cloud expert and original architect of Netflix’s cloud infrastructure, calls this organizational scar tissue.
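As a rough sketch of what "statistically under control" can mean, here is one simple way to derive upper and lower boundaries from a baseline of past readings and check a new reading against them. This is not a description of Amazon's monitoring. The three-sigma rule, the function names, and the sample order-rate numbers are all assumptions made for illustration.

```python
from statistics import mean, stdev


def control_limits(samples, sigmas=3.0):
    """Lower and upper boundaries: mean plus or minus N standard deviations."""
    center = mean(samples)
    spread = stdev(samples)
    return center - sigmas * spread, center + sigmas * spread


def in_control(value, limits):
    """True when a new reading falls inside the acceptable range."""
    lower, upper = limits
    return lower <= value <= upper


# Hypothetical orders-per-minute readings for a comparable time window.
baseline = [100, 104, 98, 101, 97, 103, 99, 102]
limits = control_limits(baseline)

for reading in (101, 60):
    status = "ok" if in_control(reading, limits) else "investigate"
    print(reading, status)
```

A real order-rate swings with time of day and season, so a production version would presumably compare each reading against a baseline for the same kind of window rather than a single flat list. The point of the sketch is only that one business metric with known boundaries can tell you more than a wall of CPU and disk graphs.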
The most damaging phrase in the language is: ‘It’s always been done that way.’ - Grace Hopper
In the DevOps movement, you could say some of the second order derivatives are things like Kanban (making work visible), Lean (looking for waste), Theory of Constraints (smoothing the flow), Blamelessness, and Anti-fragility (embracing failure). Ironically, one of the more important and hardest derivatives to uncover is trust. The State of DevOps Report is a yearly psychometrically based survey that looks at team performance and organizational culture.
The survey uncovers second order derivatives of organizational behaviors using a model called “A Typology of Organisational Cultures”, created by Ron Westrum. Instead of just asking first order questions, the survey is designed to cluster the findings based on data showing a team to be either Pathological, Bureaucratic, or Generative. Finding ways to create collaborative, trusting environments is one of the core tenets of the DevOps movement.
Seeing beyond the first order questions is a learned skill. I like to use the analogy of the equilibrium of a teeter totter to better understand second order analysis. In the first image (Image 1), we see a balanced teeter totter. We can remove an equal number of items from both sides of this teeter totter, and we will still have a balanced equation. However, in the second image (Image 2), we have some additional data available to use: second order derivatives.
In the second image (Image 2), the items are color coded by their value. Now, we can ask if there is any significance to the blue, green, or brown items. Maybe we find that the brown items have a higher value than the blue and green ones. The brown items may weigh the same as the green and blue items, but they cost a lot more. So now, if we remove a brown item from the left side and a blue one from the right side, the teeter totter may still be equally balanced in our first order equation (weight), but there might be a second order imbalance (i.e., value). By asking for the colors of the items, we find a missing chess board and get a better understanding of our value equation.
There is a great story in the DevOps Handbook, which I wrote with Gene Kim, Jez Humble, and Patrick Debois, about Courtney Kissler while she was at Nordstrom. Every year, their senior staff would discuss the long lead times of an old IBM Mainframe application and how to fix them, and, each year, it would be put off. One year, Courtney decided to add some calculus to the problem by looking at the value stream instead of just assuming the long lead times were because it was a mainframe application. It turned out the problem had nothing to do with the mainframe. There was a manual handoff that was the cause. They fixed the manual handoff with a java script program, and the lead times were no longer an issue. There are a number of case studies in the DevOps Handbook where IT leaders choose to ask second order questions to solve their problems. Check it out here…
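The teeter totter can be sketched in a few lines of code. The item weights and dollar values below are invented, since the images only tell us that brown items weigh the same as the others and cost more.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    color: str
    weight: float  # first order: what the teeter totter can see
    value: float   # second order: what we learn by asking about color


def total(items, attribute):
    """Sum one attribute ("weight" or "value") across a side."""
    return sum(getattr(item, attribute) for item in items)


# Hypothetical items: identical weight, very different value.
brown = Item("brown", weight=1.0, value=10.0)
blue = Item("blue", weight=1.0, value=1.0)
green = Item("green", weight=1.0, value=2.0)

# Start balanced, then remove a brown item from the left
# and a blue item from the right.
left = [brown, blue, green]
right = [brown, blue, green]
left.remove(brown)
right.remove(blue)

print("weight:", total(left, "weight"), "vs", total(right, "weight"))
print("value: ", total(left, "value"), "vs", total(right, "value"))
```

By weight the two sides still balance, so a first order check reports nothing wrong. By value the right side now holds far more than the left, and that only shows up once the second attribute is recorded at all.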