I love data. The romance started when I was an econ major in college. I loved the World Bank statistics tables. I loved that I could run a regression myself and draw conclusions from them. I loved that it was math and I was good at it, even though I wasn’t good at all math.
Recently, I finished my IBM Data Science Certification for pure recreation. Recreational math. As I finished, I began pondering what data is good for and what it’s not. I thought of examples of when relying on the data too much was wrong or even dangerous, and examples when using less data had led to better outcomes.
So when is data good? How do you use it well? What are its limits?
Data is about the past
“It’s important to remember that big data all comes from the same place – the past. A new campaigning style, a single rogue variable or a ‘black swan’ event can throw the most perfectly calibrated model into chaos.” - Rory Sutherland
There are lots of examples of the smartest and most data-savvy people making wrong predictions. Here are a few:
The economics profession thought inflation and unemployment were inversely related… until ‘stagflation’ hit in the 1970s.
In the 1980s, the first ‘quant’ traders made a killing on the stock market, using data to predict trends in the market… until something changed and huge losses put them out of business.
In 2016, all reputable political statisticians agreed that Hillary Clinton was very likely to win the presidential election… until Donald Trump won.
It’s not necessarily that the models failed; it’s that something unprecedented happened that changed everything. Unprecedented changes are, by definition, rare. However, as Nassim Taleb points out in The Black Swan, they aren’t as rare as we think.
The signs of change usually come from data points that we’re not collecting, because they didn’t use to matter. It may be true that people with their finger on the pulse of the zeitgeist will be better at intuiting Black Swan events than the smartest statistician is at predicting them.
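To make this concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the series is synthetic, the break at `t = 60` is invented, and the straight-line fit stands in for whatever model you might actually use. It fits a trend on past data and scores it twice, once on a future that looks like the past and once on a future where the trend has reversed.

```python
import numpy as np

# Synthetic, made-up series: a steady upward trend that reverses at t = 60.
# Nothing here comes from real economic, market, or polling data.
rng = np.random.default_rng(0)
t = np.arange(100)
trend = np.where(t < 60, 2.0 * t, 120.0 - 3.0 * (t - 60))
y = trend + rng.normal(0.0, 1.0, size=t.size)


def fit_and_score(train_end, test_start, test_end):
    """Fit a straight line on t < train_end; return mean absolute error on the test window."""
    slope, intercept = np.polyfit(t[:train_end], y[:train_end], deg=1)
    pred = slope * t[test_start:test_end] + intercept
    return float(np.mean(np.abs(pred - y[test_start:test_end])))


# Case 1: the future resembles the past (train on t = 0..39, test on t = 40..59).
error_stable = fit_and_score(40, 40, 60)

# Case 2: the regime changes right after training ends
# (train on t = 0..59, test on t = 60..99).
error_after_shift = fit_and_score(60, 60, 100)

print(f"error when nothing changes: {error_stable:.2f}")
print(f"error after the shift:      {error_after_shift:.2f}")
```

The second fit actually has more training data than the first, yet its error should come out far larger, because the thing that changed is nowhere in the data it was trained on.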
Making the future
“There is no science in creativity. If you don’t give yourself room to fail, you won’t innovate.” - Bob Iger, CEO, Disney
So data science predicts not what will happen, per se, but rather, what will happen if nothing changes.
Other than Black Swans, what can change things?
I recently read ‘The Ride of a Lifetime’ by Bob Iger, where he recounts taking over as CEO from Michael Eisner.
Eisner had amassed a huge and important organization of advisors called something like ‘Strategic Planning’. These were quants from the best business schools who analyzed each decision for its likely impact.
The thing was, under Eisner, the business was floundering.
When Bob Iger took over, he gutted Strategic Planning. He still looked at the data and analysis from the smaller Strat Planning department, but it wasn’t the only factor he used in making his decisions.
Instead, he decided to change things himself. He was the Black Swan that made the data from the past irrelevant.
He did this by emphasizing the quality of shows, rather than audience sentiment on kinds of content, and by making huge deals based on trust with the likes of Marvel, Pixar, Lucasfilm, Fox and others.
Rather than being defined by the data of the past, he made his own future.
On a recent episode of Masters of Scale, Reid Hoffman said something similar. The best VCs will do the due diligence and analysis, but tend to make their decisions based on their gut and some judgment of the character and capabilities of the founders they fund.
Peter Thiel is reluctant to give some predictions of the future, and will often say something to the effect of ‘Well that depends on what we do.’
“Life can only be understood backwards; but it must be lived forwards.”
― Søren Kierkegaard
I think part of my attraction to data and analysis is that it gives us a little more certainty about a potentially scary future. The truth is, while we can make predictions with some reasonable accuracy a lot of the time, we still never know what could happen.
Black Swans can come and change everything. That’s scary.
We have agency to change the future and shape it in our own image. That’s exciting.
Ultimately, whether we know something about the future from our models with 80% or 90% accuracy, we still have to live with the fear and excitement inherent in a changing universe. It’s better to come to terms with this than to cling to false notions of predictability.
What are we looking for, anyway?
“And suddenly the memory revealed itself. The taste was that of the little piece of madeleine which on Sunday mornings at Combray (because on those mornings I did not go out before mass), when I went to say good morning to her in her bedroom, my aunt Léonie used to give me, dipping it first in her own cup of tea or tisane. The sight of the little madeleine had recalled nothing to my mind before I tasted it. And all from my cup of tea.” - Proust, Swann’s Way
For my final project, I looked at neighborhoods in New York for their prevalence of coffee and pizza, and found you can predict coffee shop prevalence from pizza prevalence. It was silly, but fun.
But what am I really looking for when I wander into New York, hungry and tired? Any coffee? A slice of pizza that a lot of people on Foursquare agree is above average?
I don’t think so. What I really want, like Proust, is to take a sip of coffee and to be swept through all the memories and associations that have shaped my love of the dark, hot, bitter fluid. From my childhood wondering what this strange, dark substance was that only adults could drink, to the sips of the weak and acidic coffee whose taste mingled with the smell of a dozen cigarettes outside my college library during a late-night study session. There is a vague memory of the early morning sunlight pouring in through my apartment window, lighting up the swirls of steam from my massive cup of coffee as I laughed with the girlfriend who became my wife.
That’s what I’m really looking for, and it’s not in the data.
However, I do think it’s more likely when the Google Maps rating is higher than a 4.2.
I should build a model…