For many years, the underlying thesis of information technology has been that there is “too much” information for individuals to deal with. Technology’s job, then, is to surface the best information out of this embarrassment of riches. In the mid-2000s, the solution to this problem was going to be Web 2.0. Nowadays, it’s artificial intelligence.
I’m increasingly convinced, however, that our problem is not information overload but information underload. We suffer not because there is too much good information out there for us to process, but because most information out there consists of low-quality slapdash takes on low-quality research, endlessly pinging around the spin-o-sphere.
Take, for instance, the latest news on Watson. Watson, you might remember, was IBM’s artificial intelligence play, the former Jeopardy winner that was going to go from answering “Who is David McCullough?” to curing cancer.
That was in 2011.
So how has this worked out? Six years later, Watson still has not impacted health care in any major way. In some locations, it’s hit a roadblock with changes in back-end records systems. Most importantly, however, it can’t figure out how to treat cancer because we don’t currently have enough good information on how to treat cancer:
“IBM spun a story about how Watson could improve cancer treatment that was superficially plausible — there are thousands of research papers published every year and no doctor can read them all,” said David Howard, a faculty member in the Department of Health Policy and Management at Emory University, via email. “However, the problem is not that there is too much information, but rather there is too little. Only a handful of published articles are high-quality, randomized trials. In many cases, oncologists have to choose between drugs that have never been directly compared in a randomized trial.”
This is not just the case with cancer, of course. You’ve heard about the reproducibility crisis, right? Most published research findings are false. They are false for a number of reasons, chief among them that researchers have no incentive to check one another’s work, that data is not shared, and that journals aren’t particularly interested in publishing boring findings. The push to commercialize university research has also corrupted expertise, putting a thumb on the scale for anything universities can license or monetize.
In other words, there’s not enough information out there, and what’s out there is generally worse than it should be.
You can find this pattern in less dramatic areas as well — in fact, almost any place where you’re told big data and analytics will save us. Take Netflix as an example. Endless thinkpieces have been written about the implications of the Netflix matching algorithm for this or that industry, but for many years that algorithm could only match you with the equivalent of the films in the Walmart bargain bin, because Netflix had a matching algorithm but nothing worth watching.
Are you starting to see the pattern here?
In the Netflix case at least, the story has a happy ending. Since Netflix is a business and needs to survive, they decided not to pour the majority of their money into newer algorithms to better match people with the version of Big Momma’s House they would hate the least. Instead, they poured their money into making and obtaining things people actually wanted to watch, and as a result Netflix is a decent deal for consumers now. But people stick with Netflix today not because the recommendation engine “gets them”, but because Netflix streams a wide variety of enjoyable things to watch.
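The Netflix point can be caricatured in a few lines of Python. The toy matcher below (all titles, genres, and scores are invented for illustration, not Netflix’s actual algorithm) ranks a catalog by overlap with a user’s tastes — and it works exactly as designed. The problem is that when the catalog is all bargain-bin titles, the “best match” is still a bargain-bin title:

```python
# Toy item-based matcher: rank catalog titles by how many of the
# user's liked genres each title shares. Purely illustrative data.

def recommend(catalog, liked_genres):
    """Return catalog items sorted by genre overlap, best match first."""
    def score(item):
        return len(set(item["genres"]) & set(liked_genres))
    return sorted(catalog, key=score, reverse=True)

# A bargain-bin catalog: the algorithm is fine; the inputs are not.
catalog = [
    {"title": "Big Momma's House", "genres": ["comedy"], "quality": 2},
    {"title": "Direct-to-video sequel", "genres": ["action"], "quality": 1},
]

best = recommend(catalog, ["comedy", "drama"])[0]
print(best["title"])   # the top recommendation is still a weak title
```

No amount of tuning the `score` function raises the `quality` field of what’s in the catalog — which is the essay’s point: better matching cannot substitute for better content.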
Let’s belabor the point and talk about Big Data in education. It’s easy to pick on MOOCs, the massive open online courses of 2012’s tulip craze, but pause to remember the value proposition of MOOCs. The dream was that with millions of students doing hundreds of millions of assignments we would finally spot patterns that would allow us to supercharge learning. Recommendation engines would parse these patterns, and…
Well, do what, exactly? Do we have a bunch of superb educational content just waiting in the wings that I don’t know about? Do we even have decent educational research that can conclusively direct people to solutions? If the world of cancer research is compromised, the world of educational research is a control group wasteland.
We see this pattern again and again: companies coming along to tell us that their platform will help us with the firehose of content. But the big problem is not that it’s a firehose; it’s that it’s a firehose of sewage. It’s all haystack and no needle. And the reason this happens again and again is that what we so derisively call “content” nowadays is expensive to produce, because it takes a large number of talented, well-paid people to produce it. Scaling up that work means employing more of those people, and it doesn’t improve your return on investment. To make a dollar, you need to spend ninety cents, and that ratio doesn’t change no matter how big you get. And who wants to spend ninety cents to make a dollar in today’s world? That’s so twentieth century.
Processing and promotion platforms, however — whether they be Watson or MOOCs or Facebook — offer the dream of scalability. They offer the promise of zero marginal cost. Of monopoly and lock-in, of permanent industry dominance. That dream drives funding which drives marketing which drives hype. Technologies like Watson and Facebook can be maintained with a relatively small group of people and sail into obscene amounts of profit as they grow, unburdened by the constraints of labor.
And this is why there is endless talk about the latest needle-in-a-haystack finder, when what we are facing is a collapse of the market that funds the creation of needles. Netflix caught on. Let’s hope that the people who are funding cancer research and teaching students get a clue soon as well. More money to the producers of valuable content. Less to platforms, distributors, and needle-finders. Do that, and the future will sort itself out.