Why the Great Glitch of July 8th Should Scare You
Zeynep Tufekci

(This is a very lightly edited version of my Gist spiel)

Thanks for the very smart response, Zeynep! But it’s not my favorite reaction to what happened. That came from the blog Zero Hedge, which actually also went down on Wednesday for a while, along with large chunks of the Time Warner Cable broadband network. The headline was “Is This What The First World Cyber War Looks Like”, and the site even provided what they called “a real-time cyber attack map” which would allow us to “keep track of the first global cyberwar in real-time”.

Zero Hedge was not alone: there was real panic in the markets about what might be going on. Josh Brown is the CEO of Ritholtz Wealth Management, the kind of guy that very rich people trust to steward their money in a grown-up manner. But there he was on Twitter, telling his 110,000 followers that the Department of Homeland Security was “lying or wrong” when they said, quite correctly, that the New York Stock Exchange going down was just a standard computer failure, that it wasn’t some kind of malicious attack.

What Zero Hedge and Josh Brown were doing has a name: it’s called illusory correlation. Nassim Taleb wrote a great book about it, his best book, actually. It’s called Fooled by Randomness. Airlines get grounded all the time, for all manner of reasons. Especially United. In fact, something almost identical happened to United last month. And today I got an email mileage statement from United saying I have travelled zero lifetime miles on them. Which I wish was true. And stock exchanges are complex things, which fail sometimes, with negligible consequences. There are a dozen different stock exchanges in the US, and not a single stock stopped trading for even a second. The trading just moved to, you know, the eleven different alternative exchanges. Or the 50 alternative exchanges, if you’re including dark pools. As for some of the other things, well, as a long-suffering customer of theirs, I can tell you that it would frankly be more newsworthy if large chunks of the Time Warner Cable broadband network weren’t down on any given day.

And sometimes, three or four of these kinds of things happen in a single day. You have enough days, you’re going to see a bunch of these simultaneous failures. In fact, we’re going to see more and more of them going forwards.
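
For anyone who wants to check the arithmetic behind "enough days", here is a back-of-the-envelope sketch in Python. Every number in it is invented for illustration: it assumes 20 high-profile systems that people would notice going down, each with an independent 2% chance of a visible outage on any given day. Real outages are not independent and the real rates are unknown, so treat this as a toy model rather than a measurement.

```python
from math import comb

def prob_at_least(k, n, p):
    """Chance that at least k of n independent systems fail on the same day,
    when each one fails with probability p (a plain binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def prob_some_day(k, n, p, days):
    """Chance of seeing at least one such coincidence somewhere in `days` days."""
    q = prob_at_least(k, n, p)
    return 1 - (1 - q)**days

# Assumed, made-up inputs: not taken from any real outage data.
WATCHED_SYSTEMS = 20
DAILY_OUTAGE_CHANCE = 0.02

one_day = prob_at_least(3, WATCHED_SYSTEMS, DAILY_OUTAGE_CHANCE)
one_year = prob_some_day(3, WATCHED_SYSTEMS, DAILY_OUTAGE_CHANCE, 365)

print(f"Three or more outages on one particular day: {one_day:.4f}")
print(f"Three or more outages on some day in a year: {one_year:.4f}")
```

Under these assumed numbers, a triple failure on any one particular day is rare, but the chance of getting at least one such day over a full year comes out well over one half. No coordination required.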

Zeynep’s really great little essay is entirely open about the fact that there was no malicious intent here. Instead, she writes, “The big problem we face isn’t coordinated cyber-terrorism, it’s that software sucks”.

Here’s the problem. Everything runs on software. Everything. It’s ubiquitous. Let me quote Paul Ford, from his amazing “What is Code” issue of BusinessWeek:

So many things are computers, or will be. That includes watches, cameras, air conditioners, cash registers, toilets, toys, airplanes, and movie projectors. Samsung makes computers that look like TVs, and Tesla makes computers with wheels and engines. Some things that aren’t yet computers — dental floss, flashlights — will fall eventually.

When Paul writes “fall”, he could easily have written “fail”. Because that’s what computers do: they fail. Even when they’re built to be redundant, they fail. I just lost fifteen years’ worth of digital photographs because I had my photo library on a redundant RAID drive, but then the disk which failed contaminated the other disk, and — sorry. I’ll try not to get distracted here.

The point is that computers run on code, and code is, to use another technical term, a mess. It’s put together in a slapdash way, and then when it’s fixed, or asked to do something new, it becomes even more precarious. Everyone’s in a rush, and they do something which is good enough, probably, and they fully intend to come back to it and make it better at some point in the future when they have a bit more time, but of course that point in the future never, ever happens. And then companies merge and computer systems get built on top of other computer systems and everything just becomes more and more complex, every. Single. Day.

These complex systems are startlingly easy to hack, which is one of the reasons why it’s easy to believe that when they fail, they have been hacked. But the people who jump to the conclusion that an organization has been hacked, those people are forgetting Occam’s Razor, which says that you should always select the simplest explanation. And the simplest explanation is, always, “the computer crashed”. I know it wasn’t meant to, but it did. It might have been a software error, or a hardware error, or even some kind of weird solar flare activity, but whatever it was, boom.

Because every single system becomes more complex over time, and the more complex your system is, the more likely it is to fail. In fact it’s worse than that. The more complex a system is, the more catastrophic any given failure is likely to be. Even if it’s just completely random, and not malicious at all.
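
The first half of that claim, that more complexity means more failure, can be sketched with one line of probability. The assumption here is mine and deliberately crude: a system is a chain of parts that all have to work, and each part independently has the same small chance of breaking on a given day. The 0.1% figure is invented. The sketch says nothing about how catastrophic a failure is, only how likely.

```python
def system_failure_probability(components, p_fail):
    """Chance that at least one of `components` independent parts fails,
    assuming the whole system goes down if any single part does."""
    return 1 - (1 - p_fail) ** components

# Assumed, illustrative per-part daily failure chance of 0.1%.
P_FAIL = 0.001

for components in (10, 100, 1000):
    chance = system_failure_probability(components, P_FAIL)
    print(f"{components:>5} parts: {chance:.1%} chance something breaks today")
```

Each part is just as reliable in the thousand-part system as in the ten-part one. The only thing that changes is how many of them there are.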

Zeynep says that there’s a “lack of interest in fixing” this problem, and I think I actually disagree with her on this one. She thinks that the problem is one which could be fixed, if only we tackled it with enough money and enough time. But I look at what would be involved in fixing, oh, I dunno, a single mid-sized bank, or even just tweaking the text editor that Reuters journalists use so that they can put hyperlinks in their stories. Which, trust me on this one, is way, way harder than you think. And then I multiply that not only by every company, but by every product: every watch, and every camera, and every air conditioner, and I know that it could never be done. We would never even make a start. New problems will always be introduced more quickly than old problems are solved.

Right now, for instance, there’s a lot of talk about the scandal of weak government computer security, following the hack into the data at the Office of Personnel Management. But I don’t think the OPM’s security was weak. I just think they had computers. And just like computers will always fail, computers will always get hacked.

I went to a talk once by Max Levchin, a co-founder of PayPal, who is one of the most intelligent humans on this planet. Seriously. He’s incredible. He doesn’t run PayPal any more, and he was joking about the way that PayPal still runs the same old code that he wrote back in the day, which is completely out of date today. And of course he has a shiny new company, called Affirm, and they would never run their website using that kind of old code.

But I wondered: could even Max Levchin manage to overhaul the code for a company the size of PayPal? One of the reasons that great technologists like Max like to start new companies is precisely that that’s the only time you get to start with a blank piece of paper and build everything the way it should be built, from scratch. Max loves to fund startups, and build startups, but the fact is that we don’t live in a world of startups. We live in a world of big, old, mature companies, which are often the result of dozens of mergers and acquisitions, and which are built on various bits of code going back to god knows when. And all of those systems are going to fail. And get hacked. And none of those systems can be fixed so that they’re not going to fail. And they certainly can’t be fixed so they won’t get hacked. And no, the Cloud is not going to save you, and in fact it might even make matters worse, because when the Cloud fails, it’s not just one company brought to its knees, it’s thousands of them. And yes, even the Cloud is built on layer upon layer upon layer of non-perfect code.

This is what progress looks like. If it wasn’t happening, that would be worse! But the fact is that the world is exponentially more complex than it was even a decade ago, and it’s getting more and more complex every day, and all that complexity is certain to lead to catastrophic failure, every so often. Wednesday was nothing. There’s absolutely nothing we can do about it, and in fact there’s not even a lot of point in worrying about it. It’s going to happen, whether we worry about it or not. Zeynep tells us that we should “Be scared,” and “Be very worried.” But I go back to the Serenity Prayer, which tells us that we should have “the serenity to accept the things we cannot change, the courage to change the things we can, and the wisdom to know the difference.”

This, my friends, is one of the things we cannot change. Or, at least, that we should not change. Because the only way to prevent random catastrophic failures would be to reverse the march of progress altogether. And that would be so much worse.
