The Errors of Our Ways
by Lidia Jean Kott
One evening this fall, my Microsoft Word crashed. I didn’t send my error report, but afterwards I kept wondering about what would have happened if I had. Could there be someone, somewhere, bleary-eyed, scrolling through thousands of crashes a day? It seemed unlikely. I contacted a friend who works in software development to find out. He sent me a link to an Internet forum with a thread on the topic. The comments were disappointingly vague, but one of the commenters seemed knowledgeable. I tracked him down and called him. He told me to talk to Kirk Glerum, the “father of Windows Error Reporting.”
When I first reached Glerum, a semi-retired technology consultant, over Skype, he was in the study of his house in Redmond, Washington, and the sound on his computer wasn’t working. We could see each other, but not hear each other. I offered to hang up and try again, but Glerum didn’t notice my message. He was too busy clicking. He looked a bit like a spaceship commander, in oversize black headphones and a polo shirt buttoned almost all the way to the top. Two dark cats walked across the screen. Glerum kept his eyes narrowed and tapped his lower lip. Glerum’s love for computers began when he was in high school in the ’70s in Beaverton, Oregon. He saw an electric calculator for the first time and found it fascinating. “You could tell the darn thing what to do, that’s the beauty of programming,” he told me. “I mean, half the time it doesn’t work. And that’s how we got to [Error Reporting].”
Windows Error Reporting was introduced with Windows XP in 2001. It’s “the one innovation that changed software development the most in the past 20 years — for the best,” says James Larus, a computer science professor at a university in Switzerland and a former Microsoft employee himself. The feature opened up one of the first direct lines of communication between users and developers. Before, the only way to reach Microsoft was through waiting on hold with call centers. Now, prompts to send crash reports over the Internet are standard. Apple uses error reporting, and so do Google, Adobe, and many start-ups. Most error reporting, at the initial level, is opt-out, meaning that reports get sent automatically, unless you turn off the feature. We’re constantly sending off reports, often without even knowing what we’re sending. Microsoft alone receives 16 million crash reports in an average four-month window, according to a 2014 study by the cyber security firm Websense.
But it wasn’t always clear that error reporting would catch on. Eric LeVine, the former Group Program Manager for Windows Error Reporting, first heard Glerum make his pitch at a lunch table at Microsoft’s headquarters in Redmond. This was in the late ’90s, when Microsoft had a terrible reputation for being glitchy and unreliable. But LeVine was unconvinced that error reporting could solve the company’s problems. “That’s insane,” LeVine remembers thinking. “We would crash all the servers in the world.” There was not much precedent for crowdsourcing information or sorting through it in any useful way. The term “big data” had not yet even entered common parlance. “In that time, [Windows Error Reporting] was totally novel and audacious,” LeVine says. Glerum already had a reputation for being “kind of way out there.” Once for Halloween, he had shaved his head and glued computer keys to it, which may have also contributed to LeVine’s skepticism.
But error reporting turned out to work better than anyone around that lunch table could have predicted. LeVine says that in its first three years the feature wiped out ninety-five percent of crashes in Office products. Here’s how it works. After a program crashes, the operating system gathers information such as the name of the application that crashed as well as the names of all the other applications that were running at the time. The operating system might also scoop up some personal information, like the name of the document you were working on. Microsoft promises to keep everything confidential. Still, a former employee, who spoke anonymously due to the rules of the company’s non-disclosure agreement, says that if your computer crashes while you’re working on a particularly sensitively titled document, you might choose not to send that report.
The report is then sent to Microsoft’s servers, where it’s sorted according to Pareto’s Principle. Vilfredo Pareto was a nineteenth-century Italian economist and sociologist, with a long white beard in his Encyclopedia Britannica picture. His principle, which was based on studying land distribution in Italy, states that about eighty percent of effects come from about twenty percent of causes. At Microsoft, this means that all like crashes (for instance, two crashes that occurred in the same location) are grouped together, and developers look first at the most common groups, on the premise that fixing those will also resolve the majority of problems. LeVine remembers fondly the first problem fixed with Windows Error Reporting: a glitch with an Adobe add-on to Microsoft Word. The add-on was meant to convert files to PDFs, but it also inadvertently caused Word to crash every time, right before closing. The solution to that problem marked the beginning of a new era.
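For readers who want to see the sorting idea in concrete terms, here is a minimal sketch in Python. It is not Microsoft’s code: the `CrashReport` fields, the `rank_buckets` function, and the sample data are all invented for illustration. It assumes only what is described above, namely that crashes from the same place are grouped together and the biggest groups are looked at first.

```python
from collections import Counter
from typing import NamedTuple


class CrashReport(NamedTuple):
    # Both fields are hypothetical stand-ins for what a real report carries.
    application: str  # name of the program that crashed
    location: str     # label for where in the code the crash occurred


def rank_buckets(reports):
    """Group reports that share an application and location, most common first.

    Returns a list of (bucket, count, cumulative_share) tuples, where
    cumulative_share is the fraction of all reports covered so far.
    """
    counts = Counter((r.application, r.location) for r in reports)
    total = sum(counts.values())
    ranked = []
    running = 0
    for bucket, count in counts.most_common():
        running += count
        ranked.append((bucket, count, running / total))
    return ranked


# Made-up sample data, for illustration only.
reports = (
    [CrashReport("winword.exe", "pdf_addon+0x1a")] * 6
    + [CrashReport("winword.exe", "spellcheck+0x40")] * 3
    + [CrashReport("excel.exe", "chart+0x08")] * 1
)

for bucket, count, share in rank_buckets(reports):
    print(bucket, count, f"{share:.0%}")
```

In this toy data set, the single largest group accounts for six of the ten reports, so fixing that one bug would eliminate most of the crashes. That is the Pareto bet in miniature.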
Nobody at Microsoft was available to comment about the role that error reporting currently plays in the company. But I spoke to a few former employees about what it was like when they worked there. One of them, who left Microsoft in 2011, says that developers received an email with lists of the top ten crashes on programs they had worked on, an email he calls “the wall of shame.” “You wouldn’t do anything until you fixed your ‘wall of shame,’” he says. An employee who left last year explained that each crash has its own form that developers study in order to fix it. The form has hundreds of different text fields that include information about where the crash happened, how many times it happened, and what its impact was. There’s also a space for notes from developers about what they’ve already tried or their thoughts about what the issue might be. The former employee puts it like this: “First priority, complete your crashes. At this point, it’s like of course we do it this way. And it used to be radical.”
Error reports actually contain a fair amount of information, and are constantly flying through cyberspace, which made them targets for the NSA. In December 2013, Der Spiegel published a story about how the NSA can intercept Microsoft’s error reports and mine them for clues on how to hack into people’s computers. In an internal presentation, NSA agents replaced the text in Microsoft’s error message asking if you’d like to send a report with their own reading: “This information may be intercepted … to gather detailed information and better exploit your machine.” Alexander Watson, the CEO of the cyber security firm harvest.ai, was in Germany at a technology conference around the time the Der Spiegel story came out. He was surprised by how little even the conference attendees knew about what information error reports contain. “People should know what’s getting sent,” Watson says. “If an attacker were to compromise that they would have a blueprint of how to attack you.”
Yet after the Der Spiegel story came out, Glerum says that he and a couple other members of the original error reporting team all just rolled their eyes. To them, it seemed like a lot of hubbub for no reason. Glerum holds that error reports are too lightweight to contain any information that would be of any practical use to hackers or spies. “I’d love to have lunch with the guys from the NSA who used my stuff,” Glerum says, his tone carrying just the slightest whiff of a challenge.
Brendan Conlon, the CEO of the cyber security firm VAHNA, served as the NSA’s Deputy Chief of Integrated Cyber Operations from 2011 to 2013. Conlon isn’t allowed to talk about how the NSA uses error reports. But according to Der Spiegel, the reports reveal information about the security holes in a targeted person’s computer, which means in theory the agency could generate malware to exploit those weaknesses. Though Conlon won’t draw on his experiences at the agency, he can discuss error reports as a cyber security analyst. “You are sending out information that could be used in a bad way,” he tells me, over the phone. “But you must think about that every time you’re on the Internet. Twitter, Snapchat, your personal information is out there.” Conlon still sends his error reports, though, because he says they help improve software. To him, the risk is minimal and worth it. “You’re helping out the rest of the world,” he says.
There is something emotional about error reporting. That’s no accident. Glerum says that when he started at Microsoft, after a program crashed, a dialogue box would appear “that looked like it was admonishing you for something.” It would say, for example, “This program will be terminated,” Glerum recalls, his voice turning deep and robotic. Glerum and his team rewrote the text to make it sound more human. “We wanted people to submit their reports,” he says. “We were desperate to make it sound reasonable.” One of the main changes they made was to add the line “We’re sorry.” But some say the current dialogue box opens up the company to criticism. “The way that Microsoft designed the program, you think ‘Ah Windows, you crashed again.’ Why would you design it that way?” asks a former employee. He points out that on a Windows computer the same apologetic popup appears, even if the program that crashed wasn’t made by Microsoft, making the company look responsible for flaws that aren’t their fault. And most people, after experiencing a crash, don’t feel too forgiving. The comedian Natalie Tran posted an animated video on YouTube in 2010 where she imagines the guy who receives error reports printing them out and eating the paper.
But for Eric LeVine, the former Windows Error Reporting Program Manager, error messages bring up more positive associations. Retired from Microsoft, he lives in Bern, Switzerland, where he runs CellarTracker, one of the largest wine websites in the world. LeVine says the website would never have existed if it weren’t for Windows Error Reporting. About six months after he started working on the project, LeVine took a bicycle wine tour through Tuscany. After LeVine returned, he started a spreadsheet to keep track of his wine tasting notes and the best times to drink the wines in his cellar. He passed the spreadsheet around the office and his co-workers added their notes too. Eventually, the spreadsheet became a website. Windows Error Reporting served as the model, he says. “The idea really came out of that time, and how to structure things you wouldn’t think you’d be able to structure.”
I can’t say that I always send my own reports now. I often forget, or just feel too impatient. But now I know that a developer might actually read my error reports — when I send them. In the old days, Glerum went through all of the reports himself. At certain points, it took him 20 hours to process one day’s worth of data. “I was a total cowboy, staring at my laptop,” he says. I find it reassuring to think of him, in his oversize headphones and buttoned-up polo, working late at night on a dialogue box that would one day pop up on nearly all of our screens. It turns out that our reports are sorted. And to me, the world seems like a slightly less chaotic, more organized, place because of it.
Gif via PC Pitstop