Twenty years after the hype and fear surrounding the Y2K bug, what actually happened, and what have we learned?
Just after 7:00 am on Saturday, January 1st, 2000, my work phone started ringing beside my bed. This was unexpected. Most of my clients would not yet have returned to work after the Christmas holidays, and the ones who had paid for 24/7 cover would not call me directly.
The screen on my phone showed that it was the general manager of a local manufacturing company. I’d written the software to control their production line some years previously, and now they paid us to maintain and occasionally update it for them.
“Matt, the line is down and it won’t come back up!” was the first thing he said. This was not an uncommon occurrence. The factory was situated far from the control office, and we’d already had to lightning-proof the long cable that ran from the computer to the huge manufacturing machines some 200 meters away.
“Have you checked the comms?” I asked, expecting the cable to be at fault.
“Yep, everything passes,” he replied. We also kept a complete, redundant computer on-site that could be swapped in if something went wrong with the primary.
“Have you switched to the backup machine?” was my next question.
“Yes. Same problem,” he answered. “The line is just dead. Everything looks OK, but no instructions are being sent.”
“I’ll be there in 30 minutes,” I said, and hung up the phone. As I drove out to the site, I tried to build a mental model of what could be happening. The Y2K bug was nagging at me, but we’d updated all of the computers, and the production line machines, though huge and hulking, ultimately just obeyed whatever came down the cable. We’d already checked with the manufacturers that their chips contained nothing even remotely date-dependent. It was a puzzle. And an annoyance, because we didn’t yet have enough Internet bandwidth to support these clients remotely!
When I arrived on-site, the manager greeted me enthusiastically and showed me to the control room. This was a grandiose name for what was, effectively, a Portacabin sat to one side of the sprawling manufacturing and distribution site. The “control room” contained an old desk with two computers on it, and a box on the wall holding two lights: green to show that the production line was operating, and red, now illuminated, for when the line was stopped.
I still didn’t have a clear idea of what the problem could be. My confusion was compounded when, just as the client had said, the computers, cabling and production machines all seemed to be working. Using a terminal emulator, I sent commands down the long cable to the machines in an attempt to at least get them running the day’s production program. The green light on the wall behind the computer came on, and a muted cheer from the distant building told me that the machines were still listening and working. That meant my program had to be at fault.
In the twelve months leading up to December 31st, 1999, my company had checked hundreds, if not thousands, of computers and software systems for Y2K compliance. We updated BIOSes, patched code and ran detailed simulations on our clients’ mission-critical equipment to be ready. The so-called Y2K bug arose from programmers using only two digits to represent the year. This was fine while we were dealing with dates in the 20th century, and many of us did not expect our code to still be in use in a production environment when 2000 rolled around. Many older BIOSes and programs that depended on two-digit years represented the year 2000 as 1900 or, in some cases, 19100. I had personally scoured several hundred thousand lines of my own source code to make sure this bug was fixed.
But here I was, sitting in front of my own program and wondering why it wasn’t working. In those days, before high-capacity USB sticks or decent Internet connections, I used to leave a copy of the source code and compiler on-site, so I decided to check it again. The first thing I noticed was that the modification date was September 1999: two months after we had performed our Y2K compliance checks, and eleven months after my previous update!
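Both of those wrong years are easy to reproduce. The C sketch below shows one common way each one arose. It assumes the year starts out in the form used by C's struct tm, whose tm_year field counts years since 1900 (99 in 1999, 100 in 2000). The function names are hypothetical and were made up for this illustration.

```c
#include <stdio.h>

/* Values passed in play the role of struct tm's tm_year field,
 * which counts years since 1900: 99 in 1999, 100 in 2000. */

/* Failure 1: store only two digits and assume the century is always 19xx.
 * 2000 is kept as "00" and comes back as 1900. */
int two_digit_round_trip(int tm_year)
{
    int yy = tm_year % 100;   /* 1999 -> 99, 2000 -> 0 */
    return 1900 + yy;         /* 0 -> 1900 */
}

/* Failure 2: print a hard-coded "19" in front of tm_year.
 * Looks right for 1910-1999 (tm_year 10..99), but in 2000 writes "19100". */
void hard_coded_century(int tm_year, char *buf, size_t len)
{
    snprintf(buf, len, "19%d", tm_year);
}

/* Correct: tm_year is an offset from 1900, so add it rather than concatenate. */
int full_year(int tm_year)
{
    return 1900 + tm_year;
}
```

Both bugs pass every test run in the 1990s, which is why they survived so long: the inputs that expose them simply didn't exist yet.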
When I looked through the code, I found a two-digit date check buried deep in the function that sent the commands to the production machines. The code (in C) looked something like this:
if (current_year >= 98)
    send_program(); // Call function to send program (function name is illustrative)
current_year came from the system date and held only the final two digits of the year. My memory isn’t perfect, but I knew that I had not inserted this code. I asked the manager if he knew anything about it. He shuffled uncomfortably from foot to foot and then said that some workers had been sending old programs to the machines, which caused problems. Each program started with a version number that included the year, and the company wanted to stop workers from using programs from before 1998.
“But I didn’t write this code,” I persisted.
“Ah… yes…” he replied, reddening. “We had a work experience guy (an intern) in, and he said that he could fix it without us having to pay you guys…”
Now it all made sense. Fixing, recompiling and testing the code took a short time, and our invoice was paid without a murmur.
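The original fix isn't reproduced here, so the following is only a hypothetical C sketch of why that check stopped the line and of two standard ways to repair such code. All function names are invented for the example, and the pivot of 50 is an arbitrary illustrative choice. The technique in the middle function is date windowing, a common Y2K remediation that reads a two-digit year relative to a pivot.

```c
/* The check roughly as described above: current_year holds only the last
 * two digits of the system year, so at midnight on 1 January 2000 it rolls
 * from 99 to 0 and the comparison silently turns false. */
int two_digit_check(int current_year)
{
    return current_year >= 98;   /* 99 -> 1 (send), 0 -> 0 (never send) */
}

/* Date windowing: interpret a two-digit year relative to a pivot.
 * With pivot 50, 50..99 map to 1950..1999 and 00..49 map to 2000..2049. */
int expand_two_digit_year(int yy, int pivot)
{
    return (yy >= pivot) ? 1900 + yy : 2000 + yy;
}

/* The same rule written against a four-digit year keeps working after 1999. */
int four_digit_check(int year)
{
    return year >= 1998;
}
```

Windowing only postpones the problem: with a pivot of 50, the same kind of failure returns in 2050. Storing and comparing four-digit years removes it.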
There it was: my only hands-on experience with a system failing because of the Y2K bug, and it was in a program that I had written!
But I thought the Y2K bug was all hype…
In the weeks and months following January 1, 2000, when planes failed to fall out of the sky and civilisation did not revert to a prehistoric society, concern turned to contempt. Certain media outlets accused the IT industry of scaremongering and creating mass hysteria over a problem that ultimately failed to materialise.
That belief ignores two key points:
- There were problems due to the Y2K bug! On January 1, 2000, 154 pregnant women who were patients at the Northern General Hospital in Sheffield were sent incorrect Down’s syndrome test results. They were erroneously told that they were in the high-risk group because the computers miscalculated their ages. Two women terminated their pregnancies based on those results. In Japan, radiation monitoring equipment failed at midnight, and similar, smaller-scale problems were reported in the United States and France.
- The global effect on January 1, 2000, was (relatively) minor because of an unprecedented, well-coordinated international effort, costing between $200bn and $858bn, that took place largely behind the scenes. As The Guardian put it in their special report:
“In other words, companies and governments didn’t spend money fixing a problem that didn’t exist. The fact that the sky didn’t fall on you doesn’t mean there was no Y2K problem, but that serious problems have been averted. So far.”
Was there scaremongering and profiteering? Yes, absolutely. I knew of one company charging the equivalent of $250 to run a free check and put a sticker on the front of the computer. Unfortunately, there will always be people who prey on the fears of others. The problem, however, was real. My own small experience shows that, if the bug hadn’t been handled as well as it was, things could have been much, much worse.
What Have We Learned?
There are many lessons that we can draw from the events surrounding the Y2K bug, but I just want to focus on two of them.
Firstly, we’re better when we work together. The international, coordinated effort that brought together the entire industry as well as governments and external stakeholders proves that we can solve major issues when we put our minds to it. This was the first real test of a relatively young industry, and the fact that people joked afterward about the “Y2K non-event” shows that we passed. It makes me wonder how much progress could be made against the other problems facing our planet if we had the same level of cooperation.
Secondly, developers need to make fewer assumptions. Whether we’re programming students or seasoned developers, we all make assumptions: about the underlying architecture, about who will use the software and what they want from it, and about how long our programs will be in use.
I remember one client who ran their entire production line using a 1980s 8-bit home computer. We gave them a quote for rewriting the program, but eventually they decided to buy a new PC and run the program in an emulator. For all I know, they could still be running their systems on a program that’s nearly 40 years old. So, as developers, it’s wise to make fewer assumptions, especially about the longevity of our software.
Of course, the Y2K bug was not the last time developers would need to worry about the correct representation of dates. The next one is the Y2K38 problem, which is less a bug than a hard limitation in how Unix systems have traditionally encoded time: as a signed 32-bit count of seconds since 00:00:00 UTC on January 1, 1970, which runs out at 03:14:07 UTC on January 19, 2038. I’ll be retired by the time that date rolls around, so I’ll leave it to the next generation of developers to make sure I can still get my pension!
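The 2038 date follows directly from the arithmetic, as this minimal C sketch shows. It assumes a platform whose own time_t is 64 bits wide (as on current 64-bit Linux), so the demonstration doesn't itself overflow. The helper names to_utc and tick_32bit are hypothetical and were made up for this example.

```c
#include <stdint.h>
#include <time.h>

/* Convert seconds since the Unix epoch (1970-01-01 00:00:00 UTC) into
 * broken-down UTC time. Returns 1 on success, 0 if gmtime() fails. */
int to_utc(int64_t seconds, struct tm *out)
{
    time_t t = (time_t)seconds;
    const struct tm *utc = gmtime(&t);
    if (utc == NULL)
        return 0;
    *out = *utc;   /* copy out: gmtime() returns a pointer to shared static storage */
    return 1;
}

/* Advance a simulated signed 32-bit seconds counter by one tick.
 * The sum is computed in 64 bits and wrapped explicitly, mimicking
 * two's-complement overflow without relying on undefined behaviour. */
int64_t tick_32bit(int64_t seconds)
{
    int64_t next = seconds + 1;
    if (next > INT32_MAX)
        next -= (int64_t)1 << 32;   /* 2147483648 wraps to -2147483648 */
    return next;
}
```

Converting INT32_MAX with to_utc gives 2038-01-19 03:14:07 UTC. One tick later, tick_32bit returns INT32_MIN, which converts to 1901-12-13 20:45:52 UTC: an unpatched 32-bit counter would jump back to 1901 rather than move forward to 2038. Widening time_t to 64 bits is the usual remedy.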