“The Work Death March” — a survivor’s tale

Nick Ciubotariu
Dec 8, 2014


For my best friend, who pulled me through my lowest moment that year by making me read every card on a Christmas wishing tree. For my mentor, who taught me to “trust my gut”, gave me courage and showed me I had a voice.

And for my team, and especially for Nadia, who asked me “When will this finally end?” — I am so very sorry.

*****************

Disclaimer and proviso: “The postings on this site are my own and don’t represent Amazon’s position in any way whatsoever”.

I learned of the term “Karoshi” (literally translated as “death from overwork”) sometime in 2006, by reading an article about a man who died at the age of 34 after working 110 hours a week. Towards the end of his life, he was apparently so weak that he could not even pick up his children when he came home. Some time after that, I went through a similar experience: a period of 7 months that, to date, has unquestionably been the worst professional experience of my life.

Inside the company where this happened, this is a pretty well known story. Some tout it as victory against all odds, and shipping a successful product under impossible conditions. I call it something different. A story of incredible misery and despair, marred by years of mistakes, compounded by 7 months of arrogance, disarray, and failure. Failure and critical mistakes at all levels within the chain of command. Failure and critical mistakes that unquestionably fall on me as well. And for some of us, it represented a turning point in our professional careers, where we said “Never Again” and meant it.

Names (with the exception of the one mentioned, who could be anyone) will not be named, nor will the company inside which this happened. The specifics don’t matter. What matters is the cautionary tale it tells, and what was learned from it, and I hope it helps at least one person, team or organization avoid going down the same destructive path. Without further ado, this is how it began:

*****************

November — December

The date was November 1st, to be exact. I had just shipped an R & D online service inside an organization that had never shipped an online service before. We shipped to spec, and on time. Spirits were high, and we had great reason to celebrate: During a subsequent Customer Experience Week, the market had responded so well, we had funding for our product for at least another 18 months.

After the ship party, I received an email from our Director of Engineering. He asked to see me right away. He told me that, given the success I had in leading two teams to ship so quickly, he wanted me to take ownership of the most critical area of the business. The Engineering Manager in that team was leaving, and he needed someone “technically strong and driven”, as he put it, to get the team through a highly shortened ship cycle and ship a new version of the product, which would be an integral piece of the company’s completely overhauled flagship software suite.

I hesitated. I didn’t want the job, since that part of the organization was widely known to be very disorganized and chaotic, but the “ask” became a “tell”. The Director reminded me of my obligations as a HiPo (there were only two of us in the organization), and told me this is what HiPos were asked to do: take the tough assignments, save the company, lead the charge, etc. I asked for time to make my decision, and was given two days and a weekend. I told my boss, who was mortified. I asked my mentor, a high-ranking and well-renowned company Executive, what to do. He told me, very correctly, the following:

“The decision has already been made for you. You have one chance to back out, which is right now. It will come at a cost, but everything does, and you may want to pay it now rather than suffer later”.

Ultimately, he proved to be completely right. Feeling like I had no choice (one of the many mistakes I would make), I accepted, and met with my new boss, who began to outline the state of things. I had a team of 9 people. There was technical debt to pay down. We had a shortened ship cycle, from 18 months to 6 months. Our automation needed work. We had zero code coverage, and no way to measure it. Some of my peers, particularly one other Engineering Manager, were known to ship at will and break everyone else. No one really cared about quality, and part of my job would be to fix that. And I had at least two people to “manage up or manage out”. And while we were talking about things, he specifically mentioned that I “need to ensure” the success of one particular Engineer, since that person was related to the Director in some family capacity.

Of all the body blows I had just absorbed, the last statement hit like a sledgehammer. I was stunned, literally and figuratively, and I asked my boss to repeat what he just said. And he repeated it, verbatim. I had no response. I had never been asked to play favorites, especially in a case of nepotism, and I quickly requested an urgent meeting with my mentor. He listened, a bit stunned, and told me the following:

  • The company had a perfectly good video on the HR portion of our intranet that tells me how to deal with these things, and that I already knew what to do.
  • Do my best, as I always did, and do not compromise who I was.
  • After my mission was accomplished, leave that group as soon as possible.

I met the team, and got their impression of things. Everything I was told was true, and worse. We had an entire organization of over 100 overseas vendors hammering on close to 4000 (yes, four thousand) manual tests that only tested the positive test cases of our software. There was an antiquated Engineering system used for test automation that no one was using, because no one knew how to use it. Two ICs had written a new Engineering system, but no one in management had the courage to green-light it into production. We had zero real automation, and the functional tests that did run took a week, on average, to finish, before they would report bugs. Over half the Engineers on our team needed a managed code refresher. Our lab software was so unstable that everyone mostly regarded it as a joke, and did the best they could by testing locally. Planning was an exercise in human combat (more on this later). We had over 1800 bugs filed against us. One of the best engineers on the team, who specialized in Network Programming, told me, simply: “What we shipped with Networking last release doesn’t work and doesn’t make any sense”.

I locked my office for 3 hours, and began fitting together the pieces of a very broken puzzle. Our coding cycle wasn’t due to start until January 7th, so we had 2 months to do “something”. I immediately green-lit the test automation project, and told everyone on my team that until we started writing production code, our collective job was to get this thing up and running to the best of our ability. We formed a SWAT team, and wrote the core libraries. We worked day and night, through Christmas and New Year’s (I got a call from one of my Engineers, as I was opening Christmas presents, telling me that I had code reviews that were overdue). And on January 7th, we had the semblance of a real, honest-to-goodness, modern Engineering system and test harness.

In the meantime, product planning raged on. While my team was busy “building a plane already in flight” in terms of an Engineering system, I was in planning meetings from morning until night. We had 150 features to ship in 6 months. After countless meetings, arguments, and noise, we cut them to 32. Even that was too much, but it was the best we could do. I argued that we should just fix what was broken (there was enough for years) and not ship anything net new. No one listened. I met with my boss and insisted — he told me it wasn’t my call. Of the 1800 or so bugs that were open, about 300 were fixed, and the rest were quietly closed and swept under the rug. One of my Engineers quit, and I was faced with the grim prospect of hiring in December. Miraculously, I found someone. And so, on the 2nd week of January, we began to build the product.

January

Mind you, by now, my team was already exhausted. Most of us had worked 80+ hour weeks through Christmas and New Year’s shipping our new Engineering system. We tried to get the other teams to adopt it. Two teams helped; the others couldn’t have cared less about testing and quality and, despite brown bags, pleading and hand-holding, wrote a few test cases here and there but mostly did nothing.

The one person who cared, however, was a newly-minted Vice President of our organization, who asked (quite rightly) why we had over 100 vendors pounding manually on positive test cases, and gave us 6 months to get rid of all but 30 of the vendors, and convert as many of the manual test cases to code as we possibly could. As if we needed more work, the responsibility to onboard the vendors to our new Engineering system fell to me and the one Engineer who was the primary architect of that system. We were given a liaison, and did an analysis of the test cases we needed. Mercifully (as ludicrous as that sounds), we had only 2500 or so to automate, as 1500 were basically useless and running against vapor.

February

By the beginning of February, we had the following streams of execution:

- Finishing our Engineering systems work
- Building quality automated tests to actually define a quality bar
- On-boarding a team of overseas vendors onto an Engineering system we barely knew ourselves
- Code-reviewing and supervising the vendor team that was responsible for transforming the 2500 manual test cases into automated test cases
- Redefining how we wrote code by actually writing unit tests and measuring our code coverage
- Oh, and in case anyone forgot, actually shipping the 32 features for the new product

We were walking zombies. We were now averaging 90 hours of work a week, and weekends were workdays as far as anyone was concerned. Most of us hadn’t slept more than an average of 2 hours a night in months. Half the time, I slept on the couch in my office, and had already lost 15 lbs. Mind you, this is where things got really bad.

We were told we had to rely on the lab hardware to test (normally a good idea, but absolutely not in this case). There was a separate team responsible for stabilizing the lab hardware. There was one Engineer on this team that did 90% of the work, which was not a sustainable model by any means. He warned everyone that the labs were unstable. I calculated the hardware against the number of tests to run per day, per engineer, and told my boss we didn’t have nearly enough capacity to do this. He told me it was not my concern, and “we would deal with that if and when we had to.” The order was given again, explicitly. Then the entire organization literally revolted, and the revolt was quickly squashed by my boss, who swore to upper management on a daily basis that the labs were “stable”. I later found out that having a stable lab environment was one of his top goals that year.
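For illustration only, here is a minimal sketch of the kind of capacity check described above. The function name and every number in it are hypothetical, not figures from the actual project; the point is simply that demand (engineers × runs × run time) has to fit into the hours the lab is actually up.

```python
import math


def lab_machines_needed(engineers, runs_per_engineer_per_day,
                        minutes_per_run, lab_uptime, hours_per_day=24):
    """Estimate how many lab machines a team needs per day.

    All parameters are illustrative assumptions:
    - engineers: people who need to run tests on lab hardware
    - runs_per_engineer_per_day: test runs each engineer needs daily
    - minutes_per_run: machine time one run occupies
    - lab_uptime: fraction of the day a machine is usable (0 < x <= 1)
    """
    if not 0 < lab_uptime <= 1:
        raise ValueError("lab_uptime must be in the range (0, 1]")

    # Total machine-minutes the team asks for each day.
    demand_minutes = engineers * runs_per_engineer_per_day * minutes_per_run

    # Machine-minutes a single machine can really deliver each day.
    usable_minutes_per_machine = hours_per_day * 60 * lab_uptime

    return math.ceil(demand_minutes / usable_minutes_per_machine)


# Hypothetical inputs: 9 engineers, 4 runs each, 2 hours per run,
# and a lab that is only up half the time.
needed = lab_machines_needed(9, 4, 120, 0.5)
print(f"Machines needed: {needed}")
```

Comparing a number like this against the machines actually available is what turns “the lab feels overloaded” into data you can put in front of a manager.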

So we did as we were told. Quickly, disaster ensued. Nothing we tested on the lab hardware worked (when we could test, since the lab was down almost all the time). Mainline was constantly broken as nothing integrated. We started running into resource race conditions, as I had predicted, and literally had to take turns testing. Deadlines began to slip. Bugs piled up, and aside from our team, no one cared about testing or quality — everyone was too busy just keeping their head above water.

I found out full-time employees on other teams were asking the overseas vendors to work in the new product codebase, which was technically illegal, since the vendors were not under NDA to do so. It was obvious that everyone was desperate. I brought this up to my boss, and was told he “approved” it. I went to his boss, who told me he had confidence in my boss, and how were other things going? I told him the labs were busted daily, nothing worked, and while our automation story was great, we had no way to measure it since the labs team, responsible for shipping a code coverage product, was drowning. He made a note to check on this, thanked me for my time, and ushered me out.

To make things worse, several teams checked entire libraries of new code in with zero code review, breaking virtually everyone, because they were behind schedule. Ten features were cut, leaving us with 22 features to ship. The entire product worked in bits and pieces, but none of it worked as a whole. Worse, it didn’t integrate with any of the flagship product. The product wasn’t shippable. And here is where we hit the 7th circle of hell.

March

As if we didn’t have enough work to do, our Director signed up for a new R & D project whose first part would be delivered in tandem with our first ship milestone. Half my team was pulled to work on this project, me included. We had so much work to do, by this time, most of us had forgotten we had homes. Bodies were being borrowed from team to team just to cover work. Pets went unfed for 3 days, in one case. I wound up in the hospital at least once from work-related stress. None of us remotely resembled human beings — we were just marching towards May 25, come hell or high water.

At this point, it was a war of attrition. Some things began to stabilize, and we began to land features. My team was doing better than most — we were miraculously on track, with one feature coming in early. We had a ton of test cases, but had no code coverage mechanism or bar to track our quality other than the fact that we kept finding bugs. I asked my boss whether a code coverage tool would be prioritized, since it was clear that the labs would remain unstable. He said no. I asked him if I could stand one up, for my own team, so we could actually measure what the hell we were shipping. He said no. Two days later, he was asked about code coverage in a Director-level meeting by his boss (who apparently remembered the note), and came back livid. He told me, in no uncertain terms, that going forward, everything is to go through him, and he would “manage up” my concerns.

April

I was asked to go to Las Vegas and represent our company at a huge convention, as a “reward” for the great work I had done. On the second night, after a great degree of hesitation (and I say this with 100% accuracy), I very, very reluctantly joined my new boss and most of our upper management on what turned out to be a party and drinking binge. I started to lose it. As I watched our Director order the 3rd round of drinks for 15 people, and saw him put it on his Corporate American Express card, I asked my boss: “You guys have a budget review tomorrow. How much of this is kosher?”

His response: “Don’t worry, I did the budget for him. We took it all out of morale funds.”

I wanted to vomit. We had been literally killing ourselves, and I had been paying for team lunches out of my own pocket, due to “lack of morale money”, and here I was, witnessing our upper management enjoy Las Vegas with limo rides, drinks and 7 course meals, at our expense. I left immediately. I took a taxi back to the hotel, worked our booth the entire time, and worked on shipping the product at night. I stopped going out with the rest of management, and ate one meal a day, the cheapest sandwich I could find, in the hope that I would somehow add positive balance to the disgusting things that were happening right in front of me.

Upon my return, I told my mentor of what had happened. He asked me if I had reported the favoritism incident, and I told him no. He became furious, and asked me why. I responded that I didn’t want anyone to get fired. He became even more furious, and told me something I will never forget:

“Why would you not report them? Why would you not want them to get fired? Do you think they would be doing this if this was their own company? Your boss might be a Vice-President someday, how can you allow that to happen?”

I have never been so embarrassed and ashamed in my entire life.

May

To a person, we were beyond crushed, demoralized and destroyed. I was in a meeting with my team, discussing the state of things. We would make the ship date, but it would be razor-thin. After going through our runtime agenda, I asked if there were any questions. Nadia, one of my Engineers, had one:

“Nick, when will this finally end?”

As I looked around the room, I saw 9 completely broken human beings. We had been working over 100 hours a week for the past 2 months. Two of my Engineers had tears on their faces. I did my best to keep from completely breaking down myself. With my voice choking, I looked at everyone, and said:

“This ends right now”.

I sent everyone home. I went and told my boss that we would ship what we ship, but the death march was officially over. My statement was met with some kind of blank stare, then arguments. I was told by my boss that he would replace me, that I kept questioning the way he was “running the ship”, which to him wasn’t acceptable. I told him I would escalate to HR, which I did. I provided HR with a statement, as did over half of my team. I spoke of the favoritism I was asked to provide, favoritism which was not needed, as the Engineer in question was an honest, hard-working person who didn’t need to curry favor with anyone to succeed. I spoke about the ludicrous spending in Las Vegas. HR took down my concerns, and sent me an email thanking me profusely, and telling me that I absolutely did the right thing in bringing this to their attention. I was soon speaking to someone in Legal, who took down a sworn statement and assured me no retaliatory measures would come my way. I didn’t care, one way or another. I was dead set on resigning, and actually did, until I was stopped.

Several managers in other groups had seen and heard what was going on, and convinced me to give the company another shot in their respective organizations. I reluctantly agreed. As soon as I left, 40% of the team did the same. More followed throughout the organization. HR intervened and pushed every ship deadline back, and an immediate retention effort was put in place. The gates had opened and people were leaving left and right. Everyone had finally had enough.

The results of all this, at the end of the year: we successfully shipped our company’s flagship product on time, with the highest degree of quality my particular organization had ever shipped, as well as the most test cases. Of the 2500 old, manual test cases, 700 or so had been automated, and the Engineering system we had built was a huge success, something that would sustain and grow with the product. And we very nearly killed ourselves doing all this. And we absolutely shouldn’t have.

Miraculously, my Manager Survey results were 85%, with a rating of 100% confidence from my Engineers in their immediate manager. Circling back with them over time, I’ve asked them about this. To a person, they said it was because I bled alongside them and protected them whenever I could. To a person, I have told them all (and tell them again now) that I hope they can forgive me for being an enabler of their death march, however unwilling, and that I ultimately didn’t do enough to stop it.

As a “reward” for all this, I calibrated #1 overall in my organization, and received yet another HiPo nomination and induction, at the cost of a shattered family life, my health, and a broken team. I don’t think I ever felt worse in my entire career. If I could give it all back, I would, in an instant, no questions asked.

Physically and mentally, I took about a year to heal. Years have passed since this incident, and I feel like I can finally speak freely about what happened. Those 7 months taught me some incredibly valuable lessons. Here they are:

Speak up, and speak loud. If no one listens, leave and leave immediately

Internally, what we accomplished was viewed as an enormous victory for the company. In retrospect, it should have never happened. I escalated and protested, but not high enough. There is always a voice of reason that will listen and intervene. Find it before tragedy strikes.

A year later, faced with a similar situation, I put my proverbial neck on the chopping block when I was the only Manager in a room full of Directors to stand firm against a decision to inject a foreign, largely untested codebase full of bugs into our product 3 weeks before launch. It was a hard battle to fight, but it was the right thing to do. By fighting it, respectfully but firmly, and not ceding ground, I earned the respect of my Engineers and peers, and gave them a voice as well.

Don’t ever compromise your integrity for the sake of your career

While meeting with HR and Legal, I clearly stated that I should have come forward sooner. They both assured me that it was OK, that the majority of people never do, mostly because they are afraid. I was also afraid. I had to be shocked into doing the right thing. I did it 7 months too late. I should have walked out of my boss’s office the minute he told me to play favorites, called my HR business partner, told them that what had just happened didn’t align with my core values or our company rules, and asked them to intervene.

Don’t be afraid. Most people at most companies come to work with the very best of intentions to do the right thing. If you find some that do not, report them. You are doing the company, your management, your coworkers, and yourself a favor by doing so. If the company retaliates, this is not a company you want to work for anyway, and you have legal means available to you to protect yourself.

Don’t kill yourself for the sake of work

It’s not worth it. It just isn’t. This isn’t a token comment — people die from overwork every single day. I haven’t gone through another death march since then, and will never do it again. I have had to push back on management many times when they put forth unreasonable requests. I push back with data and I stand my ground. Every one of us has families, hobbies, plans, things we like to do outside of work. Those things are a hell of a lot more important than anything we will ever do at work. We all need balance in our lives. I will never lead or be part of another death march, and neither should you.

There will always be spikes — watch them carefully. When they start to turn into patterns, get to the root of the problem ASAP. Otherwise, you will begin to build technical debt, which will, in the end, eat you alive if you let it get out of control.

Most of all, learn to trust your gut

When you’ve assessed all the data, if something doesn’t add up, go with what you ultimately feel is the right thing to do. Don’t let someone bully, goad or push you into a decision. No one owns you. At the end of the day, even when all else has failed, the most powerful thing we have as humans is choice. You always have a choice — weigh the risks and rewards accordingly, and make sure you are the one making the call.

Because at the end of the day, whether pushed, influenced, or otherwise, you and you alone will live with the consequences of your choices. Choose wisely.

(cross-posted at booleanzen.com and LinkedIn.)

*******

I’m a devoted father and husband to an awesome family, and a Software Development Manager and hands-on technical leader and Engineer in my spare time. For more information about me, please visit my LinkedIn profile, or www.booleanzen.com.

*******
