How To Avoid & Survive Catastrophic Project Failure

Published in The Digital Project Manager, Sep 25, 2020

Project failure happens a lot. Sometimes projects fail spectacularly, but there’s usually a way to recover, even if it’s painful.

There are many reasons why projects fail, but they pretty much all boil down to one of two things (sometimes both):

There was a risk that no one recognized, and/or
We didn’t respond well when there was a problem

Rather than focus on a list of common reasons for project failure and associated common project risks, which you can find all over the project management world, I want to focus on the bigger picture to help address the main cause of project — and often project management — failure.

While the best path is to recognize the risk and avoid the issue in the first place, you can actually have a catastrophic issue on a project and still pull it off if you act quickly and thoughtfully.

How To Avoid And Survive Catastrophic Project Failure

All of that leads us to two things we can do to avoid and survive project catastrophes.

The first is to recognize the risks and prepare, so let’s start there.

1. How To Avoid Project Failure

Risk management is a key project management activity. It’s so important that failure to manage risk often amounts to project management failure. Risk management has three important components:

Identify risks.
Assess what you’ve identified — how likely is it to happen and how bad will things get if it does? What are the probability and impact?
Prepare mitigation and contingency plans for those risks that could blow up your project. Risk registers are handy for keeping track of what you’ve done.

Identify Risks

First, identify the risks. This can help avoid failed projects altogether if you do it well. And while it’s difficult to catch them all, if you look systematically at the project, you can generally figure out the biggest risks.

My background is in software development, both product and IT, and its associated processes. Software projects are complex, fragile, and, except in a few rare cases, don’t have the kind of failsafes and inspections baked in that something like construction or medical device production does.

Your project should have goals for each of the four dials:

Schedule
Content
Cost
Quality

If you don’t know what those are, that’s your starting place. I find that typically two of the four will move around a bit, but the more you understand about where you’re going the easier it will be to identify obstacles. Understanding which of the four dials is the highest priority will help you reduce risk by making appropriate trade-offs in small ways.

It’s easy to get caught up in the schedule dial and miss risks that might affect the other dials, but they’re all related and all important to identify. Meeting your schedule goals with a buggy product is usually not a good option. Even if you send it out, you’ll end up doing fix-it releases that push the schedule for an adequate product further out than addressing the issue in the first place would have.

Beyond the 4 Dials

Besides the four dials, you have three aspects of your project to balance:

product, which is usually covered by the dials
process
people

Process risks come in two flavors: an existing process that doesn’t meet the needs of your project, and the lack of a process for something critical to what your team is doing. Process risks are pretty easy to mitigate by creating or fixing the process.

The people part is often ignored, but your team and the teams they interact with will make or break your project. Difficult or uninterested stakeholders will make it hard to get what you need to move forward. Overworked, stressed, or unhappy team members will not get the job done. Some of the potential people issues you can identify publicly, others you may want to address privately, but don’t make the mistake of ignoring them.

Assess the Risks You’ve Identified

After identifying the risks, determine how likely each one is to happen. I usually use a percentage, where 100% means I’m positive it’s going to happen. It’s also important to determine how bad things will be if the risk does happen. The easiest way to do this is to categorize the impact as high, medium, or low.
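As a rough illustration of this assessment step, here is a minimal Python sketch of a risk register that records probability as a fraction and impact as high, medium, or low, then ranks the risks. The `Risk` class, the numeric weights for each impact level, and the sample risks are all assumptions made up for this example, not a standard scoring scheme.

```python
from dataclasses import dataclass

# Hypothetical weights for the impact categories; use whatever scale suits your project.
IMPACT_WEIGHT = {"low": 1, "medium": 2, "high": 3}


@dataclass
class Risk:
    description: str
    probability: float  # 0.0 to 1.0, where 1.0 means "positive it will happen"
    impact: str         # "low", "medium", or "high"

    def exposure(self) -> float:
        """Probability multiplied by the impact weight: a crude priority score."""
        return self.probability * IMPACT_WEIGHT[self.impact]


def rank_risks(risks: list[Risk]) -> list[Risk]:
    """Return the risks ordered from highest to lowest exposure."""
    return sorted(risks, key=lambda r: r.exposure(), reverse=True)


# Made-up sample entries, for illustration only.
register = [
    Risk("Key developer leaves mid-project", 0.20, "high"),
    Risk("Requirements keep shifting", 0.70, "medium"),
    Risk("Test environment is unavailable", 0.50, "low"),
]

for risk in rank_risks(register):
    print(f"{risk.exposure():.2f}  {risk.description}")
```

A single multiplied score is only one way to combine the two numbers. Its main use is to give the team a sorted list to argue about when you bring them in to review.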

Once you’ve done this analysis, it’s time to bring your team in. Ask each team member what they are worried about, add those to the list of risks, and put them up for review:

Can the team think of other risks you haven’t covered?
How do they feel about the probability and impact analysis?

Based on this meeting, you can make adjustments to your risk plan. Parts one and two are done, for now. You’ll need to review the risks at least weekly to see whether there are new risks, whether some no longer apply, and whether any probability or impact has changed.

Prepare Mitigation and Contingency Plans

Finally, if the combination of probability and impact of any risk has the potential to derail your project, work with your team to come up with documented plans for mitigation and contingency. Mitigation involves making the risk either less likely to come to pass or less damaging if it does. Contingency focuses on the steps you will take if the risk can’t be avoided and does occur.

Be sure the plans are documented and everyone knows about them. It’s also a good idea to plan for potential people issues on top of the 4 dials and process, but you probably want to do the analysis yourself and keep it confidential.
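One way to keep that documentation honest is to check the risk register for serious risks that still lack a plan. The sketch below is a hypothetical Python example: the `PlannedRisk` fields, the impact weights, the exposure threshold of 1.0, and the sample entries are all assumptions for illustration.

```python
from dataclasses import dataclass

# Hypothetical impact weights and cut-off; tune both to your own project.
IMPACT_WEIGHT = {"low": 1, "medium": 2, "high": 3}
EXPOSURE_THRESHOLD = 1.0


@dataclass
class PlannedRisk:
    description: str
    probability: float     # 0.0 to 1.0
    impact: str            # "low", "medium", or "high"
    mitigation: str = ""   # how to make the risk less likely or less damaging
    contingency: str = ""  # what to do if the risk occurs anyway

    def needs_plan(self) -> bool:
        """True when probability times impact reaches the chosen threshold."""
        return self.probability * IMPACT_WEIGHT[self.impact] >= EXPOSURE_THRESHOLD

    def missing_plans(self) -> list[str]:
        """Names of the plans still undocumented for a serious risk."""
        if not self.needs_plan():
            return []
        return [name for name in ("mitigation", "contingency") if not getattr(self, name)]


# Made-up sample entries, for illustration only.
register = [
    PlannedRisk("Requirements keep shifting", 0.7, "medium",
                mitigation="Freeze scope at the start of each sprint"),
    PlannedRisk("Test environment is unavailable", 0.5, "low"),
    PlannedRisk("Vendor API slips its release", 0.5, "high",
                mitigation="Build against a stub",
                contingency="Ship without the integration"),
]

for risk in register:
    gaps = risk.missing_plans()
    if gaps:
        print(f"{risk.description}: missing {', '.join(gaps)}")
```

Low-exposure risks are deliberately skipped here, since only the risks that could blow up the project need written plans.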

2. How to Survive Project Failure

The second reason for project management failures is a poor response to a risk coming to pass, foreseen or not. This is really all up to you as the project manager, so pay attention to these steps:

1. Don’t panic. Remember to breathe.
2. Don’t point fingers and don’t let anyone else start down the blame path.
3. Contingency plan in place? Call a meeting immediately, or at least send an email triggering the contingency plan. Remind people what the plan is, and tell everyone when and how to report status.

If you skip this or delay it you’ll find that people eager to help have jumped in while in panic mode; when this happens you lose track of what’s been done and it often makes the problem worse.

4. No contingency plan? Immediately rally the troops with a meeting notice and strict instructions to DO NOTHING ELSE until after the meeting. Otherwise, everyone will try to help and you’ll have no idea what’s been done. In the meeting, state the problem clearly — it’s always amazing how many different ideas people have about what the problem is.

Do a consolidated root cause analysis to get to the bottom of the issue and come up with the plan — who, what, and how — to recover. When working on a root cause analysis you need to understand who did what leading up to the issue, but keep the discussion at that level. In other words, just the facts. Don’t assign blame, as finger-pointing is a distraction.

If your team is not able to come up with the solution, you have some options. The first is to bring in someone who knows the general subject matter but isn’t on this specific project — for example, if the problem seems to be with the database, bring in a database administrator not currently on the project to get a different view.

Another technique is to bring in someone with no working knowledge of the project at all. Sometimes having to explain every step explicitly exposes poor assumptions or something people were looking past.

5. Fix the problem. Get everyone on board with your course of action and assign tasks. Have people report progress to you, and tell them how often to send progress reports: hourly, when they’ve finished a task, at the end of the day, or whatever makes sense given the situation. The reporting may seem obvious to you but it won’t be to everyone. Regular updates will help ground the team.

Those on the team with no immediate tasks are going to get itchy and eventually try to jump in and help unless you’re feeding them status updates and other information. It’s very important to keep management in the loop as well. Use your judgment to determine who needs updates at what point, but report consolidated progress to the entire team regularly.

6. Finally, call it done when it is. Generate a final report, and move on to the next step.

7. Let the team recover a bit, then hold a post-mortem or retrospective.

State the problem so everyone is on the same page
Start with what went well. In the most dire situation, something still went right. This starts the meeting off with a whole different tone.
Talk about what needs improvement. If there’s time, brainstorm ways to make the improvements; otherwise, assign tasks and due dates and follow up on those dates.

8. If it was actually someone’s fault, address it with that someone — privately. If you do this in public your whole team will be wondering who’s going to get embarrassed in public next. If you skip it entirely, your team will feel like someone got away with something and the problem wasn’t fixed. Be matter of fact and follow up on any personal process changes that someone needs to make. It’s all about accountability and improvement, not finding someone to blame.

What Does Project Failure Look Like In Real Life?

I’ve had my share of IT project failures, although fortunately, if painfully, they were all recovered. Here are two of my own failed IT and development projects, as well as one famous failed project, to give you some context. These examples should also give you a concrete idea of the ways projects can fail and how to deal with it.

The Accounting System

Early in my career, I worked at IBM. My first real job was to pick up a fairly small system that fed into the corporate accounting systems. Someone within IBM had written it in an obscure programming language called RPG on a small system.

I didn’t just manage the project. I managed the whole system (although there was an admin), including requirements gathering, troubleshooting, and programming. One fine day I put a new program on the system, ran it, and found an error. I corrected the error and re-ran the program. The next morning I got a call from the Accounting people that there were double entries. (This is very bad news in the world of accounting.)

The Risk Management (And Other Projects) Failures?

First, no one reviewed my work. Second, I didn’t engage folks from the downstream systems to check entries before they flowed through. And third, I didn’t put in the time to analyze some of the old programs that were still running and setting up double entries. There was also no process except what I was making up on the fly.

And then my response — panic. Good lord, right out of college and I’m ruining IBM’s accounting system. I didn’t include the right people — the admin didn’t know what was going on — and he generated yet another set of entries without knowing it. In the end, I worked 48 hours straight to dig down to the root cause, fix it, and write and run programs to fix the accounting errors.

And I learned my lesson: look carefully, get reviews, think about what could go wrong, and stop and think before diving in to try to fix things. Now imagine I was running a 10-person team and everyone did the same thing trying to fix the problem. It truly would have been unrecoverable without bringing down IBM’s accounting systems to dig out.

The New Internal Product

As a consultant, I was running a new, big project with lots of fingers in the requirements pie, which made for never-ending shifts in functional priorities. We were running an Agile process, which this company hadn’t used before. QA was very good at testing but didn’t yet understand the end-user mindset and workflow.

I knew all these things and did articulate them from time to time, but I had no risk register, no impact/probability assessment, and no contingency or mitigation plans. I was also not the primary information conduit to the project’s sponsor, although I did meet with her from time to time.

We added a sprint. Then another one. Then, as it became clear that we were not quite at the point we needed to be to do a beta test, we added a third sprint.

Because I was not the one providing information to the project sponsor, I had no idea that she didn’t know all of this. The person in charge made the critical mistake of not telling the sponsor, hoping we could pull it off. And to be fair, although I talked about risk, I didn’t provide him with a concrete set of risks he could present to prepare the sponsor for the possibility of a schedule slip.

In the middle of all this, I had a meeting with the sponsor and made an offhand reference to that final sprint we’d be adding. She stopped me. She knew nothing about it and was not happy. As it turned out, she was more upset about the surprise than the schedule slip. There were public repercussions and eventually the person in charge of the project moved elsewhere in the company because the trust was gone.

Once again, I learned important lessons from the failure. Knowing the risks isn’t enough; you need to put them together, assess them, have mitigation and contingency plans, and make them public and constantly in view. I’ve never been one for hiding risks and issues, but the importance of transparency was made unmistakably clear.

Remember OS/2?

Nope, nobody does. It’s one of many failed systems development projects. I was working at IBM when the original OS/2 came out as a competitor to Windows. I didn’t even try that first version because at IBM we all knew it was really buggy. The rest of the world found out right away that it had too many failures to bother with.

There was a second version, which I jumped into immediately because it had all kinds of neato stuff built in that was not in Windows. It was amazing, a dream to use, but it totally failed in the marketplace because the first version was bad enough that no one was willing to try the new version.

Of course, I wasn’t involved at all with the building of OS/2, but it’s a prime example of bad risk management. It was buggy enough that the project team couldn’t possibly have thought it was fully baked, but somehow they didn’t make the right trade-off and hold the release. Failing to consider the human element, both your team and your customers, will ruin a beautiful plan in a hurry.

In software there are any number of alternatives in this situation: do a limited release to people who will tolerate the bugs in order to have new bleeding-edge software, announce that you’re holding the release until reliability is up to your high standards, and so on. This was still early enough in my career that I could take valuable lessons from the fact that a product I loved was at a dead end on the second release.

Be clear about your priorities — there are a few cases where releasing on a certain date is more important than product quality, but not many. Use your 4 dials, and admit your mistakes and make the fixes visible so people trust that you understand the issue and won’t make the same mistake twice. Consider the people element — it’s as critical as the technology. I brought these lessons to software and IT projects from that time forward, and they’ve been invaluable.

The Bottom Line

Overall, remember that even though it’s ultimately your responsibility, you don’t need to do all the work. In fact, you absolutely should not try to do it by yourself. Benefit from the wide range of experiences and knowledge of your team. You’re the leader, but you don’t have to be the entire team.


Originally published at https://thedigitalprojectmanager.com on July 22, 2020.