Effective Crisis Management

Clement Kao
Product Teacher
Jan 29, 2023

Essay originally published on the Product Teacher blog on October 3, 2022.

Product managers are responsible for both the success and the failure of their products. And, since products are built by imperfect human beings, they will inevitably fail at some point. Therefore, every PM will run into at least one product crisis over the course of their career.

When crisis strikes, the most effective product managers stay calm under pressure, and they rally teams together while providing clarity for impacted customers and stakeholders.

In this guide to effective crisis management for product management, we first identify the kinds of crises that product managers should expect to handle. We then discuss the traits that product managers should exhibit during a crisis. Afterwards, we discuss what mindset to use as you work through a crisis, and we share a playbook for moving step-by-step to resolution.

Let’s get started — what kinds of crisis situations are product managers expected to handle?

The three kinds of crises that PMs are responsible for

Before we can talk about crisis, we first need to define what a crisis is. A crisis is a negative high-impact situation that has a clearly defined start point and a clearly defined end point.

Using this definition, product managers generally need to handle the following three categories of crises:

  • Product outages
  • Unexpected product behavior (e.g. critical bugs)
  • Customer escalations or partner escalations

A product outage is what happens when the product no longer works at all. Users cannot successfully start or finish mission-critical workflows in the product.

If your product doesn’t load in a browser or in a mobile app, then you’re experiencing a product outage. If your users can’t log in, then you’re experiencing a product outage. If your product hangs in the middle of a workflow, then you’re experiencing a product outage.

The defined end point of a product outage is when the product can be properly used from start to finish once again.

A similar but distinct crisis category is unexpected product behavior, colloquially known as “bugs.” While users can start and finish workflows in the product, the product’s behavior is highly detrimental to the user.

If your product is skipping certain steps in a workflow, then you have unexpected product behavior. If your product is accidentally deleting saved data, then you have unexpected product behavior.

While nearly all products have dozens of bugs, the vast majority of bugs are not product crises. Different bugs have different levels of negative impact.

We consider a bug to be a product crisis when it seriously damages the trust and credibility of your company for a significant portion of the user base. While this sliding scale requires judgment, most of the time you’ll know it’s a crisis based on the volume, urgency, and negativity of inbound customer feedback.

The defined end point of unexpected product behavior is when the product’s behavior has been restored to its expected behavior.

Finally, if customers or partners escalate to our management teams, then we have a relationship crisis on our hands. They’re deeply unhappy with a given situation, and we need to resolve this issue ASAP or they will terminate their relationship with us.

Most of the time, the escalation is focused on a particular feature missing key use cases, or a feature being delivered later than expected, or the roadmap not aligning with their desired timelines.

The defined end point of escalation is when the escalating party has formally signed off on our resolution to their escalation.

As a counterpoint, let’s look at some negative situations that shouldn’t be defined as crises, due to their lack of clearly-defined starting points or ending points:

  • Attrition of engineering talent, design talent, or product talent
  • Slow degradation of profit margins
  • Prolonged shrinkage of user base
  • Toxic work cultures

In these long-burning situations, you’ll be expected to help as a crucial partner and stakeholder, but you won’t be using a crisis management playbook. Therefore, we’ll keep these situations out of scope for this specific discussion.

Now that we know that product managers are responsible for both product crises and relationship crises, let’s talk about what kinds of traits are valuable for product managers to navigate crisis situations.

PM traits for successfully managing crises

Product managers with these three key traits can successfully navigate any crisis:

  • Calm
  • Focused
  • Transparent

First, being calm helps to settle everyone’s nerves, and it reduces the likelihood that incorrect information is shared or bad decisions are made. Remember that anxiety and stress are contagious — the more anxiety you demonstrate, the more anxiety you inflict on cross-functional stakeholders, customers, and end users.

To be clear, being calm doesn’t mean being slow. We need to work through crisis with a sense of urgency. But, if we’re too anxious during a crisis, we might rashly choose actions that wind up making the problem worse.

Second, being focused helps speed up crisis management. If you attempt to multitask during a crisis, you become the bottleneck to many different critical threads.

While you are ultimately responsible for resolving the crisis, you are not the only person who can resolve it. Crises require cross-functional teams to resolve, with each subject matter expert sharing their domain knowledge.

Therefore, as a product manager, you should only tackle one thread at a time. Tell people what you’re actively tackling, what you’re not doing, and what you’ve delegated away. By doing so, you streamline communications and set better expectations.

Third, being transparent keeps everyone on the same page. Share your thought process with as many people as you can, encourage everyone to share their thought processes with you, and collaborate towards a solution rather than trying to be right.

Transparency is easier to manage when you can centralize communications. Opening up a dedicated messaging channel (e.g. a Slack channel) and/or a dedicated video call (e.g. a Zoom call) makes it much easier for information to flow.

While no one likes being wrong, remember that it’s better to know that you’re wrong upfront and course-correct than to rashly take action and find out that you’re wrong later. The earlier you surface your thought process, the faster others can catch any possible mistakes and prevent things from getting worse.

Thankfully, all traits can be strengthened with thoughtful practice. While the current strength of your traits might be innate, you can grow and deepen these traits over time.

Here’s my personal example: I am not a calm person by nature. I’m naturally an anxious person who is prone to stressing out. But, I’ve practiced becoming more calm over time, and I did so by soliciting feedback and implementing suggested changes from my counterparts.

In the dozens of crises that I navigated in product organizations, I have consistently received praise for staying calm, collected, and organized. While this trait wasn’t something I started my career with, it was a trait I was able to strengthen over time.

But, why only discuss these three traits? Aren’t other traits valuable for product managers in crisis situations? Of course they are! But, other traits are less valuable than being calm, focused, and transparent.

As an example, if you’re highly data-driven and you pull lots of different data queries during a crisis, that might be helpful — but, not if it comes at the detriment of your focus.

Or, if you’re highly empathetic, that can help with getting people’s trust — but, not if it causes you to lose your cool and spread panic across the team.

By knowing which three key traits we need to demonstrate during a crisis, we can identify growth opportunities that we can take proactively before a crisis happens.

If we know that we’re excitable or anxious, then we can practice being calm in low-risk situations, so that we’re prepared when a crisis eventually comes.

If we know that we’re not good at being focused, then we can practice that trait before a crisis happens.

If we know that we’re not naturally transparent, then we can practice that skill with others beforehand.

Let’s now discuss the key mindset that we should use to approach any crisis.

The best mindset for effective crisis management

If you take away only a single learning from this guide, it should be this one: don’t assign blame as you work through the crisis.

The blameless mindset is the single most valuable tool at your disposal. Why is that?

Blame is an unproductive allocation of time and energy that produces negative outcomes. While it’s human to blame people in the heat of the moment, it almost never drives positive results.

If you or your team blame someone, it will make that person less productive in addressing the crisis. More often than not, the person or team being blamed is key to resolving the issue. Blame breeds resentment and slows down our organization’s ability to arrive at a solution.

Even worse, if people are actively seeking to avoid being blamed, they may not share critical context about the situation because they may be afraid that you’ll blame them. Therefore, we should never blame people for their actions.

But, to be clear, a blameless mindset does not mean that we absolve others of their responsibilities. The party who caused the issue is responsible and accountable for helping fix the issue, even though we won’t blame them for causing the situation.

On the other hand, sometimes people might not be ready to focus on solutioning until they’ve first found a scapegoat. If this is the case, you need to take the blame as the product manager, even if your actions didn’t cause the issue.

Remember, as the product manager, you are the representative of the product in both good situations and bad situations. Sometimes, proactively claiming the blame calms people down to the point where they’re no longer worried about being blamed, and that gives them the mental space to think clearly and come up with solutions.

You gain respect when you take responsibility as a leader, and you set a positive example, too. When you accept responsibility for the situation, many times you’ll find other people apologizing wholeheartedly for their part in the crisis, and the team gels together instead of pointing fingers.

Now that we’ve established the blameless mindset for crisis management, let’s dive into the five-step process for managing product-related crises.

The five-step playbook for navigating crisis situations

The key to solving problems is to structure the problem thoughtfully. So, we’ll want to make sure that we’re working through these five steps, whether we’re managing a product crisis or a relationship crisis:

  1. Identify the problem
  2. Mitigate the impact
  3. Diagnose the root cause
  4. Resolve the crisis
  5. Prevent future crises

Identifying the problem

Before we attempt to solve the problem, we first need to identify what the problem is. Many times, you might be tempted to take immediate action — but, these actions can wind up making the problem worse, especially if you’ve defined the wrong problem.

Ask these questions to identify product problems:

  • Which features are not working correctly?
  • When did these features stop working correctly?
  • Which customers were affected, and how many customers were affected?
  • Is there anything in common between the customers who are raising the issue?
  • How frequently does the issue come up?
  • How severe is the problem?

The best way to isolate the problem is to reproduce it yourself in a test environment and share the reproduction steps with your engineering team. That way, they can quickly rule out unrelated issues.

By identifying key attributes such as which operating systems are affected, which browsers are affected, or which kinds of users are affected, your team can confidently define the problem to tackle.
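As a rough illustration of checking what affected customers have in common, here is a minimal Python sketch. The report fields (`os`, `browser`, `plan`), the sample data, and the 75% threshold are all assumptions made up for this example; in practice you would pull these attributes from your own support tickets or logs.

```python
from collections import Counter

# Hypothetical inbound bug reports; every field and value is invented.
reports = [
    {"customer": "A", "os": "iOS", "browser": "Safari", "plan": "enterprise"},
    {"customer": "B", "os": "iOS", "browser": "Safari", "plan": "starter"},
    {"customer": "C", "os": "macOS", "browser": "Safari", "plan": "enterprise"},
    {"customer": "D", "os": "iOS", "browser": "Safari", "plan": "starter"},
]


def common_attributes(reports, fields, threshold=0.75):
    """Return the attribute values shared by at least `threshold` of reports."""
    shared = {}
    for field in fields:
        # Most frequent value for this field, and how many reports have it.
        value, count = Counter(r[field] for r in reports).most_common(1)[0]
        if count / len(reports) >= threshold:
            shared[field] = value
    return shared


print(common_attributes(reports, ["os", "browser", "plan"]))
```

In this made-up data, Safari shows up in every report and iOS in three of four, while the plan type is split evenly, so the sketch would point the team towards the browser and operating system rather than the customer tier.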

On the other hand, if you have a relationship crisis, such as an issue with a customer or an issue with a partner, then you need to use a different set of questions instead:

  • What were they expecting from us?
  • Why did they have those expectations?
  • How does the current situation violate their expectations?
  • What kinds of end results would be acceptable resolutions for their expectations?

Mitigating the issue

Now that we’ve defined the problem, we need to decide whether we can lessen the negative impact immediately through mitigations. The goal of any mitigation is to lessen the severity and the scale of the crisis. Crises are typically complex problems with many moving parts, so the most valuable thing we can do in the short term is to “stem the bleeding.”

We want to see whether it’s possible to mitigate the issue, because mitigations are generally cheaper and faster to ship than full-blown solutions. Keep in mind that not all crises can be mitigated, but the crises that we can mitigate will cause less damage over the long run.

When we’re tackling mitigations, we should prioritize speed and reversibility. Ideally, mitigations happen within the same business day that we’ve identified the problem, and mitigations should take no longer than 1 week to complete.

For product crises, you’ll be partnering with the engineering team and the design team to find viable mitigations. As you do so, consider the tradeoffs of each mitigation. How much effort will the mitigation take, and what will we need to sacrifice in favor of mitigation? Furthermore, how do we ensure that the mitigation doesn’t wind up causing longer-term problems later on?

Shipping a quick fix to neutralize the worst problems enables us to recover value as quickly as possible, and it buys us more time to come up with a more thorough solution. At this stage, the goal is not to come up with a perfect solution that will scalably solve all future problems; rather, we need to help people out of painful situations ASAP.

And, we need to make sure that our mitigations can be easily reversed. Many short-term mitigations will cause longer-term problems down the road, so we need to ensure that we can “remove the bandaid” when we eventually circle back to solving the problem more deeply.

Furthermore, we need to remember that mitigation is not just about mitigating the product’s direct negative impact on users. We need to mitigate emotional pain as well. The more transparency and visibility we can provide to customers and stakeholders, the less uncertainty and fear they will experience.

Using this lens of “emotion management,” mitigations don’t have to be new code or new features. Many times, the most effective mitigation is to provide a clear set of instructions and workarounds while you work on resolving the issue, and to provide timely status updates to give affected parties a sense of progress and security.

Similarly, if we’re tackling a relationship-based crisis with a key customer or a key partner, we can look for short-term mitigations without immediately committing to new features or new deadlines.

As an example, we can agree to run user interviews, usability tests, and user shadowing to better understand the pain they’ve raised to us. As another example, if they find our products difficult to use, we can agree to provide interim in-person training for our products while we take time to search for a better solution.

In fact, sometimes the best mitigation is to sit down with the escalating party and to transparently talk through your prioritization process. By giving them insight into how you decided to build the product, they will gain valuable context that they didn’t have before.

As a firsthand example, one of my customers had escalated to our management team with a threat to terminate their contract with us, due to behavior that they did not expect within a specific user flow.

To mitigate the issue, I sat down with the customer’s leadership team to first understand what they expected and why. I empathized with the pain they experienced, and I explained that many other customers had initially run into similar challenges.

Then, I discussed why our product had been built in a specific way, and why their request had been previously considered but we had actively made the decision not to build it that way.

I explained that we had identified downstream issues within customer processes that would have caused more pain if we had built it the way they desired. I also shared our plans over the next 2–3 years to address their feedback without breaking their internal systems.

When they realized that we were keeping their interests front and center, and that we had considered their needs, they expressed relief and de-escalated the situation. I committed to sharing our rationale more visibly within our onboarding documentation and processes so that their future users wouldn’t run into the same issue again.

To summarize, we’ve now gained insight into key levers for mitigating both product crises and relationship crises:

  • Quick engineering fixes
  • Process changes
  • User training
  • Documented workarounds
  • Transparent communications

Through mitigation, we’ve suppressed the most serious negative impacts. We now have time to actually diagnose the root cause, so that we can fully resolve the issue at its core.

Diagnosing the root cause

To fully resolve the crisis, we first need to diagnose its root cause. While we had previously identified the problem, we only looked at the issue from a symptoms perspective, rather than from a cause perspective.

Many times, the root cause of the issue is systemic.

For product issues (i.e. product outages and unexpected product behavior), the root cause is typically a non-obvious conflict between engineering processes or conflicting logic within the product. We can usually rule out basic mistakes since we test our products before releasing them, and we would have caught easy-to-spot bugs in that step.

To help your engineering team diagnose the root cause, provide them with the context that went into your initial problem identification.

Because diagnosis typically requires 1–2 days of dedicated deep dives from engineers, we need to protect them so that they have time to dig into the issue. Remember to place their current workload into the backlog so that they are no longer immediately responsible for building new enhancements and features — after all, resolving the crisis comes first.

When diagnosing the root cause of a relationship issue, you won’t be leaning on your engineering team. Instead, you’ll be driving the diagnosis yourself.

For relationship issues, the root cause is typically misaligned incentives between teams, or information silos that caused key information to get lost. Speak to the key stakeholders who were involved in the escalation, including the relevant salespeople, marketers, account managers, and business development managers.

Once you find the root cause of a relationship issue, make sure to keep an eye on it as you tackle the actual resolution of the crisis. You don’t want it to show up again, or else it’ll sabotage your proposed resolution.

Resolving the crisis

We now have a clear diagnosis of the root cause. For product issues, work alongside your engineering team to identify how long it will take to fully resolve the issue, and discuss any tradeoffs or risks that might come up by taking this kind of approach.

In crisis resolution, you are no longer optimizing for speed. You’re now optimizing for completeness. Make sure that you and your team take the time that you need to truly resolve the issue, rather than adding even more short-term bandaids.

A full resolution of the crisis will require a tradeoff in terms of return on investment (ROI). As product managers, it’s our responsibility to identify where the cut line is. How much pain does each proposed resolution solve, and how much will each proposed resolution cost?

Typically, a reasonable resolution will cost anything from 1 person-week to 4 person-weeks of effort. Anything larger than that is probably too broad in scope for resolving the issue and should not be immediately committed to.
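To make the cut line concrete, here is a minimal Python sketch of one way to compare proposed resolutions. The `pain_solved` score, the option names, and the estimates are hypothetical; the 4 person-week cap comes from the rule of thumb above. Treat it as a conversation aid, not a formula that makes the decision for you.

```python
from dataclasses import dataclass

MAX_PERSON_WEEKS = 4  # rule-of-thumb effort cap for a crisis resolution


@dataclass
class Resolution:
    name: str
    pain_solved: float   # hypothetical 0-10 score agreed on by the team
    person_weeks: float  # estimated effort

    @property
    def roi(self):
        # Pain solved per person-week of effort.
        return self.pain_solved / self.person_weeks


def shortlist(options, max_person_weeks=MAX_PERSON_WEEKS):
    """Drop options above the effort cap, then sort by ROI, best first."""
    in_scope = [o for o in options if o.person_weeks <= max_person_weeks]
    return sorted(in_scope, key=lambda o: o.roi, reverse=True)


# Invented options for illustration only.
options = [
    Resolution("Patch the session-timeout logic", pain_solved=8, person_weeks=2),
    Resolution("Rewrite the authentication service", pain_solved=9, person_weeks=12),
    Resolution("Add a settings toggle for timeout", pain_solved=3, person_weeks=1),
]

for option in shortlist(options):
    print(f"{option.name}: ROI {option.roi:.1f}")
```

In this example the rewrite solves the most pain, but it falls outside the effort cap, so it drops off the shortlist and becomes a candidate for longer-term planning instead.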

Once you’ve decided on a resolution with a defensible return on investment, share your decision with key stakeholders. Keep them in the loop as you and your team implement the resolution.

For relationship issues, identify what kinds of resolutions are acceptable for the escalating party and are reasonable for your organization to commit to.

The following three kinds of relationship crisis resolutions tend to be acceptable to the escalating party, though the effectiveness of each depends on the nature of the crisis itself:

  • Payment, reimbursement, or credits
  • Committed external features and timelines
  • Committed internal processes and staffing

Preventing future crises

We’ve successfully resolved the issue, and we’re operating under normal circumstances again. While you might be tempted to put the whole situation behind you, you shouldn’t do so. The highest-leverage next step is to prevent future issues like these from happening, whether the crisis was a product crisis or a relationship crisis.

In other words, it’s time to conduct a root cause analysis (RCA).

Performing an RCA helps us surface the different factors that caused and exacerbated the crisis. After all, problems rarely evolve into crises on their own — many times, crises happen under a “perfect storm of coincidences,” with multiple points of failure playing into the situation. By surfacing all of the different factors and identifying the highest leverage factors to address, we can make our products, our processes, and our organization more resilient over time.

To run an RCA, use the 5 Whys method to conduct a group retrospective. While you’re not literally asking “why” five times, iteratively asking “why” peels off each layer of complexity.

Pull together all of the affected stakeholders into a room, and walk through a detailed timeline that identifies the following milestones:

  • When the problem started, and how it started
  • When the problem was first acknowledged, by which team
  • Each handoff between teams
  • When a mitigation was first proposed
  • When a mitigation was implemented
  • When a resolution was first proposed
  • When a resolution was implemented
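Once the timeline is written down, a few simple durations fall out of it. The Python sketch below shows one possible way to compute them; the milestone names and timestamps are invented for illustration, and your own timeline will likely have more entries (such as each handoff between teams).

```python
from datetime import datetime

# Hypothetical crisis timeline; all timestamps are made up.
timeline = {
    "started": datetime(2023, 1, 9, 9, 0),
    "acknowledged": datetime(2023, 1, 9, 10, 30),
    "mitigation_implemented": datetime(2023, 1, 9, 15, 0),
    "resolution_implemented": datetime(2023, 1, 12, 9, 0),
}


def hours_between(timeline, start, end):
    """Elapsed hours between two named milestones."""
    return (timeline[end] - timeline[start]).total_seconds() / 3600


def summarize(timeline):
    """Time from the start of the problem to each later milestone."""
    return {
        "hours_to_acknowledge": hours_between(timeline, "started", "acknowledged"),
        "hours_to_mitigate": hours_between(timeline, "started", "mitigation_implemented"),
        "hours_to_resolve": hours_between(timeline, "started", "resolution_implemented"),
    }


for label, hours in summarize(timeline).items():
    print(f"{label}: {hours:.1f}")
```

Numbers like these give the group a shared, blameless starting point: the question becomes why a given gap was as long as it was, rather than who was slow.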

Then, circle back to the beginning. State the initial problem that happened, and ask the full team to contribute questions, answers, and possible resolutions.

Let’s walk through an example of the 5 Whys in action. In this hypothetical scenario, let’s imagine that end users were being unexpectedly logged out every 5 minutes after a recent product release went into production.

Why #1: Why were end users being unexpectedly logged out every 5 minutes?

  • Answer #1: We had recently changed our authentication logic to increase security, but our testers didn’t stay logged in long enough during testing.
  • Resolution #1: Add automated test cases to check that users are staying logged in long enough.

Why #2: Why were we unable to identify the issue ourselves before customers reached out?

  • Answer #2: We don’t have any dashboards that show “average session time.” If we had those dashboards, we could have seen that session times had dropped significantly.
  • Resolution #2: Add tracking, dashboards, and alerts to monitor “average session time.”

Why #3: Why weren’t individual users able to let our support teams know, and why did we have to wait until customer executives raised the issue to us?

  • Answer #3: While we have a “report a bug” flow for logged-in users, we don’t have any UX available for end users to report bugs to us when they’re logged out.
  • Resolution #3: Add the “report a bug” flow for logged-out users.

Why #4: Why wasn’t the issue raised to the authentication PM directly, and why was it routed through a different PM instead?

  • Answer #4: The customer support team did not know that the authentication team had recently released functionality.
  • Resolution #4: The authentication team will now share their upcoming releases in the “feature release notes” process, on top of sharing them in the “engineering platform update notes” process.

Why #5: Why did it take so long for the PM who received the issue to route it to the authentication PM?

  • Answer #5: The PM who received the issue is new to the product team, and did not know that there was an authentication PM.
  • Resolution #5: Create a directory of features and PMs so that all PMs know who to contact for which features.

Of course, while I’ve conveniently ended this scenario by asking “why” 5 times, you should iteratively ask “why” until you’ve found enough root causes and processes to fix.

Sometimes you only need to ask “why” 3 times, and other times you’ll need to ask “why” more than a dozen times. So, use your best judgment when diving into the issue.

If we use only a fixed number of “whys”, we’ll push our teammates to answer these questions in a shallow manner so that they can “get this over with.” Our goal is to learn as much and improve as much as possible, not to fill out paperwork for the sake of paperwork.

And, similar to crisis resolution, not every preventative solution will make sense to implement immediately. Rank-order each possible resolution in terms of return on investment, and knock out the ones that make the most sense to tackle.

For the items that you and your stakeholders actively decide not to tackle, save them in a centralized location. As more crises play out over time within your product organization, you’ll start to see critical mass gather around some of the higher-effort preventative efforts.

As an example, “shifting from a monolith into a microservices-oriented architecture” is something that typically comes up from RCAs, but it rarely makes sense for any one team to tackle on their own.

When there’s enough critical mass around a particular preventative solution, the product team can carve out dedicated effort to tackle those larger prevention efforts.
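One lightweight way to spot that critical mass is to tally how often each deferred item comes up across RCAs. Here is a minimal Python sketch, assuming a simple log of (RCA, deferred item) pairs; the entries and the threshold of three mentions are hypothetical.

```python
from collections import Counter

# Hypothetical log of preventative items deferred in past RCAs.
deferred = [
    ("Logout bug RCA", "Move to microservices"),
    ("Logout bug RCA", "Build a PM directory"),
    ("Checkout outage RCA", "Move to microservices"),
    ("Data export RCA", "Move to microservices"),
    ("Data export RCA", "Add export size limits"),
]


def critical_mass(deferred, min_mentions=3):
    """Return deferred items that came up in at least `min_mentions` entries."""
    counts = Counter(item for _, item in deferred)
    return [item for item, n in counts.most_common() if n >= min_mentions]


print(critical_mass(deferred))
```

A spreadsheet works just as well; the point is that the deferred items live in one place where repeats become visible.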

Partnering with product operations to manage crises

In larger product orgs, the product operations team will help to facilitate crisis management. Think of them like air traffic controllers, whereas the product manager is the pilot of the plane. Product operations managers will help manage inbound information and outbound communications, but the product manager is ultimately responsible for managing the crisis.

Product operations counterparts can help in a variety of ways, such as:

  • Identifying which stakeholders need to be consulted in a crisis
  • Formalizing the crisis playbook in the org
  • Creating war rooms (e.g. Slack channels and Zoom meetings)
  • Scheduling RCA meetings
  • Structuring the format of an RCA
  • Broadcasting RCA learnings to the org
  • Running practice crisis drills (e.g. annually)

Closing thoughts on effective crisis management

Crises don’t have to be scary. With enough upfront visibility of “how a crisis might play out”, you’ll be significantly calmer and more confident in tackling the crisis as a product manager.

By using a blameless attitude and by staying calm throughout the situation, you’ll guide your product towards success. In fact, many times, customer trust and stakeholder trust are built up in crisis scenarios like these.

After all, every customer and every stakeholder has been through product crises and relationship crises before. The one that they’re experiencing with you is not the first that they’ve been through, and it won’t be the last that they’ll ever have.

By serving customers and stakeholders better than they expect, you’ll build up trust and credibility over time. That’s why as product managers, we shouldn’t view crises as an undesirable burden that must be handled. Instead, we should see crises as opportunities to improve our processes and iterate towards the best end-to-end experience for our customers.

Clement Kao
Product Teacher

Product manager, businessman, and biologist devoted to the intersection between tech, business, and life. Founder at Product Teacher. Loves to help others!