Are You Ready for a Technical Crisis?

Carl Brundage
Salesforce Architects
8 min read · Dec 14, 2021

I’ve experienced my share of crises, often created by my own hand. Once, for example, I was working late to clean up a sandbox. I was reclaiming storage space before a major upgrade by truncating objects copied from production to remove records that were no longer needed. I started the truncation process, but after a few minutes, something seemed amiss. The process usually ran quickly, but this time it was taking longer than normal. That’s when I realized I was truncating objects in production.

How would you react in a crisis like this? You might not know until you experience a similar crisis firsthand. One thing is certain, however. Thinking about how to communicate in a crisis and preparing in advance gives you the best chance to get back to normal. And that’s the goal of crisis management: to minimize the impact of the current crisis and get back to normal operations as quickly as possible.

Know that crises will happen and plan

In my situation, I hadn’t given much prior thought to how to manage this particular kind of crisis. This absolutely put me at a disadvantage. Not only did I need to figure out how to resolve the technical issue, but I also had to think through managing the crisis. Questions like who to call, how to notify users, and what to say hampered my ability to focus on how to actually fix the problem.

When you’re considering these questions for the first time while in the middle of a crisis, you’re not likely to get back to normal operations quickly. Like with most successful endeavors, preparation is key. By creating a plan ahead of time for how to respond when things go wrong, you can shift the focus from thinking about what to do, to solving the problem in front of you.

What does a good preparation plan include? Start by identifying the types of issues you may face and the potential business impacts. Some potential crises would be quite dire but are also unlikely to occur. For example, you’re much more likely to misconfigure DNS and knock your service offline for a day than to have a meteorite take out your cloud provider. Prioritize risks that have both a high potential impact and a reasonably high likelihood of occurring. Then dive into the details of what could go wrong and how you will respond. Keep in mind that you don’t need to fill a binder with detailed, step-by-step procedures. Instead, you need to establish a framework for how to respond.
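One way to make this prioritization concrete is a simple impact-times-likelihood score. The sketch below is only an illustration: the `Risk` class, the 1-to-5 scales, and the ratings are assumptions made up for the example, not part of any standard method.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    impact: int       # 1 (minor) to 5 (severe); the scale is an assumption
    likelihood: int   # 1 (rare) to 5 (frequent); the scale is an assumption

    @property
    def score(self) -> int:
        # Simple product: a risk must rate on both axes to rank highly.
        return self.impact * self.likelihood


def prioritize(risks: list[Risk]) -> list[Risk]:
    """Return risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Illustrative entries only; the ratings are invented for this example.
risks = [
    Risk("Meteorite takes out cloud provider", impact=5, likelihood=1),
    Risk("DNS misconfiguration takes service offline", impact=4, likelihood=3),
    Risk("Accidental data deletion in production", impact=5, likelihood=2),
]

ranked = prioritize(risks)
for risk in ranked:
    print(f"{risk.score:>2}  {risk.name}")
```

With these made-up ratings, the DNS misconfiguration ranks first and the meteorite last, even though the meteorite has the highest impact on its own.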

Knowing what you may face helps shape your response plan. Likewise, the type of issue you face will determine the roles that need to be involved in the response. In a crisis, the leadership team should work to oversee stakeholder management and clear as many hurdles as possible. The support team should also be involved, especially when you need to communicate to stakeholders at scale. Additional support can also come from other areas of the organization. For example, if the support team (and support systems) are unable to handle the scale of the response, you may need the customer success team to supply supplemental support.

With leadership and supporting roles in place, consider the type of technical skills that will be needed. A network outage will require different specialists than a database issue. Additionally, keep in mind that what looks like one type of technical issue can quickly turn into a completely different one upon deeper inspection. Plan to be able to pull in other skill sets on the fly and think about the mechanisms required to get the help that you’ll need.

Next, consider the timing of the crisis. What if it occurs at 11 p.m. on a Saturday? Will you be able to assemble all the people to fill the roles you identified in the plan? Be sure your plan has the flexibility to handle different conditions and crisis timing. You may want to examine the decision-making authority the crisis response team has and build in the ability to increase the scope of authority in specific situations.

Takeaways

  • Create a plan that identifies the types of issues you may face and their potential business impacts.
  • Prioritize risks that are high impact and have a reasonably high likelihood of occurring.
  • Identify the skills and decision-makers needed to execute your plan.

Assess the situation and react

I quickly came to realize that in a crisis, time is not your friend. While my crisis was off-hours, a crisis that occurs during peak hours can be significantly more stressful. There’s nothing worse than watching the minutes tick by as you struggle to figure out what to do, unable to get in contact with your colleagues.

The focus should be on first getting a clear picture of the situation. Waiting to see how the situation develops will not lead to a better outcome. Often, recognizing the true cause of a crisis can be the hardest task. Once you can finally put a name on the issue, it’s easier to fix it. Further, when you’ve identified the root cause, you’ll be better able to communicate with others and present key concepts in a simple, accessible manner to create a common understanding of the problem.

Prioritization is essential to working toward a resolution. For a good example, consider how utility companies restore power after large storms. They restore the largest blocks of customers requiring the easiest fixes first, and then work toward the smaller blocks and more time-consuming fixes. Similarly, you should tackle the big, must-have issues first to mitigate the crisis impact.
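The utility-company approach can be approximated by ordering fixes by how many users each one restores per minute of effort. This is a rough sketch under that assumption; the `Fix` type, the fix names, and the numbers are hypothetical.

```python
from typing import NamedTuple


class Fix(NamedTuple):
    name: str
    users_restored: int   # how many users this fix brings back
    est_minutes: int      # rough effort estimate; must be greater than zero


def restoration_order(fixes: list[Fix]) -> list[Fix]:
    """Order fixes by users restored per minute of effort, best first."""
    return sorted(
        fixes,
        key=lambda f: f.users_restored / f.est_minutes,
        reverse=True,
    )


# Made-up fixes and estimates, for illustration only.
fixes = [
    Fix("Rebuild custom report cache", users_restored=20, est_minutes=120),
    Fix("Re-enable login for all users", users_restored=5000, est_minutes=10),
    Fix("Restore records for one region", users_restored=800, est_minutes=60),
]

ordered = restoration_order(fixes)
for fix in ordered:
    print(fix.name)
```

A ratio like this is only a starting point. A must-have fix with a poor ratio may still need to go first, so treat the ordering as input to a decision rather than the decision itself.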

Make your decisions as quickly as possible, but don’t make snap decisions without thinking. Be sure to anticipate the potential for cascading side effects. For example, you may have a way to get the system back online quickly, but you’d lose the last 30 minutes of data — is that an acceptable tradeoff? Thinking through such tradeoffs enables rapid decision-making in the crisis.

Finally, recognize that there are times when it’s wiser to make peace than to be right. The middle of the crisis is not a time to deep dive into a root-cause analysis or to assign blame for the issue. Remember, the goal — and the top priority — is to resolve the crisis as soon as possible.

In my situation, the longer I let users access the system the more of a mess it would be to clean up the mix of old and new data. So, I quickly disabled access to the system. This kept the problem from worsening as I worked to fix it.

Takeaways

  • Identify the events that will trigger a crisis response plan.
  • Tackle the big, must-have issues first to mitigate the crisis impact.
  • Keep the focus on resolving the crisis, not assigning blame.

Know your instincts and resist

When dealing with a technology crisis, one specific tendency often determines whether the crisis escalates or moves toward resolution. Unfortunately, avoiding this tendency goes against the instincts of many technical folks.

This tendency, simply put, is focusing too much on finding the answers to the technical puzzle underlying the crisis. While helpful in resolving the issue, this tendency can actually make the crisis worse. In past crises, I worked single-mindedly on solving the puzzle and waited until I had an answer to communicate. I wanted to know everything I could before facing a stakeholder. I learned that this is the exact opposite of what stakeholders want.

When something goes wrong, your stakeholders need to know you’re working on a resolution. Think about it from their perspective. They know there is a serious problem, and they are likely experiencing a combination of fear, uncertainty, and loss of control. Ignoring stakeholders while you work on the problem only magnifies their frustration. It’s a path to an ever-escalating crisis.

Instead of waiting, you need to own the communication plan and reestablish trust. This isn’t a situation where a single message or phone call keeps stakeholders happy. You must set out a comprehensive plan that aligns with the severity of the situation and adhere to it. Perhaps the schedule is to update stakeholders every day or every hour. Regardless of the frequency, you must supply updates as promised. You may not yet have the answer when it’s time for the update, but you still must communicate.
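A committed update cadence can be written down ahead of time so nobody has to work it out mid-crisis. The sketch below assumes three severity levels and their intervals; both are hypothetical placeholders to be replaced with whatever your own plan defines.

```python
from datetime import datetime, timedelta

# Hypothetical severity levels and update intervals; adjust to fit your plan.
UPDATE_INTERVAL = {
    "critical": timedelta(hours=1),
    "major": timedelta(hours=4),
    "minor": timedelta(days=1),
}


def update_schedule(start: datetime, severity: str, count: int) -> list[datetime]:
    """Return the times of the first `count` updates; the first is at `start`."""
    interval = UPDATE_INTERVAL[severity]
    return [start + i * interval for i in range(count)]


# Example: a critical incident that begins at 11 p.m.
start = datetime(2021, 12, 14, 23, 0)
schedule = update_schedule(start, "critical", 3)
for when in schedule:
    print(when.strftime("%Y-%m-%d %H:%M"))
```

The first update falls at the start of the incident, and each later slot is a promise to communicate even if the only news is that the work is still in progress.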

The timing and content of the message are as important as the schedule. First, you need to promptly respond at the start of the crisis. Proactively acknowledging the situation lets you set the tone. Otherwise, the information void will be filled from other sources, and those sources tend to be heavy on rumors and light on facts.

Next, the communication must come from a place of compassion. In a crisis, emotions run high. If stakeholders don’t feel heard, the facts won’t really matter. Recognize what stakeholders are feeling and be empathetic. An important part of compassionate communication is honesty. Be sure to not “spin” the truth, whether through understatement or omission.

Finally, make your communications interactive and informative. Remember, communication is not a one-way flow of information. Stakeholders expect to be able to ask questions and engage in meaningful dialog. Be prepared to answer the basic questions of who, what, when, where, and how with precise and concise language. It’s not what you tell them but what they take away from what you said that matters, so take time to ensure your stakeholders come away with a clear understanding of the situation.

Takeaways

  • Promptly and proactively respond at the start of a crisis.
  • Communicate from a place of compassion; stakeholders need to feel heard.
  • Make communications interactive and informative; stakeholders expect to be able to ask questions.
  • Be prepared to answer the who, what, when, where, and how with precise and concise language.

After the crisis

We’ve all been in crises before, many of us more times than we would like to admit. Unfortunately, we are likely to be there again. The work doesn’t end once the crisis is over. Taking stock after the crisis is essential to avoiding the next one or responding to it more effectively.

First, from a technological standpoint, you need to know what went wrong. Now is the time for the root-cause analysis that you wanted to do earlier. Finding out exactly what happened will keep you from repeating it in the future.

Next, review the plan and its execution. Schedule a meeting to assess what went well in the crisis response plan as well as areas for improvement. Questions to ask during this meeting include:

  • What do our stakeholders say about how the crisis was handled?
  • How long did it take to resolve the crisis and what could have been done to recover faster?
  • Besides the roles we had previously identified, who else was needed to resolve the issue?
  • If the incident occurred at another time (for example, off-hours versus regular business hours), how would this have been handled differently?

Remember that effective crisis response rarely happens by accident. It’s almost always the product of preparation before the crisis; prioritization, execution, and communication during the crisis; and reevaluation of the response before the next crisis.

Carl Brundage
Salesforce Architects

Certified Technical Architect and Salesforce MVP specialized in Data & Analytics