Business Continuity is More Than Backup and Restore

Salesforce Architects
12 min read · Oct 14, 2021

Image of arrows pointing in different directions.

So you’ve been given the responsibility of defining the business continuity plan (BCP) for your Salesforce instance. You may have some experience with building a BCP from having reviewed our BCP Quick Start. Maybe you created a simple plan to tick off a checkbox on a process list, or perhaps you’ve created a full-blown BCP for your critical business processes and regular disaster recovery (DR) testing. As an architect, you want to make sure that all your plans are well thought out and your decisions are defensible with facts. This is especially true for BCPs because the decisions you make in creating your BCP will be put to the test during some sort of crisis, and likely scrutinized once the dust has settled.

To define a BCP for a cloud computing environment like Salesforce, you need to right-size your BCP relative to business requirements, impact analysis, and real metrics. You also need to define it within the scope of the overall enterprise architecture of the business, which may include applications and systems beyond Salesforce. This means the plan needs to be robust enough to enable recovery in a worst-case scenario, but also not over-engineered with functionality and requirements the business doesn’t really need (even if business stakeholders may not realize they don’t need it). You want to avoid the very real pitfall of taking on the unnecessary and ongoing costs of an over-engineered solution.

Business continuity is more than just having a backup

Some architects have an overly simplistic view of a BCP, seeing it as just a backup that can be used to restore data and/or metadata if ever necessary. Others go to the other extreme, seeing business continuity planning as the ability to have 100% uptime for all application functionality. The problem with these two viewpoints is that the former is too short-sighted to be a true production solution, and the latter relies on too much wishful thinking to be practical in the real world. A better way to think of your BCP is similar to the approach you apply when buying insurance: You make decisions based on the financial impact of an unfortunate incident, the probability of occurrence, and the cost of ownership.

With that being said, let’s break down a solid BCP into its separate components to clarify some of the decisions that architects need to make. It is important to understand that business continuity is a business solution that includes people and processes along with technology.

The components of a business continuity plan: 1. backup, 2. disaster recovery, 3. high availability.

Backup — Exporting full or incremental copies of your data and/or metadata at certain time intervals. Considerations include:

Disaster recovery — Restoring all or some data and/or metadata back to a certain point in time. Considerations include:

  • Backup site / technology
  • Data transformations
  • Recovery process
  • Scalability
  • Communications

High availability — Ensuring the system is available (perhaps just as read-only during an emergency) at all times using a failover system or similar mechanism. High availability is designed for companies that cannot tolerate a disruption to business continuity. Considerations include:

  • Failover process features (MVP)
  • Failover technology
  • Monitoring
  • Failover routing
  • Communications
  • Post-incident data synchronization

BCP foundations: planning, design, and governance

To create an appropriate, right-sized BCP solution you need a methodology that emphasizes data-driven decisions and avoids decisions based on anecdotes and assumptions. You don’t want to design, fund, and implement a BCP solution based solely on what you think other companies are doing or in response to emotional reactions caused by a recent outage.

A methodology for creating a BCP solution that is aligned to actual business needs includes four main phases: identifying business-critical processes and data, understanding business impacts, defining and implementing the business continuity solution, and creating organizational discipline.

A methodology for creating a BCP solution with four main phases.

Identify business-critical processes and data

Knowing which business processes are critical is essential, but it is not sufficient. You also need to determine the applicable standards for the processes (for example, SLAs, SLOs, RPO, RTO, RCO, and so on) from both a corporate and customer perspective. Also, you need to identify any regulatory and compliance requirements early in the planning stage. Without even a rough estimate of these metrics you don’t know where your target is.

Understand business impacts

All business users think their processes are the most important, so it’s necessary to understand quantifiable metrics in order to rank criticality. Use metrics that are important to the business. The most common are revenue lost, costs, damage to brand/reputation, legal/compliance issues, and time wasted.

Another key step at this stage is ranking your business processes in terms of risk severity and likelihood of occurrence (see table). This ranking is necessary to know which business processes are truly the critical ones, which ones are nice to have, and which ones can wait to be part of a later BCP if you have to implement the BCP in an agile fashion.

Table to help you rank risk based on severity and likelihood of occurrence.

This step will help you avoid spending a lot of resources to harden a system that historically has a high uptime (like Salesforce), instead of on a system that experiences more frequent outages (for example, a CTI solution or a legacy application). In short, this step helps you appropriately strengthen the end-to-end process and focus on the weakest link.
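The ranking described above can be sketched as a simple severity-times-likelihood score. This is only an illustration under stated assumptions: the 1-to-5 scales, the process names, and the individual scores are hypothetical, and your own risk table may use different bands or weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessRisk:
    name: str
    severity: int    # 1 (minor) to 5 (business stops); hypothetical scale
    likelihood: int  # 1 (rare) to 5 (frequent); hypothetical scale

    @property
    def score(self) -> int:
        # Simple product of the two rankings; other weightings are possible.
        return self.severity * self.likelihood


def rank_by_risk(processes):
    """Return the processes sorted from highest to lowest risk score."""
    return sorted(processes, key=lambda p: p.score, reverse=True)


# Hypothetical example data, not taken from any real assessment.
processes = [
    ProcessRisk("Order entry in Salesforce", severity=5, likelihood=1),
    ProcessRisk("Case intake via CTI", severity=4, likelihood=4),
    ProcessRisk("Weekly reporting", severity=2, likelihood=2),
]

for p in rank_by_risk(processes):
    print(f"{p.score:>2}  {p.name}")
```

In this made-up data set, the CTI-dependent process ranks first even though its severity is lower, because it fails far more often. That is the "weakest link" effect the ranking is meant to expose.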

Define and implement your BCP solution

Once you’re at this step, you should have enough information to start defining and implementing your BCP solution because you have a good understanding of what is important to the business and the detailed BCP requirements. The BCP will have separate solutions for backup/recovery and high availability (if necessary). Other common activities at this point include defining the roles and responsibilities of the BCP team, internal and external BCP communication plans, and regular integrated BCP testing.

The high-level operational process for backup/recovery includes:

  1. Definition of an incident
  2. Impact analysis and root cause (if possible) of data corruption
  3. Accessing and scoping of backup data in backup system
  4. Communication throughout incident to all impacted stakeholders
  5. Assessment of data restore sequence and data transformations
  6. If necessary, data transformation
  7. Restoration of data to the primary system
  8. Testing of restore success

Similarly, the high-level operational process for high availability during an incident includes:

  1. Definition of an incident
  2. Communication with Salesforce in case of performance degradation
  3. Decision process for failover go/no-go
  4. Cutover from impacted primary system to failover system
  5. Communication throughout the incident to all impacted stakeholders
  6. Monitoring and assessment of restoration of primary system performance
  7. Decision process to revert to the primary system
  8. If applicable, migration of modified data from the failover system to the primary system to ensure data integrity

Create organizational discipline

All the steps up to this point rely on a team that can drive an organization to evaluate, prioritize, execute, and communicate, while optimally using people, processes, knowledge, and technology. This team is commonly known as a Center of Excellence (CoE). Many other posts have been written (and will continue to be written) on the subject of CoE, but the key point is that BCP is one area that a CoE should govern to ensure that the BCP process is derived and formed by putting in place the right people with the right knowledge and responsibility.

Also, the CoE should help ensure that the BCP is continuously improving. Especially when it’s first created, a BCP won’t be perfect. Plan to improve your BCP iteratively and continuously. Having a limited plan in place today is better than waiting to have the perfect plan at some future date. Don’t let perfection get in the way of good.

Backup and recovery

While backup and recovery are technically two different aspects of a BCP, for most plans they can be considered together because they have the same business and compliance requirements, and the technology to perform both will be the same. In exceptional instances they aren’t the same; this can occur when complex data integrations are involved or when the backup tool doesn’t have restore functionality.

To identify the best solution for backup and recovery, follow these steps:

  1. Determine if you have different business requirements for data recovery of different business processes, or if all data has the same requirements. For example, there may be some data elements that have legal or compliance implications (such as an SLA of being restored within 24 hours), while all other data elements can be restored any time within a week.
  2. Estimate what your budget for backup and recovery will be. This is important when considering and justifying the cost of necessary complex functionality.
  3. Identify where the source of truth of the data resides. For instance, copies of data can exist in multiple data sources (such as Salesforce, an Oracle database, or a legacy internal database) but the data source that is considered the most up to date and accurate is the source of truth.
  4. Decide where the backup will be stored.

Based on your answers to these questions, you can determine if it’s possible to use out-of-the-box Salesforce functionality or if you need a more complex developed solution or third-party application. The general rule of thumb is the simpler your backup requirements are, the more straightforward and less expensive the solution.
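To make the first step more concrete, here is a minimal sketch of recording recovery requirements per data set and flagging where the current backup setup falls short. The data set names, the RPO values, the backup intervals, and the restore durations are all hypothetical; only the "24 hours versus one week" restore targets echo the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    name: str
    rto_hours: float              # required: maximum time allowed to restore
    rpo_hours: float              # required: maximum tolerable window of lost data
    backup_interval_hours: float  # how often a backup is currently taken
    est_restore_hours: float      # restore duration observed in a restore test


def gaps(ds: DataSet) -> list:
    """List the recovery objectives this data set would currently miss."""
    found = []
    # Worst case, the incident happens just before the next backup runs,
    # so up to one full backup interval of changes is lost.
    if ds.backup_interval_hours > ds.rpo_hours:
        found.append("RPO")
    if ds.est_restore_hours > ds.rto_hours:
        found.append("RTO")
    return found


# Hypothetical inventory for illustration only.
inventory = [
    DataSet("Regulated contract records", rto_hours=24, rpo_hours=4,
            backup_interval_hours=24, est_restore_hours=6),
    DataSet("Marketing campaign data", rto_hours=168, rpo_hours=24,
            backup_interval_hours=24, est_restore_hours=30),
]

for ds in inventory:
    missed = gaps(ds)
    print(f"{ds.name}: {', '.join(missed) if missed else 'requirements met'}")
```

A data set that shows a gap is a candidate for a more frequent backup schedule or a more capable tool, while data sets without gaps can likely stay on the simpler, cheaper option.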

As you consider your backup/recovery strategy, keep in mind your high availability requirements because some design components will likely be common to both.

High availability — Solve the right problem

A common misunderstanding with BCPs is that high availability is the same as disaster recovery. They complement each other but they are different and solve different problems. To understand the difference, consider the analogy of a restaurant during the COVID-19 pandemic. The pandemic is a disaster that no restaurant had planned for, and all their backups and redundant systems couldn’t protect them against it. Disaster recovery, in this analogy, includes all the activities necessary to get everything back to normal (such as developing and distributing the vaccine). And we have seen that it can take a frustratingly long time.

High availability, in contrast, involves steps that the restaurant can take to keep the business moving and revenue coming in, even in a significantly limited capacity. During the pandemic, this included focusing on take-out capacity and efficiency. While take-out service doesn’t bring in the same revenue as a fully functional restaurant, it’s ideally enough to keep the restaurant from shutting down while disaster recovery (vaccine development and distribution) is happening.

With this understanding in mind, follow these steps to identify an appropriate high availability (HA) solution for your organization:

  1. Determine what your top critical business processes are. These are the business processes that are the lifeline of your business, and being unable to perform them will bring the business to an immediate standstill.
  2. Simplify and reduce your set of identified critical business processes with the MVP (minimum viable product) process. Essentially, you need to determine the minimal set of data, business functionality, people, and applications/tools to just get the business needs fulfilled. Be critical and ruthless during the MVP process to eliminate nice-to-haves and ensure you keep only must-haves. The better you do this, the simpler and more cost-effective your HA solution will be. Also remember that this is an iterative process; you can add those nice-to-haves to the solution after you have your baseline HA infrastructure completed.
  3. Once you’ve narrowed your list of critical business MVP processes, calculate the actual financial impact of them being unavailable. This is a critical step in prioritizing and justifying the cost of the HA solution.
  4. For those MVP processes with the largest financial impact, determine the most likely points of failure. This is also a critical step to ensure you are solving the right problem to improve the availability of the entire end-to-end process. For example, is it worth building a redundant Salesforce system to improve upon Salesforce’s uptime (which has historically been 99.9%), when your CTI tool or your legacy back-end application has an uptime of 80% to 90%?
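The point in step 4 can be checked with a quick calculation. The sketch below assumes the components of a process are chained in series (the process works only if every component works) and that their failures are independent, in which case the end-to-end availability is the product of the individual availabilities. The 99.9% and 90% figures come from the example above; the "improved" figures are hypothetical.

```python
from math import prod


def end_to_end_availability(component_availabilities):
    """Availability of a process that needs every component to be up.

    Assumes components are in series and fail independently.
    """
    return prod(component_availabilities)


salesforce, cti = 0.999, 0.90

baseline = end_to_end_availability([salesforce, cti])
# Hypothetical: a redundant Salesforce setup that reaches 99.99%.
harden_salesforce = end_to_end_availability([0.9999, cti])
# Hypothetical: fixing the CTI tool so that it reaches 99%.
harden_cti = end_to_end_availability([salesforce, 0.99])

print(f"Baseline:             {baseline:.2%}")
print(f"Harden Salesforce:    {harden_salesforce:.2%}")
print(f"Harden the CTI tool:  {harden_cti:.2%}")
```

Under these assumptions, hardening Salesforce barely moves the end-to-end number, while improving the CTI tool lifts it by roughly nine percentage points.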

Do you need a custom high availability solution?

While not many Salesforce customers have gone the route of creating a custom high availability solution, some have done just that after going through the cost/benefit analysis. Two valid examples: the cost of downtime is measured in human lives because of the services being provided through Salesforce, or the cost to the business of being down is measured in the tens of millions of dollars per hour. Review the table below to better understand common custom high availability solution patterns. Factors to consider when choosing a solution include budget, automation requirements, business use cases, and technical capabilities and resources.

High availability and uptime calculations

To better understand why it’s important to realistically quantify the financial impact of unavailable business processes and know where your most probable points of failure are, let’s do some (fun!) math on a realistic example of Salesforce uptime and what you might achieve by improving upon it with a custom HA solution.

Let’s say that over a twelve-month period, Salesforce had an accumulated downtime of 3hr 25m, which includes both unplanned and planned downtime. That amount of downtime represents 99.961% total availability, which is referred to as 3N (or three nines). If 3N is unacceptable to the business, let’s see what you’d get with a custom HA solution that provides more nines:

  • Improving to 4N (99.99%) would reduce the downtime to about 52.6 minutes. That would buy you roughly an extra 2hr 32 min in uptime for the year.
  • Improving to an exceedingly difficult 5N would further reduce the downtime to just 5 minutes.
  • Improving to a ridiculously high mark of 6N would reduce downtime to a mere 31 seconds.

This is just the uptime math; in order to actually achieve these very high availability goals you’d need to account for the time it takes to detect the outage incident, make the executive decision to activate the high availability custom solution, and then run through the procedural steps to failover.
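The arithmetic above is easy to reproduce. The following sketch uses a 365-day year and adds one assumption of its own: a hypothetical 45-minute response time per incident (detection, go/no-go decision, and failover combined), which is not a figure from this article and should be replaced with your own measurements.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def availability_pct(downtime_minutes):
    """Yearly availability, in percent, for a given total downtime."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_YEAR)


def allowed_downtime_minutes(nines):
    """Yearly downtime budget for N nines (for example, 4 means 99.99%)."""
    return MINUTES_PER_YEAR * 10 ** -nines


def incidents_within_budget(nines, response_minutes):
    """How many incidents per year fit in the budget if each one costs
    at least `response_minutes` of detection, decision, and failover time."""
    return int(allowed_downtime_minutes(nines) // response_minutes)


print(f"3hr 25m of downtime = {availability_pct(205):.3f}% availability")

RESPONSE_MINUTES = 45  # hypothetical per-incident response time
for n in (4, 5, 6):
    budget = allowed_downtime_minutes(n)
    fits = incidents_within_budget(n, RESPONSE_MINUTES)
    print(f"{n}N: {budget:7.2f} min/year budget, "
          f"room for {fits} incident(s) at {RESPONSE_MINUTES} min each")
```

With a 45-minute response time, a 4N target leaves room for a single incident per year, and 5N or 6N leave room for none. Those targets are only reachable if detection and failover are almost fully automated.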

In the real world, the high costs of designing, developing, implementing, and maintaining such an extremely high availability custom solution quickly outweigh the diminishing returns to the business for the vast majority of Salesforce customers. This is why for most customers, the conclusion after analyzing the true financial impact to the business due to outages is to rely on Salesforce’s own internal high availability infrastructure that’s included as part of the cloud computing service.
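One way to frame that cost/benefit conclusion is as a break-even calculation: how much must an hour of downtime cost before a custom HA solution pays for itself? The sketch below reuses the 3hr 25m (205 minutes) downtime example. The 1,000,000 annual cost of the custom solution is a made-up figure for illustration, and the model deliberately ignores detection and failover time.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def avoided_downtime_hours(current_downtime_min, target_nines):
    """Hours of downtime per year removed by reaching the target nines."""
    target_downtime_min = MINUTES_PER_YEAR * 10 ** -target_nines
    return max(current_downtime_min - target_downtime_min, 0) / 60


def break_even_cost_per_hour(annual_ha_cost, current_downtime_min, target_nines):
    """Downtime cost per hour at which the HA solution pays for itself."""
    hours = avoided_downtime_hours(current_downtime_min, target_nines)
    if hours == 0:
        # Nothing is gained, so no downtime cost can justify the spend.
        return float("inf")
    return annual_ha_cost / hours


ANNUAL_HA_COST = 1_000_000  # hypothetical yearly cost of a custom HA solution

hours = avoided_downtime_hours(205, target_nines=4)
threshold = break_even_cost_per_hour(ANNUAL_HA_COST, 205, target_nines=4)
print(f"Downtime avoided per year: {hours:.2f} hours")
print(f"Break-even downtime cost:  {threshold:,.0f} per hour")
```

If an hour of downtime costs the business less than the break-even figure, the custom solution costs more than it saves. If downtime is measured in the tens of millions of dollars per hour, the same calculation can come out clearly in favor of building it.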

The BCP maturity curve

To better understand how much effort will be required to achieve your BCP goals, it’s useful to determine where you are on the BCP maturity curve. This will give you a sense of how mature your plan is, where you need to improve to get to the next level, and what your future goals may be.

As a quick summary, the five stages of the BCP maturity curve are:

  1. “Sleep” is when you have no metrics or goals defined and you are just starting the plan
  2. “Crawl” is when you understand the business and its critical processes
  3. “Walk” is when you have a defined backup and recovery process
  4. “Run” is when you have a defined high availability process
  5. “Fly” is when your high availability process is optimally automated

Architect pro tip: Most customers don’t need to fly; running is just fine.

Conclusion — Seeing the big picture

Business continuity planning can be a daunting task, especially if you don’t have experience in architecting BCP solutions or don’t have BCP experience on a cloud computing platform. The key is to have an organized decision track to design what the business really needs, and not make assumptions without a true financial analysis. Have a plan to build your BCP so that you can justify and defend decisions that are made. Essentially, don’t build a cruise ship when you only need a ferry, but also make sure you’re not caught without even a lifejacket.

To learn more about business continuity planning and related topics, check out the following posts:

About the Authors

William Lem is a Success Architect at Salesforce. He has been at Salesforce since 2008, working with customers of all segments as a technical thought leader to accelerate their adoption of Salesforce solutions that transform their businesses and achieve business objectives. He focuses on technical enterprise architecture, application development, DevSecOps, and business continuity planning. Find him on LinkedIn here.

Ingo Fochler is a Success Architect at Salesforce. He has been at Salesforce in various technical roles since 2011 advising and consulting with large complex customers on CRM administration, application development, data architecture, SSO, and integration. Find him on LinkedIn here.

Dana Furman is a Success Architect at Salesforce. She has been part of the Salesforce ecosystem since 2012, starting as the R&D Director for a leading Salesforce partner and moving to different architect roles in Salesforce, acting as a strategic advisor and leading some of the biggest customers through their digital transformation journeys. Dana attained her Certified Technical Architect (CTA) credential in 2019. Find her on LinkedIn here.
