Preparing for cybersecurity disasters
Strategies to make sure you are ready
This next post in my series on Cybersecurity for Executives covers disaster recovery, resiliency, and backups. This topic introduces another question you can ask your security team, but it's more than just "Do we have backups?"
Backups are not simple
Sure, you’ve heard it a hundred times before if you have spent any length of time in IT, security, or dealing with systems and data. Back it up. Make a copy in case something goes wrong so you can restore it all.
It sounds simple. It's not. Creating adequate backups and the ability to restore systems efficiently requires forethought and planning, threat modeling, access management, encryption, and lifecycle management. Backup and recovery routines should be run by people who pay excellent attention to detail. The one day your backups fail could be the day you need them. Who is monitoring and testing your backups?
I tell this story in my cloud security class sometimes as it applies to contracts, but it also applies to backups. I hired a company to do my server backups for one of my prior companies, Radical Software, Inc., a long time ago. I was new to running a business and new to managing vendors or systems. At some point, one of the Linux systems failed. I asked the company to restore it. They said, “Oh, we were only backing up the Windows systems.” What? My contract specified backups. Nowhere in the contract did it state only the Windows systems should be backed up. I was so fortunate that my customer had a copy of their software in another location and did not sue me. Then I, in turn, would have to try to figure out what to do about the contract with my vendor.
This experience teaches some excellent lessons. Contracts are essential when it comes to vendor management. Don’t make assumptions. Make sure it is in the contract. These assumptions also apply to the soundness of your backups. Even when you are paying someone else to perform system backups, you want to take steps to ensure the backups exist and are restorable when you need them. Your organization should verify the implementation of backup systems like anything else.
Some ransomware attacks also teach lessons about backing up data. Ransomware is malware that attackers install on a system. Once installed, it starts encrypting all the files. I explained encryption in an earlier post; without going into the detailed definition, it basically makes your files unreadable. The ransomware leaves a note telling the owner of the files how to pay a ransom, usually in bitcoin, to get the data back to a readable state. Paying the attackers gives them more incentive to keep spreading ransomware. Also, the attackers do not always restore the files once payment is received. Insurance companies do not always cover the costs, and even if they do, your rates go up afterward (and likely everyone else's).
If you thought ransomware was going away, it’s not. The Dutch government just sent out a warning about three types of ransomware affecting over 1800 businesses. Brian Krebs just wrote about nursing homes being cut off from medical records in a ransomware attack. Ransomware recently infected a company hosting online applications for over 440,000 customers. These are just a few examples.
Having backups is part of an overall strategy for protecting your systems against ransomware. If someone clicks a malicious link, opens a malicious attachment, or a system exposed to the Internet is compromised, you can say no thanks to the ransom if you have a way to restore your systems in the required time. Instead of paying the attackers, eradicate the malware, ensure that it cannot return by fixing the vulnerability that let it in, and restore the system and data. This process helps ensure that you can avoid paying the ransom. Once again, this sounds simple, but I address some complexities below.
Sometimes ransomware spreads from one machine to another, using a tactic called pivoting. The ransomware might even do this automatically and spread around the world by infecting one device from another infected device, as it did in the case of WannaCry. Other times it spreads internally within a company, as the NotPetya ransomware did. When the malware spreads, it could affect your backup systems if the network and credentials do not adequately segregate them from other systems. A sophisticated attacker might even target backup systems to take them out.
Additionally, in the Brian Krebs article I referenced above, he writes in the comments:
In nearly every story I’ve written about ransomware, the victim had a backup system of some kind. And nearly every story, some readers comment that if they only had backups….
While there are ways of backing up key data that make it far more difficult for ransomware to fiddle with, the mindset that enables that kind of preparation assumes the target is also doing things like actively and continuously monitoring for intrusions. And those organizations are few and far between. Also, keep in mind that these ransomware purveyors usually don’t pull the trigger until they’ve done what they need to do to escalate their privileges within the target to the point where they can do what the target’s administrators can do, and that includes managing the backup procedures.
Note that it is better to prevent the ransomware in the first place, using all of the rest of the strategies I write about throughout this blog series and book. Also, note that just having backups may not be enough. In some cyber attacks involving ransomware, the backups got encrypted along with everything else, because the attackers had access to the backups on the network from systems they had compromised. If attackers have administrative credentials, they can do anything the administrators can do, including encrypting backup files.
Disaster recovery strategies
What do you need to consider when it comes to backups? Disaster recovery and business continuity planning are broad topics with many considerations, some of which are specific to your particular organization. You need to invest the appropriate amount of time and money into ensuring that your business maintains operations to the level required so you can minimize losses and stay in business.
Businesses come up with objectives they want to meet in the event of a disaster to ensure they can maintain operations to the desired level. The organization then architects and deploys mechanisms to make sure they can meet those target objectives. Industry-standard metrics related to disaster recovery and business continuity include:
Recovery Time Objective (RTO): The maximum time systems can be in the recovery phase. For example, from the point the disaster hits, the systems can be down five hours at most.
Recovery Point Objective (RPO): The point in time to which recovery processes must restore data after an outage, defining how much data loss is acceptable. For example, recovery processes must restore systems to a point where no more than five minutes of transactions were lost.
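As a hedged illustration of the two metrics above, checking whether a recovery actually met its objectives is simple arithmetic over three timestamps. The dates and the five-hour/five-minute targets here are hypothetical, taken only from the examples just given:

```python
from datetime import datetime, timedelta

# Hypothetical objectives from the examples above
RTO = timedelta(hours=5)    # maximum acceptable downtime
RPO = timedelta(minutes=5)  # maximum acceptable data loss

outage_start = datetime(2020, 1, 6, 9, 0)        # disaster hits
service_restored = datetime(2020, 1, 6, 13, 30)  # systems back online
last_good_backup = datetime(2020, 1, 6, 8, 57)   # newest restorable data

downtime = service_restored - outage_start   # measured against RTO
data_loss = outage_start - last_good_backup  # measured against RPO

print(f"Downtime {downtime}, within RTO: {downtime <= RTO}")
print(f"Data loss {data_loss}, within RPO: {data_loss <= RPO}")
```

In this hypothetical run, four and a half hours of downtime meets the five-hour RTO, and three minutes of lost transactions meets the five-minute RPO.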
Businesses undertake Business Continuity Planning (BCP) to meet these objectives. BCP defines a set of plans for how an organization maintains operations in the event of a disruption.
Using these principles, architect backup systems and outline business processes to ensure your business remains operational in the event of a disaster. I explain how to implement disaster recovery in more detail in a cloud environment in my cloud security class, but from an executive viewpoint, here are some of the most critical questions related to backups and disaster recovery you should ask:
Do we have disaster recovery and business continuity plans?
Find out if your organization has plans if you are not sure. If you are a top executive in the company, you should be involved in defining and carrying out these plans in the event of a disaster. Sometimes contracts require vendors to have DR and BCP plans.
When is the last time we tested our backups and failover?
Just having a backup process doesn’t mean it works. You don’t want to find this out the day you need the backups, as I did. Lesson learned! Test your backups the same way you are (hopefully) testing your website functionality and web application security.
Were they tested by someone very detail-oriented and separate from the people who implemented the backups?
Just like editing your own written material, the person who created the backups might not notice their own mistakes. Sometimes, no matter how many times I read over a blog post, I don’t see my typos. Then I read it later or have someone else read it, and the mistake becomes obvious.
The same could be the case for the person who implemented your backups. They think they did everything correctly but didn’t notice their own mistakes. Alternatively, the people who implemented the backups may not want to expose or admit mistakes. Worse, you could have an insider threat in your organization. Who tested the backups? Does this process have appropriate segregation of duties so different teams implement and validate different aspects of the system? Is the person responsible for testing detail-oriented and well-versed in testing and quality assurance?
How were the backups and failover tested?
Did someone only check that files exist in a backup system, or did they fully restore the system? How critical the data and uptime are to your business dictates the level of testing required. At one large financial company where I worked, an exercise occurred every few months to completely fail over all operations from one physical data center to another. This process took many hours but not more than a day. Invariably, systems broke and processes failed, which led to improvements and verification of the backups, failover, and recovery process.
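One low-level piece of that verification can be automated: after a full restore, compare checksums of the restored files against the originals rather than merely checking that files exist. This is only a sketch under assumptions (the directory layout is hypothetical, and a real exercise should restore to separate infrastructure and also test application functionality, as discussed below):

```python
import hashlib
from pathlib import Path

def sha256(path: Path) -> str:
    """Hash a file in chunks so large backups don't exhaust memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_restore(source_dir: Path, restored_dir: Path) -> list:
    """Return a list of files that are missing or differ after a restore."""
    problems = []
    for src in source_dir.rglob("*"):
        if not src.is_file():
            continue
        restored = restored_dir / src.relative_to(source_dir)
        if not restored.is_file():
            problems.append(f"missing: {restored}")
        elif sha256(src) != sha256(restored):
            problems.append(f"corrupt: {restored}")
    return problems
```

A check like this catches silently missing or corrupted files, but it says nothing about whether the restored system actually runs, which is why full failover exercises still matter.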
How long did the recovery take?
Some companies like Netflix test failover by terminating systems in production and validating the system recovers automatically. Automation is much easier on a cloud platform like AWS, which is designed for automation from the ground up. Netflix can failover from one AWS region (a geographical location where AWS has data centers) to another in under 10 minutes. Proper architecture and planning help ensure your backups are available when required, and recovery processes can meet the required objectives.
Were the systems tested, and was the data integrity validated?
Make sure that failover testing did not involve only executing the recovery process without errors. Test system functionality to confirm everything works correctly. Additionally, validate that backup processes restored all data to the objective point with appropriate data integrity. Include the time to fix any corrupted data in the failover time. Resolve the underlying issue that caused data corruption to prevent it in the future and improve recovery times.
Who has access to the backups?
As I wrote in my post on the aftermath of stolen credentials, know what actions may be taken with credentials obtained by attackers. Consider write-once, read-only backups such as those offered by AWS. Ensure that a single set of administrative credentials cannot change permissions on the backups and delete the data. If they can, store those credentials away and require two parties to access them.
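On AWS, for example, S3 Object Lock in compliance mode prevents any identity, including administrators, from deleting or overwriting backup object versions until the retention period expires. The sketch below only builds the configuration payload; the bucket name and 30-day retention are hypothetical assumptions, and applying it for real requires a bucket created with Object Lock enabled, via the boto3 call shown in the comment:

```python
# Hypothetical bucket and retention period; set these to match
# your own recovery objectives.
BACKUP_BUCKET = "example-backup-bucket"

object_lock_config = {
    "ObjectLockEnabled": "Enabled",
    "Rule": {
        "DefaultRetention": {
            # COMPLIANCE mode: no identity, including the root user,
            # can shorten the retention or delete the version early.
            "Mode": "COMPLIANCE",
            "Days": 30,
        }
    },
}

# Applying it would look like this (requires AWS credentials and a
# bucket created with Object Lock enabled):
# import boto3
# boto3.client("s3").put_object_lock_configuration(
#     Bucket=BACKUP_BUCKET,
#     ObjectLockConfiguration=object_lock_config,
# )
```

Compliance mode is the aggressive choice; governance mode allows specially privileged users to override the lock, which may be more practical but weakens the guarantee against stolen administrative credentials.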
Consider where you store those credentials so they are available in the event of an emergency. Always use MFA, correctly. Ensure that you, rather than an attacker, encrypt the backups. In a cloud environment like AWS, you can also apply policies to that encryption key. These and other strategies outlined in prior and upcoming posts help ensure your backups are not destroyed or accessed by attackers or malicious insiders.
Also, leverage network defenses to protect backups. Ensure proper controls are in place to prevent backup exposure to the Internet. Use network segregation to ensure that malware on systems being backed up cannot infect the backups. Perhaps you have separate people on different networks, maintaining backups and production systems if you have a large organization and highly sensitive data.
Try to catch the problem sooner!
Although this post is all about backups, make sure you have the logging and monitoring in place discussed in other posts to avoid having to resort to a massive disaster recovery process. Try to catch the problem early, before it becomes a disaster. Create well-architected systems that self-heal when possible and are resilient to failure. If you want to dive deep into that topic and see some metrics from AWS, check out the blog post by @ACockroft on Failure Modes and Continuous Resilience. I'll talk more about monitoring and incident handling in an upcoming blog post.
Unfortunately, even when we try to prevent problems, things happen. Should disaster strike, ensure you are ready and can recover as quickly as possible. Make sure you have backups and systems that can recover in time, validated by ongoing testing processes.
If you liked this story please clap and follow:
Medium: Teri Radichel or Email List: Teri Radichel
Twitter: @teriradichel or @2ndSightLab
Request services via LinkedIn: Teri Radichel or IANS Research
© 2nd Sight Lab 2020
Want to learn more about Cloud Security?
Check out: Cybersecurity for Executives in the Age of Cloud.
Cloud Penetration Testing and Security Assessments
Cloud Security Training
Virtual training available for a minimum of 10 students at a single organization. Curriculum: 2nd Sight Lab Cloud Security Training
Have a Cybersecurity or Cloud Security Question?
2020 Cybersecurity and Cloud Security Podcasts
2020 Cybersecurity and Cloud Security Conference Presentations
Prior Podcasts and Presentations
Azure for Auditors ~ Presented to Seattle ISACA and IIA
OWASP AppSec Day 2019 — Melbourne, Australia
Bienvenue au congrès ISACA Québec 2019 — Keynote — Quebec, Canada (October 7–9)
White Papers and Research Reports