What’s in your cloud?
Thoughts on the Capital One breach by a former Capital One software engineer, team lead, cloud engineer, and security employee
I’m going to start this blog post with:
Security is hard.
Anyone who scoffs at Capital One or other large companies trying to secure organizations with multiple lines of business and 10,000+ developers had better be careful, because you might be next. Still.
We can do better.
I liked working at Capital One. It definitely had some ups and downs, but overall, it was a positive experience. I worked there longer than anywhere else besides my own former company, and I hope to finish out my career at the new company I started offering cloud security services. A lot of very smart people at Capital One likely knew about this problem and told someone to fix it. Some very skilled security people who worked at Capital One when I did have already left the company. I did receive a tip from someone who I believe no longer works at the company about what happened, but I don’t really know that person and cannot verify what they told me via the insiders I know, for obvious reasons: they would like to keep receiving paychecks. Completely understandable.
Before I dive into technical details as to how this might have happened, let’s start with what is fundamentally much more important in terms of stopping breaches like the one that happened at Capital One. I am writing a book called Cybersecurity for Executives, one blog post at a time. This book is for executives because that’s where the responsibility for these breaches actually lies. Executives can’t just say it is their technical teams’ fault anymore, because it’s clear that technical teams aren’t always in charge and are often unable to do the right thing to prevent these breaches due to executive-level decisions. Technical and security teams are limited by budget, support, training, staff, and authority to solve these types of problems. They are hampered by executives who don’t understand the risks at even the most basic level.
At the same time, developers aren’t rewarded for secure solutions. They aren’t sent to cloud security training before they are released to the cloud with excessive permissions to make risky changes. They are rewarded for getting things done. They want to break all the rules and rush to get solutions into production because that will get them the big bonus. That might get them a promotion and a trip to headquarters and a fancy award. Their names might appear on some company newsletter telling the rest of the company about their monumental achievements. They might have made a lot of people happy in the process, and the money might be rolling in as a result of those efforts.
No one notices that the architecture is completely flawed in terms of security and poses a great risk due to a myriad of exposures and CVEs (explained previously). The attack vectors are not evaluated. No threat modeling is performed on the system. No one really knows if it is secure or not in many cases. Can you blame them? Not really. They are doing what they are told.
I believe that most executives understand the concept of risk. Cybersecurity boils down to risk decisions. What are the chances that something could go wrong? What is the cost if it does? What will it cost to prevent that from happening? Decide appropriately. Here’s a hint — a shiny new box with whiz-bang AI and machine learning blockchain-based supercalifragilistic nonsense baked inside isn’t going to save you. There are fundamental concepts that apply, and they largely have to do with math and statistics, not programming, machine learning, or deciphering packet headers. You too can understand these basic, fundamental concepts in cybersecurity, if you can understand a financial statement or something like cost-basis tax laws which are insanely complicated. (I built systems to support that for Capital One before I moved to the cloud team.)
The cost of a data breach is a significant expense and risk for any business. The average cost I reported for a breach likely just went up after the $700 million Equifax settlement. Executives need to take the time to understand cybersecurity fundamentals, just as they take the time to understand financial reports. This understanding does not need to be deeply technical. It is more about metrics and statistics at the highest level. It’s about understanding risks associated with poor architectural decisions — which may have contributed to the Capital One breach. It’s about measuring holes in your infrastructure. It’s about training and putting the right people in charge who have deep cybersecurity, business, and software knowledge that can help solve the problem. Then let them solve it. In my experience, CISOs have to be more politicians than security professionals. This isn’t helping. And at the end of the day, the CEO needs to sign off on the plans. The CEO can only put the blame on someone else if the CEO provided directions, and someone didn’t follow those directions. How can the CEO give direction about something which they don’t understand? It’s time for CEOs — and politicians who make laws about cybersecurity — to learn the basics.
Security is not free. It requires an investment. I just heard a statistic — companies who have not yet been breached spend 8–10% of their IT budget on security. Security is also a mindset. Technologies change. Regardless of what technology you are using — the underlying fundamentals are the same. It has to do with how many opportunities you give an attacker. It also has to do with the concept of the Black Swan — the unexpected mega-breach. It has to do with the concept of keeping your friends close, and your enemies closer. It’s about culture. Ensure everyone in your company is trained and supporting your cybersecurity initiatives because they understand that they are important and why. It’s about metrics. How can you measure your security and report on it the same way you report on your finances? I’m going over all these things in my book. But for now, let’s return to the technical minutia.
We’ve seen a number of hints about how the breach may have happened. I may be missing a few details in my explanation below, but I believe the gist of what I’m explaining is accurate. The Capital One web site is vague on the technical details, but the breach may have looked something like this:
AWS uses a service called IAM (Identity and Access Management) to grant permissions to people and resources in an AWS account. Virtual machines (computers that run in the cloud) get permission to take certain actions in the account. One of these actions is the ability to read files stored in the cloud. In this case, it appears the stolen files existed in something called an S3 bucket. The S3 bucket was not exposed to the Internet like so many of the other breaches. The problem appears to be that a single virtual computer had permissions to access way too many files. I don’t understand how or why this configuration would ever be allowed as it clearly poses an excessive risk. Someone told me this computer that was breached had access to all the S3 buckets, which would be even worse.
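To make the difference concrete, here is a minimal sketch of two IAM policy documents, using hypothetical bucket names I made up for illustration. The first is the dangerous pattern described above: an instance role allowed to read from every S3 bucket in the account. The second grants read access to a single bucket only.

```python
# Hypothetical IAM policies illustrating over-broad vs. least-privilege S3
# access. Bucket names and structure are illustrative, not from the breach.
overly_broad_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": "*",  # any bucket, any object -- the dangerous pattern
    }],
}

least_privilege_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::example-app-data/*",  # one bucket only
    }],
}

def grants_all_buckets(policy):
    """Return True if any Allow statement uses a wildcard S3 resource."""
    return any(
        stmt["Effect"] == "Allow" and stmt["Resource"] == "*"
        for stmt in policy["Statement"]
    )

print(grants_all_buckets(overly_broad_policy))    # True
print(grants_all_buckets(least_privilege_policy)) # False
```

A simple automated check like `grants_all_buckets` is the kind of thing a security team can run against every role in an account to flag this configuration before an attacker finds it.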
Why was this configuration undesirable? As fate would have it, I wrote a related blog post about why you should not do this about a week before the breach. It’s about limiting the damage an attacker can do with any one set of credentials. If only someone with the authority to fix this problem had read that article a week earlier and changed the configuration, maybe I wouldn’t be writing this blog post. This is not a technical concept. It’s like giving someone the master key at a hotel. They can access every room. Who are you giving those keys to, and how do you monitor them? What if those keys get stolen?
Before that, I wrote about zero-trust, which also sounds like it may have been applicable. Capital One was running something called a WAF (web application firewall) in front of their web servers. A WAF is supposed to stop certain types of attacks on websites. The problem is that attackers are always finding ways around these security systems. It’s a game of back and forth. The attacker finds a new hole. The security team and security vendors plug it. This will never end. That’s why it’s no longer acceptable to trust anything even in your internal environment — and monitoring is as important as preventing the attack in the first place as far as defensive measures are concerned.
Another chapter in my blog book is all about understanding where your data lives, and what can access it. This post about data exposure explains the need for maintaining a good inventory of your data and understanding possible paths to access it. In this case, the data was not exposed directly to the Internet, but it was accessible to systems on the internal network. I also cover internal network attack paths that can be used to attack systems or exfiltrate data, as was in the case of Capital One. Understanding all these vectors and doing appropriate threat modeling is vital when coming up with a good cybersecurity strategy. At the macro-level CEOs and other executives can ask for reports that provide related statistics to understand the risk.
Before I dive into what might have worked better for Capital One, I want to address one other topic. Was this breach Amazon’s fault? Can we somehow blame the fact that companies are moving to the cloud for this outcome? Anyone who makes that claim is severely lacking in understanding of this breach in particular, proper security architecture, and the security controls in AWS.
No AWS security controls in the cloud failed in this breach. One might stretch to say the person doing this formerly worked at AWS and had too much knowledge of the Capital One environment, but that too would be a naive assessment. Here’s why. Capital One had about 11,000 developers when I worked there. Many of those developers had access to the same configuration information as that AWS employee, if that employee actually had access. Any of the Capital One employees could have done the same if they left and were disgruntled (or being paid by a foreign government).
The attacker seems to have had a better understanding of the possible outcome and the likelihood of a breach as a result of the poor architectural decision than the people who made it. This person obviously had emotional challenges of some kind and was reportedly drinking at work, which caused AWS to let the person go. She posted comments publicly on Twitter about what she was doing, which may have led to another person reporting the breach to Capital One. Any disgruntled or otherwise negatively motivated employee at Capital One could have done the same, so I don’t think it’s fair to blame Amazon for this, unless they are not adequately evaluating and monitoring people they hire. Since they let the person go, it seems like they are monitoring for unwanted behavior. I wrote about similar cases with non-cloud systems at other organizations in prior blog posts, so this is not specific to cloud.
The AWS security controls available are easier to implement than traditional on-premises controls in my opinion. People just need to learn how to do it correctly. Much more granular controls with zero trust and micro-segmentation to separate different credentials and applications exist that are not even feasible in a non-AWS environment. These controls and the ability to segregate logs so they cannot be deleted or tampered with make AWS a solid choice for a security-minded organization — if they know how to use it correctly and implement secure architectures on top of it.
To summarize: this breach was not Amazon’s fault or due to the fact Capital One moved to the cloud. It’s how they moved to the cloud that caused some issues. Almost every security person I talk to is facing the same struggles I mentioned above in their own organization related to cloud. Organizations are moving to the cloud before understanding the security controls and giving too much access to people to whom they have not provided adequate security training. People trying to please executives are making compromises on security choices, or executives are not listening to people who are warning them about security problems. None of this is unique to cloud. It seems to me that as many or more breaches occur in non-cloud environments.
What would have been a better solution for Capital One? I don’t know all the details, so this may or may not have solved the problem, but in a good architectural design the WAF should never have access to the S3 bucket, except perhaps write-only access to a single bucket for log files. The WAF is an Internet-facing system and should never have access to any data. In a three-tier architecture, the WAF would interact with a web tier. The web tier interacts with an application tier. The application tier interacts with the data.
Each application should have access only to the buckets that contain the data that pertains to that application, no more and no less. Often that can be broken down even further to the different micro-services that make up an application. In one system I architected one team wrote a configuration service and had access to those related S3 buckets, but they did not have access to the buckets related to IOT device authentication, or the files related to the reporting service.
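A sketch of that per-service separation, with made-up service and bucket names: each micro-service gets its own narrowly scoped policy, so compromising one service's credentials cannot expose another service's data.

```python
import json

# Hypothetical per-service policy generator. Service and bucket names are
# illustrative; the point is that each role only ever sees its own bucket.
def policy_for_service(bucket: str) -> dict:
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }],
    }

config_policy = policy_for_service("example-config-data")
iot_policy = policy_for_service("example-iot-auth-data")

# The configuration service's policy never mentions the IoT buckets,
# and vice versa -- a breach of one role stops at that role's data.
print("iot" in json.dumps(config_policy))  # False
```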
Additionally, file permissions in a sensitive application like this should not be given on a system-wide basis. Some sort of authentication mechanism should ensure each logged-in user can only access his or her own files. Many SaaS applications make this mistake and use one set of credentials for all customers in the system to access system components.
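A minimal application-level sketch of that per-user check, assuming a hypothetical key layout where each user's files live under their own prefix: the service refuses to serve any object key outside the authenticated user's prefix, regardless of what the system-wide credentials could technically read.

```python
# Hypothetical per-user authorization check. The "users/<id>/" key layout
# is an assumption for illustration, not Capital One's actual design.
def authorized_key(authenticated_user_id: str, requested_key: str) -> bool:
    """Allow access only to keys under the authenticated user's own prefix."""
    allowed_prefix = f"users/{authenticated_user_id}/"
    return requested_key.startswith(allowed_prefix)

print(authorized_key("alice", "users/alice/statement.pdf"))  # True
print(authorized_key("alice", "users/bob/statement.pdf"))    # False
```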
As for encryption, it appears that the role the attacker used was able to decrypt all the files. The role on the WAF should definitely not be able to decrypt any sensitive data within the account. The same goes for web servers. On AWS granular roles and policies can be created around encryption and decryption so only the appropriate parts of the system can encrypt or decrypt the data. I can’t really say that Capital One made any mistakes on this point because I don’t know the details of the system implementation or exactly what the attacker did. The details are vague on the Capital One statement about the breach and in the FAQs. I’m guessing that the encryption and decryption permissions could have been tighter, but I can’t say that with authority.
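One way to express that separation, sketched with made-up role names and account numbers: the key policy grants kms:Decrypt only to the application tier, while Internet-facing roles like the web tier get at most kms:Encrypt, so they can write encrypted data but never read it back.

```python
# Hypothetical KMS key policy statements. Role names and the account ID are
# invented for illustration; the pattern is what matters.
key_policy_statements = [
    {"Effect": "Allow",
     "Principal": {"AWS": "arn:aws:iam::123456789012:role/app-tier"},
     "Action": ["kms:Decrypt"],   # only the app tier can read plaintext
     "Resource": "*"},
    {"Effect": "Allow",
     "Principal": {"AWS": "arn:aws:iam::123456789012:role/web-tier"},
     "Action": ["kms:Encrypt"],   # can store encrypted data, cannot decrypt
     "Resource": "*"},
]

def can_decrypt(role_arn: str) -> bool:
    """Check whether a role is granted kms:Decrypt by any Allow statement."""
    return any(
        stmt["Effect"] == "Allow"
        and stmt["Principal"]["AWS"] == role_arn
        and "kms:Decrypt" in stmt["Action"]
        for stmt in key_policy_statements
    )

print(can_decrypt("arn:aws:iam::123456789012:role/app-tier"))  # True
print(can_decrypt("arn:aws:iam::123456789012:role/web-tier"))  # False
```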
At any rate, I hope this post will help others who are trying to figure out if they might be susceptible to a breach like the one that affected Capital One. If you’d like to consider how you can determine if your company has holes in your environment that could lead to a similar breach, I hope you will read my book or take my class. I really want to help organizations avoid breaches like this one — that’s why I write, teach, and give presentations.
Check out the book I’m writing: Cybersecurity for Executives
When you join Medium and clap for the articles, I get paid. Your virtual applause is much appreciated and helps me keep writing.
Upcoming events Teri Radichel will be speaking about or teaching cloud security:
IANS Charlotte Information Security Forum (September 25–26)
IANS Houston Information Security Forum (September 11)
Bienvenue au congrès ISACA Québec 2019 — Quebec, Canada (October 7–9)
OWASP AppSec Day 2019 — Melbourne, Australia (November 1)
…and of course she’s usually at the Seattle AWS Architects and Engineers Meetup sponsored by 2nd Sight Lab!
Past Cloud Security Presentations (Videos and Podcasts)