136 Days Later: Lessons from CapOne

Jerry Denman
AI+ Enterprise Engineering
7 min read · Aug 8, 2019

The following post was authored by Jerry M Denman (jdenman@us.ibm.com) and Charlie Brown (cpbrown@us.ibm.com) who are both Distinguished Engineers with the Cloud Engagement Hub and who work with Financial Services worldwide on cloud computing.

Our team works with banks on the risk controls necessary for using public cloud. This recent Capital One breach gives us a chance to look at the exploit and learn some valuable lessons from it.

March 12, 2019: A threat actor illegally accessed a cloud environment hosting Capital One applications and data.

July 26, 2019: The FBI arrested and charged the threat actor with committing a violation of Title 18, United States Code, Section 1030(a)(2) — “intentionally accesses a computer without authorization or exceeds authorized access, and thereby obtains — (A) information contained in a financial record of a financial institution, or of a card issuer”.

For 136 days, the sensitive data of the bank and its customers was at risk in a cyber attack, and it could have been much longer. It was only on Day 127 after the attack, following an anonymous tip, that identification of the threat actor proceeded rapidly. Helping to speed the identification was the threat actor’s use of the same ‘handle’ on both the dark web and social media. Without these two breaks, the time between the first recognition of the attack and the arrest of the threat actor could have been much longer, with the possibility of more financial accounts or assets being compromised.

Cyber Security professionals call Day 127 the “boom” — when the event was first recognized.

Right of the boom is Day 127 and after: all actions and activities taken to identify the threat actor, remediate the damage, and update processes and procedures to prevent similar attacks.

Left of the boom is the time before Day 127, and represents 127 days of lost opportunity to prevent or shut down the attack earlier, before some or all of the data was exfiltrated.

What lessons can we apply from this breach to prevent similar breaches from occurring, or to detect and stop a breach much earlier, whether at the initial access on March 12 or at the subsequent accesses on March 22 and April 21?

Per the U.S. Department of Justice press release and court filings (https://www.justice.gov/usao-wdwa/pr/seattle-tech-worker-arrested-data-theft-involving-large-financial-services-company), the entry point for the attacks was a misconfigured firewall that enabled the threat actor to connect to the entry server. It is not clear from the court filings whether the threat actor used credentials from her earlier employment with the cloud service provider, used other credentials, or connected anonymously.

Subsequent commands enabled the threat actor to obtain security credentials for an account with elevated privileges, identify files containing bank data, and then exfiltrate those files to an outside location.

An important first step is recognizing that security controls are a shared responsibility between the bank and the cloud provider, and testing those controls at the overall system level. Both the customer (application) and the cloud provider (infrastructure) should work together to test the overall effectiveness of security controls and deploy new and compensating controls to address deficiencies and vulnerabilities. If you approach security with an “us and them” mindset, the threat actors win, because they will exploit the gaps that result from siloed controls.

We do not have access to the details of the cloud configuration or of the investigation, but the public details suggest that adopting several standard controls could stop attacks, or recognize and shut them down sooner, to prevent or minimize damage.

1) Automation of service configuration equals consistency — automate as much configuration of public cloud services as possible, and put all of that automation under as much security review as you would your crown-jewel source code.

We cite the firewall error as the enabling factor in the attack. Prevent the firewall error and you prevent this attack vector.

One common source of firewall configuration errors is a manual configuration process. We recommend configuring public cloud resources only through automation; this eliminates much of the risk that manual configuration mistakes introduce. Cloud security architects should review the automation scripts when they are created and again on every change, so that mistakes in the automation do not create vulnerabilities.
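As an illustration, here is a minimal sketch of what such automation might look like, assuming an AWS-style environment and the boto3 SDK; the security group ID, port, and address range are hypothetical placeholders, not details from the breach.

```python
import boto3

# The desired ingress rules live in source control, where security architects
# can review them when created and again on every change.
DESIRED_INGRESS = [
    {
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{"CidrIp": "10.0.0.0/16", "Description": "Internal HTTPS only"}],
    },
]

def apply_ingress(group_id: str) -> None:
    """Apply the reviewed ingress rules to a security group."""
    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(GroupId=group_id, IpPermissions=DESIRED_INGRESS)

if __name__ == "__main__":
    apply_ingress("sg-0123456789abcdef0")  # hypothetical security group ID
```

Because the rules are expressed in code, a mistaken or overly permissive change has to pass review before it ever reaches production.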

2) Ongoing configuration verification to prevent unintended or malicious configuration change and drift.

Even with an automated configuration process in place, there is always the possibility that an operator could manually change the configuration after the automation runs, whether intentionally, inadvertently, or maliciously. We recommend deploying ongoing configuration verification tools that monitor critical settings and either warn when they detect a mismatch or automatically reapply the correct configuration. Several tools are available for purchase or as cloud services, often with pre-built profiles aligned to common industry and security standards.
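A minimal drift-check sketch follows, again assuming an AWS-style environment and boto3; the baseline and identifiers are illustrative, and a production tool would cover far more settings than a single security group.

```python
import boto3

# The reviewed baseline (same shape as the automation's desired rules above).
DESIRED_INGRESS = [
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
     "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
]

def _rule_key(rule: dict):
    """Normalize a rule to (protocol, ports, CIDR set) so live and baseline rules compare cleanly."""
    return (rule.get("IpProtocol"), rule.get("FromPort"), rule.get("ToPort"),
            frozenset(r["CidrIp"] for r in rule.get("IpRanges", [])))

def find_drift(group_id: str, desired: list) -> list:
    """Return live ingress rules that are not in the reviewed baseline."""
    ec2 = boto3.client("ec2")
    group = ec2.describe_security_groups(GroupIds=[group_id])["SecurityGroups"][0]
    baseline = {_rule_key(rule) for rule in desired}
    return [rule for rule in group["IpPermissions"] if _rule_key(rule) not in baseline]

drift = find_drift("sg-0123456789abcdef0", DESIRED_INGRESS)  # run on a schedule
if drift:
    print(f"ALERT: configuration drift detected: {drift}")
```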

3) Identity governance, least required privilege, and privileged administration management to ensure the timely addition and deletion of users, and that users have only the access required for their current role.

It is not clear whether the threat actor accessed the first system using credentials from her prior employment with the cloud provider or used anonymous access, but either way she was able to act as an administrator to find and exfiltrate the data.

Identity governance systems can rapidly provision and de-provision user credentials across systems and prevent access by a former employee.

Least required privilege means that both users and administrators have only the minimum access rights needed to perform their current role, and that anonymous access is turned off across the system. Think of this like a car:

  • You always lock your car, preventing anonymous access.
  • When you go to valet parking, you give the attendant the key that opens the door/starts the car, but does not provide access to the locked glove box.

Privileged administration access is similar to least required privilege: you do not give administrators standing root or elevated access. If they need to perform an administrative task, an identity management system can give them one-time-use credentials for that task. Also ensure that the people who control the identity management system are separate from the people who perform administrative tasks.
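As a sketch of how short-lived, task-scoped administrative credentials can work, assuming an AWS-style environment with boto3 and STS; the role ARN and session name are hypothetical.

```python
import boto3

def issue_task_credentials(role_arn: str, task_name: str) -> dict:
    """Issue credentials for one administrative task that expire after 15 minutes."""
    sts = boto3.client("sts")
    response = sts.assume_role(
        RoleArn=role_arn,            # a narrowly scoped role for this task only
        RoleSessionName=task_name,   # appears in audit logs, attributing the task
        DurationSeconds=900,         # credentials expire after 15 minutes
    )
    return response["Credentials"]   # AccessKeyId, SecretAccessKey, SessionToken

# Hypothetical usage: a 15-minute credential scoped to a database-maintenance role.
creds = issue_task_credentials("arn:aws:iam::111122223333:role/db-maintenance", "patch-run")
```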

4) Data monitoring and loss/leakage prevention to prevent unauthorized data access, deletion, or tampering

Without data loss protection, the threat actor could exfiltrate gigabytes of data from the cloud environment without the bank noticing. It was the reporting of the breach that triggered the log analysis, not any active monitoring. Cloud and software providers offer many solutions for passively or actively monitoring access to critical files and databases.

In this situation, even if traditional access controls did not prevent the exfiltration, data monitoring could have alerted the cloud or application administrators to the anomalous behavior: a credential accessing this data for the first time this year, and an abnormally high volume of data being accessed.
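A minimal sketch of this kind of data-access monitoring over parsed access-log records follows; the record fields, principal names, and baseline figure are illustrative assumptions, not details from the breach.

```python
from collections import defaultdict

def flag_data_access(records, known_principals, baseline_bytes=50_000_000):
    """Flag first-time readers and abnormally large reads of a sensitive data store."""
    bytes_read = defaultdict(int)
    alerts = []
    for rec in records:                      # each rec: {"principal": ..., "bytes_read": ...}
        who = rec["principal"]
        if who not in known_principals:      # never seen reading this data before
            alerts.append(f"ALERT: first-time access to sensitive data by {who}")
            known_principals.add(who)
        bytes_read[who] += rec["bytes_read"]
    for who, total in bytes_read.items():    # unusually large total volume
        if total > baseline_bytes:
            alerts.append(f"ALERT: {who} read {total} bytes, well above the normal baseline")
    return alerts
```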

5) Encrypt everything that is sensitive — always, whether in transit or at rest

The court document states that some fields in the exfiltrated data were either encrypted or anonymized/tokenized, but the bank did not use similar protection on other sensitive PII (personally identifiable information) such as names, birthdates, and addresses — all information that hackers can exploit as part of identity theft.

Encryption is an easy way to counter a successful intrusion, especially for data in the public cloud. The risk reduction is worth the infrastructure and performance cost, and encryption can be the last line of defense if other controls fail. Encryption is only effective if you have strong management, governance, and protection of your keys, so implement secure key management using an integrated key management system and hardware security module.

Policy should ensure that keys are available only to the people, programs, and services that clearly need to handle them, and that those people, programs, and services follow best practices to protect the keys (e.g., do not store them in memory or on disk in areas where hackers could access and exfiltrate them).
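As an illustration of encryption with centrally managed keys, here is a minimal envelope-encryption sketch assuming an AWS-style KMS (backed by an HSM) and the Python "cryptography" package; the key alias and record contents are hypothetical.

```python
import base64
import boto3
from cryptography.fernet import Fernet

def encrypt_record(plaintext: bytes, kms_key_id: str):
    """Encrypt sensitive data locally with a data key generated by KMS."""
    kms = boto3.client("kms")
    data_key = kms.generate_data_key(KeyId=kms_key_id, KeySpec="AES_256")
    fernet = Fernet(base64.urlsafe_b64encode(data_key["Plaintext"]))
    ciphertext = fernet.encrypt(plaintext)
    # Persist only the ciphertext and the *encrypted* data key; the plaintext key
    # is never written out, and only KMS can unwrap the encrypted key again.
    return ciphertext, data_key["CiphertextBlob"]

ciphertext, wrapped_key = encrypt_record(b"Jane Doe,1978-03-02,123 Main St", "alias/pii-data-key")
```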

6) Automated log monitoring equals early warning — automated monitoring of logs for unusual activity can be a ‘trip wire’ that alerts your team to an intrusion.

Per the court documents, the bank’s log files contained the entries corresponding to the attacks, enabling the investigator to confirm the attack dates and methods. However, the significance of these entries wasn’t known until after the anonymous tip.

One log monitoring trigger would have been the *****-WAF-Role performing a ‘List Buckets’ command; a log scanning tool would flag this activity as unusual. The use of the Sync command by the *****-WAF-Role would have been a similar trigger.

Another log monitoring trigger would have been the amount of data egress through the breached firewall. The amount of data passing through this firewall to an outside IP address was likely much larger than normal and could have warranted investigation.
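A minimal ‘trip wire’ sketch over parsed log events illustrates both triggers; the field names, actions, and egress threshold are illustrative assumptions rather than details from the breach.

```python
# Role/action pairs that should never appear together in normal operation.
UNEXPECTED_ROLE_ACTIONS = {
    ("*****-WAF-Role", "ListBuckets"),
    ("*****-WAF-Role", "Sync"),
}
EGRESS_BYTES_THRESHOLD = 1_000_000_000  # hypothetical normal daily egress for this firewall

def scan_log_events(events):
    """Alert on roles performing unexpected actions and on unusually large egress."""
    alerts = []
    egress_total = 0
    for event in events:  # each event: {"role": ..., "action": ..., "bytes_out": ...}
        if (event.get("role"), event.get("action")) in UNEXPECTED_ROLE_ACTIONS:
            alerts.append(f"ALERT: {event['role']} performed {event['action']}")
        egress_total += event.get("bytes_out", 0)
    if egress_total > EGRESS_BYTES_THRESHOLD:
        alerts.append(f"ALERT: total egress of {egress_total} bytes exceeds the normal baseline")
    return alerts
```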

Two challenges with automated monitoring are that you typically need to know what you are looking for (pre-defined rules), and that with more rules you often get increasing volumes of alerts. How does the security operations center (SOC) process and prioritize these known alerts, and how does the SOC identify new indicators of attack, both to stop the new attack and to create new rules?

Many organizations and tool providers are combining traditional SIEM (security information and event management) systems with artificial intelligence and machine learning, both to identify the high-priority events and to surface anomalies (new patterns).

Finally, a key principle for securing any IT environment, whether on cloud or on-premises, is to implement layered controls, so that a single control failure cannot result in a successful breach. There did not appear to be any other controls to counter the failure of the firewall. Adopting one or more of the controls described above could overlap the firewall control and prevent or detect the breach:

  • Access control would have prevented the exfiltration of the data by the threat actor.
  • Full encryption by itself would not have prevented exfiltration of the data, but it would have prevented the threat actor’s ability to use and monetize the information.
  • Additional firewall rules, a whitelist (allowing only approved addresses), or a blacklist (denying specific known addresses that were the source of other attacks) could have prevented access to the service by the unauthorized address; a sketch of such a check follows this list. We could implement these controls in the initial firewall, in a subsequent firewall, in the server network configuration, or in all three.
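As a sketch of the allow/deny list idea in the last bullet, here is an illustrative check using Python’s standard ipaddress module; the address ranges are placeholders, not real values from the incident.

```python
import ipaddress

ALLOWED_NETWORKS = [ipaddress.ip_network("10.0.0.0/16")]    # only these may connect
DENIED_ADDRESSES = {ipaddress.ip_address("203.0.113.50")}   # known bad source address

def is_permitted(source_ip: str) -> bool:
    """Deny known bad addresses, then allow only addresses inside approved networks."""
    addr = ipaddress.ip_address(source_ip)
    if addr in DENIED_ADDRESSES:
        return False
    return any(addr in network for network in ALLOWED_NETWORKS)
```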

Our analysis was based on currently available public information, compared against typical leading cloud security practices. Further security and compliance information from IBM is available at IBM Security Intelligence: https://securityintelligence.com/
