This year (2016) I accepted as much incident response work as I could. I spent about 300 hours responding to security incidents and data breaches as a consultant or volunteer.
This included hands-on work on in-progress breaches, or coordinating a response with victim engineering teams and incident responders.
These lessons come from my consolidated notes of those incidents. I mostly work with tech companies, though not exclusively, and you’ll see a bias in these lessons as a result.
Centralized logging makes everything better.
A theme in this article will be: “what separates standard incidents from horrifying nightmares?”
A good or bad story around logging will dictate the rest of the incident. I repeatedly find that audit logs are the backbone of all good security policy and effective incident response.
The first thing I do in any incident is understand how much I can depend on a victim’s logging infrastructure. The answer I receive will drastically change my experience and the success of the company. There’s a wonderful trend in CI/CD/DevOps culture where centralized logging and alerting are becoming standard practice. I’ve become almost entitled to the expectation of rich data in any company formed in the last three years or so.
I recommend that any security or infrastructure team putting off a comprehensive approach to logging drop nearly everything to invest in it. This means getting all logs, from all hosts and applications, across all teams, into as few log destinations as possible.
Have a strong story around host, application, authentication, and infrastructure logging that will inform your preventative work for years to come. Additionally, it assists other teams as they meet availability goals.
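As one way to picture “decorated” logs coming out of a single application, the sketch below uses Python’s standard logging module to emit one JSON object per event, tagged with host, application, and owning team so a central store can index and search it. The field names (app, team, fields) and the build_logger helper are illustrative assumptions, not a standard, and shipping the lines to a log destination is left to whatever forwarder you already run.

```python
import json
import logging
import socket
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def __init__(self, app, team):
        super().__init__()
        # Static decoration attached to every event from this process.
        self.static = {"app": app, "team": team, "host": socket.gethostname()}

    def format(self, record):
        event = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        event.update(self.static)
        # Anything passed as extra={"fields": {...}} becomes a searchable key.
        event.update(getattr(record, "fields", {}))
        return json.dumps(event, sort_keys=True)


def build_logger(app, team, handler=None):
    """Return a logger that writes JSON lines to `handler` (stderr by default)."""
    logger = logging.getLogger(app)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if handler is None:
        handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter(app, team))
    logger.handlers = [handler]
    return logger


if __name__ == "__main__":
    # Hypothetical application and team names, for illustration only.
    log = build_logger("billing-api", "payments")
    log.info("login_failed", extra={"fields": {"user": "alice", "src_ip": "203.0.113.7"}})
```

Keeping the event name in message and the details in separate keys is a design choice: it lets the same query work across every team that follows the convention.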
Edit: A gotcha: be aware of user privacy in what you log, and of how relevant long-term storage would be in a breach. Shortened retention periods to protect user privacy are common, and would be in greater demand depending on the product you build.
Conclusion: Prioritize well-decorated, accessible, centralized, and alert-able logs above most other security projects. If done right, an idea for an alert should easily land in production within 10 minutes or so.
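To make “an idea for an alert” concrete, here is a minimal sketch of a rule that could sit on top of centralized logs: it flags a source IP that produces a burst of failed logins inside a sliding window. The event shape (message, src_ip, ts as an ISO 8601 string), the login_failed event name, and the thresholds are all assumptions for illustration, and the rule assumes events arrive in time order.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class FailedLoginAlert:
    """Fire once when a source IP reaches `threshold` failures within `window`."""

    def __init__(self, threshold=5, window=timedelta(minutes=10)):
        self.threshold = threshold
        self.window = window
        self.seen = defaultdict(deque)  # src_ip -> timestamps still inside the window

    def observe(self, event):
        """Return an alert dict if this event crosses the threshold, else None."""
        if event.get("message") != "login_failed":
            return None
        ts = datetime.fromisoformat(event["ts"])
        key = event.get("src_ip", "unknown")
        times = self.seen[key]
        times.append(ts)
        # Drop failures that have aged out of the sliding window.
        while times and ts - times[0] > self.window:
            times.popleft()
        if len(times) == self.threshold:
            return {"alert": "failed_login_burst", "src_ip": key, "count": len(times)}
        return None
```

A rule this small is only quick to ship if the events it reads already arrive in one place with consistent field names, which is the point of the logging investment above.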
You might not find the root cause of a breach.
More than one incident I worked this year went through to completion without ever finding a root cause.
This is a nightmarish experience for victims, who have to meet with their leadership and executives to describe mitigation efforts that are not guided by data. Containment becomes an incomplete, “best effort” problem.
With a root cause, a mitigation plan sounds something like:
“We wipe this laptop, replace that server, roll a single credential.”
Without a root cause, it sounds more like:
“We wipe ALL the laptops, replace ALL the servers, roll ALL the credentials.”
The discovery of a root cause is an important milestone that dictates the emotional environment an incident will take place in, and whether it becomes unhealthy or not.
A grey cloud will hover over a team until a guiding root cause is discovered. This can turn people against one another. I work very hard to avoid this toxicity with teams. I remember close calls when massive blame, panic, and resignations felt like they were just one tough conversation away.
Whether you’re big or small, it’s important to role-play a crisis every so often.
Conclusion: Practice regular tabletops and red team exercises. Treat random bug bounty or vulnerability disclosures as full-blown practice incidents. Practice scenarios where you are not in control, you are not omniscient, the right log doesn’t exist, and talent can’t understand an issue. Fight from the ground every now and then with your team.
Persistent attackers will target homes.
“Bring your own device” is often used to categorically describe the risk employees bring to an organization. That label does a poor job of characterizing the direct attacks happening against individuals within organizations.
This year’s incidents involving APT groups notably focused their attacks directly on employees’ personal email accounts and endpoints. Whether employees show up at the office with their personal devices won’t matter if they’re sharing credentials or access tokens on personal accounts and devices, or accessing corporate accounts from home.
Understanding lateral movement from an employee’s home to corporate assets is incredibly hard. Manual follow-up with employees was the primary area of investigative friction on numerous occasions. A common trend was shared passwords, acquired through attacks on personal accounts and devices that were never used on a corporate network but that hosted relevant credentials.
Additionally (this fell into “zero root cause”) one incident in particular was highly suggestive of an engineer potentially storing sensitive credentials in their own personal cloud infrastructure to debug production infrastructure remotely.
Logs weren’t available in the time window we needed to guide us. We had heard that attacks were pointed at senior developers in the months that preceded the attack, but the investigative workload on personal employee systems would have been too blind and expensive to follow through with without some kind of lead to start with.
Conclusion: Find ways to improve your employees’ security practices at home. Subsidize their use of a password manager, MFA hardware, or anything else you can. Push hard for them to involve your security team even if they have personal security issues or see bad behavior while off-duty. Teach and enable them to protect their families from threats as well.
Bitcoin is targeted, even if you store none.
Platform companies are often compromised with the assumption that they may have access to, or integrate with, a bitcoin company. Please refer to the Blockchain Graveyard for more information on this trend, or this public example from SendGrid’s blog:
Conclusion: If you deeply rely on a partner’s technology, find a way to heavily manage that risk. If you’re a bitcoin company, practice extreme paranoia and take extraordinary measures in limiting the access of your partnerships.
Sophistication follows humiliation.
Many breach announcements this year pointed to a “sophisticated attacker” as the narrative of their issue. This is usually followed by criticism once the initial means of compromise is revealed.
Most breaches begin with spear phishing, commodity exploits, a leaked key, or some other obvious or preventable detail.
However, this is almost never the “sophisticated” aspect of a breach worth talking about. It’s easy to point at an embarrassing vector and dismiss the rest of an attack.
Thus, do not judge an adversary by the vector they’ve chosen. An adversary may show you what “sophistication” means after advancing from their beachhead.
For instance, while an initial vector may not be notable or interesting, the access or credentials an attacker has from a separate platform compromise may reveal a lot about how motivated and capable they were in targeting you.
As a public example: Would Lockheed describe their breach in 2011 as sophisticated? Even if their adversary came prepared with stolen RSA SecurID data… If it started with a spear phish, does that somehow mean the adversary is no longer intimidating?
Conclusion: “Sophisticated” attackers don’t flex their muscles on the initial intrusion effort. Don’t dismiss lame initial attacks against you as unsophisticated; an adversary will always exert the minimum effort needed. That run-of-the-mill spear phish might be followed up with a new 0-day for all you know.
Manage your secrets and keys.
Management of secrets was a big differentiator at victim companies.
I wasn’t roped into a single intrusion this year at any company with a completely role-driven environment where secrets were fully managed by a secret store.
This can mean one of a few things: these environments don’t exist at all, there aren’t many of them, or they don’t see incidents that would warrant involving IR folks like myself.
Keys stored in source code, leaked into cloud logging platforms, kept insecurely on employee endpoints or personal devices, or copy-pasted into gists and Pastebin were all a consistent theme for me this year. Insecure secrets were either obvious root causes or deeply exacerbated a breach once an adversary obtained them.
Conclusion: Look into AWS roles, avoid putting secrets into source code, keep real secrets away from developers, and be able to roll them quickly and often.
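As a rough illustration of how secrets in source code can be surfaced, the sketch below walks a checkout and flags lines that look like AWS access key IDs, PEM private key headers, or quoted password assignments. The pattern set and the scan_text and scan_tree helpers are assumptions made for this example. The password rule in particular is a crude heuristic, so expect both false positives and misses, and treat a clean result as weak evidence only.

```python
import re
from pathlib import Path

# Heuristic patterns; names are illustrative, not from any particular tool.
PATTERNS = {
    # AWS access key IDs: a 4-letter prefix (AKIA long-term, ASIA temporary) + 16 chars.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # PEM headers such as "-----BEGIN RSA PRIVATE KEY-----".
    "private_key_block": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    # Quoted literal of 8+ chars assigned to something named password/passwd/secret.
    "hardcoded_password": re.compile(
        r"(?:password|passwd|secret)\s*[:=]\s*['\"][^'\"]{8,}['\"]", re.IGNORECASE
    ),
}


def scan_text(text):
    """Return (line_number, pattern_name) for every suspicious line in `text`."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


def scan_tree(root):
    """Scan every readable text file under `root`, skipping .git internals."""
    results = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        hits = scan_text(text)
        if hits:
            results[str(path)] = hits
    return results
```

A scan like this only covers the current working tree. A key that was committed and later deleted still lives in history, which is one reason being able to roll secrets quickly matters more than finding them.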
Credential theft is still the lowest hanging fruit.
Several incidents still occurred at organizations with pretty healthy messaging about avoiding password reuse, especially with senior executive leadership. This awareness messaging ultimately does not matter for personal accounts unless employees are told directly that it applies to those accounts too.
While awareness efforts may delay the inevitable quite well, it was much more effective to see credentials managed behind an identity provider and Single Sign On integrations into their cloud products. I have not responded to any incidents where MFA was broken within an enterprise identity solution.
When Single Sign On integration is not an option, finding the MFA options in each product, and enforcing them, is also a major mitigation step. A special shout out to GitHub is necessary, as teams frequently store secrets in source code that can be protected by enforced, team wide MFA until better secret storage options are agreed upon by a team.
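The policy in the last two paragraphs (put accounts behind SSO where possible, otherwise enforce MFA product by product) can be checked mechanically if you keep an inventory of accounts. The sketch below is a minimal version of that check. The record shape (product, user, sso, mfa) is a hypothetical export format, not any vendor’s API, so populating it from each product’s admin console or API is left as an assumption.

```python
def mfa_gaps(accounts):
    """Group accounts that are neither behind SSO nor protected by MFA, by product.

    `accounts` is an iterable of dicts in a hypothetical inventory format:
    {"product": str, "user": str, "sso": bool, "mfa": bool}
    """
    gaps = {}
    for acct in accounts:
        if acct.get("sso") or acct.get("mfa"):
            continue  # covered by the identity provider or by product-level MFA
        gaps.setdefault(acct["product"], []).append(acct["user"])
    return {product: sorted(users) for product, users in sorted(gaps.items())}


if __name__ == "__main__":
    # Made-up inventory rows, for illustration only.
    inventory = [
        {"product": "github", "user": "alice", "sso": False, "mfa": True},
        {"product": "github", "user": "bob", "sso": False, "mfa": False},
        {"product": "wiki", "user": "carol", "sso": True, "mfa": False},
    ]
    print(mfa_gaps(inventory))
```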
Insider threats have some patterns.
The smallest minority of issues I worked this year involved insider threats. Each insider issue was within a known range of motives which I’ve seen for several years now, 2016 being no exception.
The first involves people who heavily identify with Silicon Valley startup culture and are incredibly aggressive in approaching the press to drive attention to their current or future company.
Specifically, you can use the insider threat model of:
“If I leak something to the tech press now, maybe they’ll write about my tech startup idea later”
While this is a fairly specific model, employees at tech companies really like leaking IP and product information for all kinds of outcomes.
This is common enough to consider a trending type of insider threat. It is tough to defend against, as these are usually employees who don’t need much trust to have something to leak. It’s very hard to give prevention advice that applies broadly and doesn’t encourage a locked-down, Apple-esque company at the same time. Most CEOs want to be transparent with their employees and accept this risk.
The second pattern I’ve seen is around internal customer support tools. Once you hit a certain number of employees with access to administrative tools, an outlier employee is bound to commit fraud, or collude with others to do so.
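One hedged way to watch for that outlier employee is to compare each person’s use of the administrative tool against their peers. The sketch below counts distinct customers looked up per agent in an audit log and flags anyone far above the team median. The event shape (agent, customer_id) and the multiple and min_lookups thresholds are assumptions for illustration, and a flag is a prompt for a human review, not proof of fraud.

```python
from collections import Counter
from statistics import median


def outlier_agents(audit_events, multiple=5, min_lookups=20):
    """Return agents whose distinct-customer lookups far exceed the team median.

    `audit_events` is an iterable of dicts like {"agent": str, "customer_id": ...},
    a hypothetical audit-log format for an internal support tool.
    """
    # Count each (agent, customer) pair once so repeat visits don't inflate totals.
    pairs = {(e["agent"], e["customer_id"]) for e in audit_events}
    per_agent = Counter(agent for agent, _ in pairs)
    if not per_agent:
        return []
    typical = median(per_agent.values())
    # The floor keeps small, quiet teams from flagging ordinary activity.
    cutoff = max(min_lookups, multiple * typical)
    return sorted(agent for agent, count in per_agent.items() if count > cutoff)
```

Using the median rather than the mean is deliberate: a single heavy user cannot drag the baseline up and hide themselves.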
Measure and eliminate your debt.
Nearly all of the organizations I assisted this year had an outlier area of staggering technical debt.
This leads me to believe that companies that consider “debt” as part of engineering process are usually highly disciplined organizations, with lower risk.
Here’s why: A startup can move fast. They can cut corners. They can compete aggressively and take risks.
During development and a successful launch, a difference from one company to the next is how well they’ve documented the shortcuts and have had a “retrospective” on what their debt level has become.
Then they pay back their debts.
Rarely do I see a team eliminate all of their debt, but the organizations that at least respect their debt never get so far behind that they can no longer be helped in a breach.
Debt comes in many forms: scale, development speed, site reliability, customer churn, manual work before automation, and security.
The problem is that security debt is silent. Every other form of debt causes errors, customer complaints, expenses, and engineer rage. Security debt only results in breaches and is nearly impossible to quantify. It requires manual effort or a technology harness to surface security debt.
An engineering organization that has mastered its debt is rare and, as a side effect, easy to secure.
I’ve rarely seen this in practice, but the mere desire for this level of enlightened engineering is a great sign at a company. Google has structured its “error debt” around its release practices, with policies driven by it, and is one of the best examples I’ve found of making “debt” an objective problem to be measured and solved. Ollie Whitehouse @ NCC Group has also presented on this topic in the past.
Most engineering organizations don’t know that some of their basic processes (retrospectives, postmortems) are helping them avoid massive areas of debt.
Conclusion: Make sure your biggest debts are paid before moving along to another large endeavor.
We have the most to learn from our security incidents. It’s important to find ways to talk about them and learn from them. If you’re involved with incident response, I hope you can too!
I’m a security guy, formerly at Facebook and Coinbase, and currently an advisor and consultant for a handful of startups. Incident response and security team building are generally my thing, but I’m mostly all over the place.