Stop Buying Bad Security Prescriptions
You’re paying too much and it’s just not working
I’ve been working in information security for about two decades — spanning attack and defense, across the public and private sectors — and the most consistent truth I’ve found is that people overwhelmingly misunderstand how information security works. Even worse, the common misconceptions are such an endemic problem that they’ve fueled a $75 billion industry, composed largely of snake oil solutions that range from ineffective to outright harmful. That’s left us in a place where the vast majority of the tech sector is throwing its money away on security that just doesn’t work, while ignoring the basic practices and processes that actually do produce secure systems … but it doesn’t have to be this way.
Good Security Is Holistic
First, it’s important to understand that security is an emergent property of your overall product or system health — it’s not an isolated feature that gets shipped and is then relegated to minor upkeep. Rather, similar to your own personal health, security requires ongoing maintenance and consistently applied proactive care. To pull an analogy from my elementary school reader:
The superior doctor prevents sickness. The mediocre doctor attends to impending sickness. The inferior doctor treats actual sickness.
While the value of preventative care and healthy lifestyle choices is well understood in medicine, the concepts are a bit fuzzier in the information security space. So, we need to establish a bit of common vocabulary before we can speak about them in more detail.
- Trust boundaries — the logical distinctions between greater and lesser privilege within a system. Ex: user accounts may be root (greater privilege) versus non-root (lesser privilege); parameterized SQL queries include executable statements (greater privilege) and parameterized data (lesser privilege). (A short sketch of the SQL case follows this list.)
- Attack surface — entry points that allow operations or data to cross a trust boundary from lesser to greater privilege within a system — typically all public API surface, whether intentionally or accidentally exposed. (As a colleague of mine likes to say, “Your attack surface is just your real API surface”.) Ex: HTTP(S) endpoints; database connections; IPC interfaces; network transports (particularly non-secure transports over untrusted networks).
- TCB (trusted computing base) — the set of all components of a system that are relied on to enforce (or simply not violate) a system’s security guarantees. When a system is strongly isolated by trust boundaries, the TCB may be split as sub-components across multiple system components. Ex: kernel; object permissions; stack canaries; ASLR/NX; safe language features.
- Vulnerability — a flaw in the design and/or implementation of the TCB such that an attacker can exploit the flaw to compromise the confidentiality, integrity, or availability of a system (accepting that the exact degree of damage caused by said compromise can vary significantly). Ex: memory corruption (such as a stack or heap buffer overflow) leading to attacker-controlled code execution; privileged API surface exposed over a network or IPC channel.
- Defense in depth — layering independent trust boundaries, TCB mechanisms, and other mitigations such that attackers are limited in what damage they can cause by exploiting a single vulnerability or compromising a single component of a system. Most importantly, each independent mechanism must continue to provide its security function regardless of a compromise of other independent mechanisms. Ex: web browsers often: rely on JavaScript as a memory-safe language; employ ASLR+NX, stack canaries, and other memory corruption mitigations; and run web content in “sandboxed” processes with very restricted system access. (A sketch of this kind of layering also follows this list.)
- Threat modelling — logically decomposing the security aspects of your system such that you can understand it from the perspective of stakeholders and attackers. This is a well-documented process, so you might want to read up a bit.
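To make the trust-boundary idea concrete, here’s a minimal sketch of the parameterized SQL example from the first bullet, using SQLite’s C API. It assumes a hypothetical users(id, name) table, and error handling is abbreviated; the point is that the SQL text is compiled before any untrusted input is attached, so the bound value can never become executable SQL:

```c
#include <sqlite3.h>

/* Look up a user id by name. The query text (greater privilege:
 * executable SQL) is fixed at prepare time; user_input (lesser
 * privilege: data) only ever crosses the boundary as a bound value. */
int lookup_user_id(sqlite3 *db, const char *user_input) {
    sqlite3_stmt *stmt = NULL;
    if (sqlite3_prepare_v2(db, "SELECT id FROM users WHERE name = ?1;",
                           -1, &stmt, NULL) != SQLITE_OK)
        return -1;
    /* Untrusted input is attached strictly as data, never as SQL. */
    sqlite3_bind_text(stmt, 1, user_input, -1, SQLITE_TRANSIENT);
    int id = (sqlite3_step(stmt) == SQLITE_ROW)
                 ? sqlite3_column_int(stmt, 0)
                 : -1;
    sqlite3_finalize(stmt);
    return id;
}
```

Contrast that with building the same query via string concatenation, where a name like `x' OR '1'='1` silently crosses from data into executable SQL.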
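And as a sketch of defense in depth (Linux-specific, and assuming the process starts as root; the uid/gid values are placeholders), here are two independent layers wrapped around some untrusted parsing work. Even if one layer fails, the other still constrains the attacker:

```c
#include <linux/seccomp.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void) {
    /* Layer 1: shed root privileges. Drop the group first, because
     * setgid() is no longer permitted once we've given up root.
     * (65534 is the conventional "nobody" uid/gid.) */
    if (setgid(65534) != 0 || setuid(65534) != 0) {
        perror("drop privileges");
        return 1;
    }

    /* Layer 2: seccomp strict mode. From here on, only read(),
     * write(), _exit(), and sigreturn() are allowed; any other
     * system call kills the process on the spot. */
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0) {
        perror("seccomp");
        return 1;
    }

    /* ... parse untrusted input here, using already-open fds ... */

    /* Strict mode forbids exit_group(), which a normal return from
     * main() would invoke, so leave via the raw exit syscall. */
    syscall(SYS_exit, 0);
}
```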
Now that we have some common terminology, we can frame preventative security more clearly as proactive strategies for eliminating vulnerabilities, largely via the following approaches:
- Clearly establish and enforce trust boundaries.
- Identify and minimize attack surface.
- Simplify and reduce the TCB footprint.
- Layer protection mechanisms (defense in depth).
Of course, we can’t always prevent vulnerabilities, just like we can’t always prevent illness. So, we also have reactive approaches to mitigate or remediate damage after the fact. A typical set of strategies is:
- Find and fix implementation vulnerabilities (e.g. fuzzing and patching; see the harness sketch after this list).
- Reduce the reliability of exploits in the wild (e.g. ASLR+NX).
- Reduce the number and viability of escalation pathways (e.g. SELinux).
- Detect and report compromises in the wild (e.g. anti-virus).
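To make the first of those strategies concrete, here’s a minimal libFuzzer-style harness sketch. The toy parse_record() is a hypothetical stand-in for real input-handling code, seeded with a deliberate missing bounds check; build with clang -g -fsanitize=fuzzer,address:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy parser: copies a length-prefixed field into a fixed buffer.
 * It checks the length against the input, but not against the
 * destination buffer -- a bug a fuzzer finds almost immediately. */
static void parse_record(const uint8_t *data, size_t size) {
    uint8_t field[16];
    if (size < 1)
        return;
    uint8_t field_len = data[0];
    if (field_len > size - 1)
        return;                          /* checked against the input */
    memcpy(field, data + 1, field_len);  /* NOT checked against field[16] */
}

/* libFuzzer entry point: called repeatedly with mutated inputs, while
 * AddressSanitizer turns the overflow above into an immediate crash. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    parse_record(data, size);
    return 0;
}
```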
Putting it all together, we need to think about how proactive (preventative) and reactive security gets integrated into the lifecycle of a system. You can refer to the graphic below for some common examples of security mechanisms integrated over the different phases:
Okay, that was a lot of terminology and bullets to throw at you, so let’s take a step back and consider the bigger picture. First, the thing that should really jump out from the graphic is that the most effective security mechanisms are heavily biased to the proactive end of the spectrum, while the reactive end of the spectrum increasingly focuses on weaker mitigations and recovery mechanisms.
This should intuitively make sense, because just like the superior doctor prevents illness, in the field of security, the superior approach is to prevent vulnerabilities from being introduced in the first place. You do that by ensuring that security is an integral part of the design and implementation of your system, which allows you to approach security proactively by making intelligent decisions about exactly what functionality your system will expose and how it’s surfaced.
If you relegate security to the later stages, then you’re stuck in the role of the inferior doctor. Because you’re bound by earlier decisions that simply didn’t account for security, your treatment will be focused on mitigating the impact of existing vulnerabilities and triaging damage from attacks in the wild. In the worst — and sadly too common — case, you’ll get stuck in a reactive cycle of spending your entire security investment on treating late stage symptoms.
And just in case I’ve scared you into thinking you can forgo reactive security entirely, please allow me to correct that notion. No one is ever going to ship a system with ideal proactive security, either because of intentional tradeoffs or from bugs that will be introduced during every stage of development. That’s why you need reactive security as well, to provide both additional layers of defense in depth and feedback on attacks in the wild.
Remember, effective proactive security is what enables effective reactive security — otherwise you’re just left with intractable problems at the later stages.
Security Health Is Hard to Measure
The field of medicine has reasonably effective methods of assessing the efficacy of different treatments and overall health upkeep. Unfortunately, the information security field mostly lacks similar methods of assessment. This is true for a variety of reasons, but the big three are:
- The information security industry is in its infancy and wildly diverse. We have few established and evidence-backed norms for what constitutes reasonable and effective measures — and what we do have is still rapidly evolving.
- We can’t make apples-to-apples comparisons between different products or approaches, both because detailed security data is largely considered too sensitive to share and because (per #1) we lack the consistent definitions necessary to meaningfully share anonymized, aggregate data.
- Attackers cover their tracks and avoid discovery, so it’s entirely possible (and arguably the norm) to simply be unaware of compromises or actively abused vulnerabilities. In fact, it’s not uncommon for systems to be thoroughly compromised for years at a time while their stakeholders remain blissfully unaware that anything is wrong.
Our inability to make quantitative conclusions about security health has driven many to focus on divining false conclusions from what little data we do have. And unfortunately, the available data is heavily biased to the most reactive side of the process, meaning the metrics we’re seeing are largely a measure of degrees of failure. Worse, security vendors often try to imply far more meaning than is appropriate for the data they have, arguing that arbitrary counts of CVEs or malware samples are sufficient evidence for quantitative claims against the entire unknown space of all real-world threats (for CVEs specifically, it’s a problem I’ve long complained about).
Sadly, that approach is simply not useful or honest, because you can’t reasonably compare a finite and arbitrarily biased sample set against an infinite set of unknown scope. As a concrete example, consider that even the most naive attackers ensure that their code can evade popular anti-virus/anti-malware software. Simply put, the entire anti-virus/anti-malware industry is by necessity always behind the curve of the threats that they’re supposed to be defending against. (Back in 2005 Marcus Ranum called this the #2 dumbest idea in computer security, but it still has amazing staying power.)
In practice, you will be far better served by relying on qualitative assessments that are directly applicable to your own systems. You need to really understand your threat model, which means having a clear picture of your system’s TCB footprint, attack surface, and what mechanisms you’re relying on for defense in depth — such that you understand the chain of vulnerabilities needed for an attacker to effect a significant compromise. In terms of quantitative metrics, you’re best served by tracking things like: fuzzer hits, security report volume, turnaround time on security fixes, and other numbers that speak to the overall health of your system.
Once you have a good understanding of the security properties of your system, you will see common vulnerability patterns emerge (e.g. a particular data format handler is prone to heap corruption). These are the kinds of signals you want, because they will inform proactive changes to either harden your implementation or alter the design of your system to reduce attack surface and eliminate entire classes of vulnerabilities.
Miracle Cures Don’t Work
Effective security measures need to be integrated deeply into the development of a system, but third-party security products by necessity target the most reactive end of the spectrum. That makes sense when you consider that the companies selling security products aren’t involved with the design or implementation of your systems, so they simply lack the context or opportunity to really build effective solutions. However, the practical effect here is that the industry is mostly relegated to providing ineffective, bolt-on products.
Of course, the situation is actually quite a bit worse than just being ineffective. That’s because the market pressures security products to compete on feature lists centered on arbitrarily chosen threats. Ironically, every one of those “features” actually spreads new attack surface across your entire system. And since security product vendors are almost universally negligent with respect to making their own products secure, they leave their users more vulnerable to all of the attack and escalation vectors they introduce. And that’s not even considering the negative impacts on performance and stability that these products typically bring.
The market is just structured such that security vendors are largely incentivized to sell products that make your systems fundamentally less secure.
However, the larger problem here is not unique to third-party security products. Internal security teams often fall into a similar trap when they cannot work effectively with their engineering and operational peers. This typically means security gets involved much too late in the process — often in an adversarial role — and is stuck in the trap of solely reacting in response to reports of external threats. In fact, I’ve seen numerous internal security teams grow increasingly bloated and ineffective, as they demand more and more resources to react to threats that they’re just fundamentally mishandling.
Unfortunately, the security posture rarely improves in these situations — to the contrary, it may further deteriorate as a direct result of the security team’s efforts — because their actions are born from insufficient understanding of the system and lack of agency within their organization to effect meaningful change. So, the security investments end up wasted on treating (or worse, mis-treating) symptoms, rather than proactively addressing the root problems at a much lower cost over the long term.
Build Security into Your Process
The important thing to understand is that building a secure system is a lot like living a healthy lifestyle. It’s an ongoing process of conscious decisions, which means your security process must be deeply integrated over the lifecycle of your system. To accomplish this, your security team needs to be collaborative peers to the engineering and operational teams — ideally embedded as active contributors to the development and maintenance of your systems. Here are ten major areas where you’ll want security working closely with those teams to maintain overall system health:
- Design process — Organically evolved systems are among the most expensive and painful to secure — assuming it’s even possible. So, you need a clear design process where security can provide feedback, propose alternative approaches, and identify (and ideally prevent) the potential for security issues in later stages.
- User Experience — Security UX is often relegated to an afterthought — if it’s thought about at all. However, a good security UX is essential to ensuring users make responsible trust decisions and avoid dangerous risks. Broadly, this involves avoiding unnecessary decisions by providing secure defaults, and ensuring clear information is provided at the correct time and in the context of a necessary trust decision.
- Development practices — Inconsistent process is a very strong signal of both poor security and poor code quality, because it usually means the development team lacks broader understanding, communication, and clear ownership of their code. Even small things like inconsistent style and formatting policy can obscure subtle bugs and dangerous patterns. (Apple’s “goto fail” bug is a great example of how consistently enforced coding style would have prevented a particularly bad vulnerability; the condensed snippet after this list shows the offending lines.)
- Change/code reviews — Any project of significant scope needs clear ownership of individual components, and proper review of all changes/check-ins (beyond the simple mechanical stuff). Security should just tie into this process directly, reviewing any changes to security sensitive components, or providing security expertise where appropriate.
- Dependencies — It’s critical to maintain a solid understanding of the security impact of any libraries or systems that you depend on. That means understanding any attack surface or TCB increase that such a dependency might introduce, along with the overall security quality of the dependency itself. And be especially deliberate when adding dependencies, because they put you on the hook for tracking their security updates and deploying them as necessary.
- Testing — Any changes to the codebase need proper test coverage (unit, system, and integration, as appropriate) ideally run as part of a continuous integration process. For TCB components, this simply guarantees that things are working as intended (e.g. “goto fail” should have been prevented by tests). However, the larger value is that robust tests provide freedom to make significant changes, because they’re the only way to be confident that you haven’t broken anything when, say, a security fix requires a major refactor. And of course, security-focused fuzz-testing is essential whenever handling things like file format parsing in memory-unsafe languages.
- Deployment — If you cannot reliably and safely deliver up-to-date software, then you cannot provide a secure product or maintain a secure system. That’s because security is always a moving target, requiring that updates be delivered in response to discovered vulnerabilities or for necessary security improvements. Simply put, an out-of-date system — or worse, a system that cannot be reliably updated — is a fundamentally unsafe system. (Duo Labs’ recent work covers a range of real-world failures here, particularly on the “safely deliver” front.)
- Telemetry and metrics — Data from the field is extremely helpful in informing security decisions. It can be used to: assess the effectiveness of security UX, identify low-usage features for removal, expose potentially exploitable crashes not yet exercised by tests, or even detect abuses and compromises in the wild. Unfortunately, most security investment today is in that last point (detection in the wild), which is arguably the least effective thing to do at this stage.
- Bug triage — Security vulnerability reports are just a specific kind of bug report, but they generally warrant a faster turnaround and confidential handling. So, your bug triage process needs to account for this by assessing the impact of vulnerabilities and prioritizing as appropriate. If you’re confident in the maturity of your process, you may want to further incentivize security reports via a bounty program.
- Deprecations — Sometimes security issues can be fixed only by breaking compatibility for some existing use cases. So, it’s essential you have a clear policy and timelines for deprecations. This includes establishing criteria for when short deprecation timelines are required to address severe security issues, versus longer timelines for general security enhancements and improvements.
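For reference, here is the heart of the “goto fail” bug mentioned above, condensed from Apple’s published Security-55471 source (SSLVerifySignedServerKeyExchange in sslKeyExchange.c). Because the duplicated goto is unconditional, the function jumps to the cleanup label with err still 0 and the final signature check never runs; mandatory braces or an unreachable-code warning would have caught it:

```c
    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail;  /* duplicated line: always taken, with err == 0 */
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;
    /* ... the actual signature verification is skipped entirely ... */
fail:
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;
```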
The TL;DR
Okay, I’ve thrown a lot of information at you here, and maybe it doesn’t all seem to fit together yet. Sorry, security is a big complicated problem and I’m trying to hit the high points without writing a whole new book on the topic (a not-so-humble brag). So, at the risk of being repetitious, I want to reiterate a few key points.
Security is far more cost effective if you start early and incorporate it into the lifecycle of your system. That means building out threat models as soon as possible, then designing, implementing, deploying, and maintaining your systems in line with the threats you reasonably anticipate. It also means watching how the threats evolve and updating both your understanding and your systems as appropriate — because security debt is technical debt, and the more you put it off the more it’s going to cost you. You don’t want to fall into the trap of relegating security to late-stage signoff or a reactive cleanup crew, because it just consigns you to an endless cycle of expensive failure.
Prefer simple, proven, integrated solutions over complex, trendy, bolt-on technologies. Complexity is the bane of security, because it introduces failure conditions and attack surface that just make the job harder. So, don’t fall for the latest marketing spiel on Machine Intelligence Driven Heuristics for Advanced Persistent Threat Elimination Magic™. Instead, favor a deep integration of basics focused on: attack surface reduction, trust boundary isolation (e.g. network or process separation), consistent and safe input handling, and proven technologies for handling security-focused capabilities such as authentication or encryption.
Reliable updates are the basic foundation of any security effort. If you can do nothing else, then at least make sure you’re deploying updates in a reliable and timely manner. That means tracking security relevant changes in your software and dependencies, and having a reliable infrastructure to push updates over a trusted channel. That’s far from a real security strategy on its own, but without it you will lack any capacity to react to threats or to deliver any security improvements.