Endpoint Security: Intuition around the Mudge Disclosures
This is a valuable launching point for discussing the intuition behind endpoint security overall for those of us growing security programs.
The first issue is endpoint coverage. At issue for Mudge is how endpoint security measurements were shared with the board. Supposedly, the board received the percentage of devices with any endpoint security software at all, rather than the percentage of endpoints meeting some baseline. This is, of course, an easier target to hit, and they hit over 90% coverage with this inclusive measurement strategy.
Mudge claims, contradicting that figure, that 40% of devices are “not in compliance with basic security settings.” The question for ourselves and our teams is: what is our threshold, and what is our baseline? What could the target have been at Twitter, and what should it be for us? How would I reason about this?
We’d love for it to be 0%, but at what percentage would we release resources to focus on higher-priority risks?
Next, we’re not given clear information if the Twitter organization even decided on any baseline. It’s unclear if this is his baseline or one agreed to by consensus as a policy. His assessment of non-compliant systems seems to include automatic updates disabled, disabled firewalls, and remote access enabled, among other things. His assessment may not be wrong, but we don’t see what the policy was.
What baseline matters for your organization, and where is that decided?
Leadership should optimally participate in these decisions. However, the Mudge disclosures do not illustrate a significant relationship between Mudge and senior executives.
So, we should ask how involved our leadership can be with risk decisions and how we can present them to pursue our own goals. Let’s talk about it with the Mudge disclosures in the rearview mirror and make our own organizations better.
Primarily, what endpoint security problem could you ask your board for a decision on?
First, let’s recognize that endpoint security has no solid definition in practice. I’ve seen this phrase used for the following things:
- ⭐️ Almost always, employee laptops.
- Sometimes, employees’ phones or other assigned devices too.
- Rarely, all servers or compute.
- The term is also used in entirely different contexts, like publicly accessible API routes.
This discussion is about employee devices (laptops and phones), and I will do my best to focus on principles instead of vendors.
Answering why we care about endpoint security involves discussing risk scenarios we try to manage. Here are some:
- A remote (or local) adversary has executed malicious code on an endpoint.
- An insider has abused their authorized access.
- An employee has lost their device.
- An adversary has stolen a device.
A whole constellation of endpoint security tooling exists that can be configured toward mitigating these risks.
Here are some of the features available in these products without getting into a spiral of marketing-isms.
- Detect, alert, and block known malicious code (or allowlist)
- Report on, and enforce, local host configurations that improve security.
- Allow remote forensic capabilities.
- Locate or destroy a lost device.
- Provide access to secure networks.
These sound great, so why don’t we stack up as much endpoint security as possible? Lots of reasons.
- They have overlapping and redundant value propositions.
- They vary in licensing models and price.
- You might need multiple, redundant solutions to support the platforms you care about (Mac / Win / Server(less))
- They may fall apart for common development scenarios, like laptops with lots of I/O for software builds.
- They may fail or conflict when installed alongside other endpoint products.
- They are generally error-prone, crash, stop reporting, or do not work as listed.
- Engineers will forcibly remove endpoint security mitigations for whatever reasons.
- Employees will push back if they don’t trust the organization to maintain their privacy.
These sorts of troubles create fragmentation and reduced coverage.
Endpoint tooling creates lots of work. More experienced security teams are used to it, but we have to plan for these pain points from the start.
First, there is project work to bake off several products and find something that works with minimal system disruption and within budget.
Then there is project work to deploy it across a fleet, roll it out to future hosts, and monitor it continuously.
The operational work to follow up on findings is underappreciated by almost every team going down this path.
Endpoint deployments often have a sprint to deal with the worst findings immediately (malware) but then fade off as the original deployment team ventures off to tackle new risks. Who picks up these operations?
Additionally, new projects are commonly proposed if you discover that fleet configurations are not up to par. A common discovery is that laptops are not encrypted as you’d expected. Getting these and other misconfigurations under control is usually straightforward but arduous.
Staff and Infrastructure
Try to predict who will handle endpoint projects and operations. IT? Your team? You? Are they dedicated to this problem, or will there be a moment where they go back to another role? Larger companies often have a managed and staffed crew of endpoint engineering or client engineering that eats up all these problems. Until then, decisions need to be made. You’ll need a way to create managed devices with corporate tooling pre-installed for new hires, and you’ll generally be in a partner position with an IT organization while they own those decisions on how it will work.
Endpoint products are risks themselves and depend on more tooling. Centralized logging may be required to use the logs produced by multiple endpoint products. An SSO platform may be needed to protect the administrative panels provided by vendors to access data from them. If not right away, they’ll eventually become part of the whole endpoint ecosystem.
As you can see, a functional IT team separate from a security org is almost always a prerequisite for this tooling.
Measurement: The critical part.
High leverage discoveries and projects often occur after an endpoint deployment. Soon, things settle down. Some new, tough questions are asked, like:
What sort of endpoint coverage do we have?
The first problem is identifying the denominator, which might be impossible in a larger company. Employees can have multiple devices on multiple platforms, and there’s always a possibility that (many) devices can exist without a single endpoint agent reporting in. Different endpoint tooling will have different reports about what the universe looks like. M&As may put entire sub-companies in totally different worlds of observability. Network, VPN, and other SaaS tooling will disagree on counts too.
But, we’ll have lots of input to make informed estimations with data, so your denominator will probably be a forecasted, uncertain interval provided by the most informed person you can select. Something like:
5,000–5,500 (90% certain)
And that is ok. If they have reason to believe that more uncertainty exists, they will cast a wide interval — and you’d work to reduce it with a project.
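A minimal sketch of how those disagreeing sources might be reconciled into a denominator interval. All the inventory names and counts below are made-up illustrations, not data from any real fleet, and the widening factor is a judgment call by your most informed estimator:

```python
# Hypothetical device counts from several inventory sources; none is authoritative.
sources = {
    "mdm": 5120,          # devices enrolled in MDM
    "edr_agents": 4980,   # endpoints with an EDR agent reporting in
    "vpn_unique": 5400,   # unique devices seen on VPN this quarter
    "it_asset_db": 5350,  # IT's asset database
}

# The lowest count bounds what we can directly confirm; the highest
# suggests how big the universe might be.
low = min(sources.values())
high = max(sources.values())

# Widen the upper bound to account for devices no source sees at all.
# 5% is an assumption the estimator would defend or revise.
unseen_factor = 0.05
high = round(high * (1 + unseen_factor))

print(f"Denominator estimate: {low}-{high} (90% certain)")
```

The point isn’t the arithmetic; it’s that the interval is explicit, so a project to reduce uncertainty has a clear before-and-after measure.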
An attitude of 100% coverage with all endpoint tooling sounds right, but it points in the wrong direction. Many other priorities that have little to do with endpoints are carrying risk for you.
We need to create a threshold that allows our resources to focus elsewhere. This threshold can be high if you are well resourced and in a risk-averse industry. However, even you (yes, you!) need a point where you can focus on other risks.
There are two big decisions:
- What is the baseline configuration?
- What threshold becomes an incident if exceeded?
The baseline configuration should codify some strategy: maybe you want to prioritize compliance, responsiveness, prevention, or some mix of these.
You’re either going to own baseline decisions personally or develop them with industry peers, internal peers, or your leadership and board. Of course, more consensus is better if it’s obtainable. The individuals involved should be informed of the risk scenarios (see the beginning of this essay).
Then you measure what’s objectively out there and compare it against your denominator.
The threshold is a matter of resources available. A higher threshold allows more room to breathe and focus on other priorities.
However, incidents that occur against endpoints that don’t meet the baseline have a higher probability of worse consequences. You can study this probabilistically.
- How many of the target risk scenarios occur annually?
- What’s the probability that the involved endpoint is out of compliance?
You can forecast the odds, gather internal incident data, or both. Your organization and industry peers can help argue whether this threshold needs to be higher or lower. For instance, consider how often employees have lost laptops in the past, and review previous incidents involving remote intrusions.
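The two questions above multiply into a rough annualized estimate. Every figure in this sketch is an illustrative assumption you would replace with your own forecasts or incident data:

```python
# Illustrative inputs, not real data: forecast or pull from incident history.
incidents_per_year = 6        # annual count of the target risk scenarios
                              # (lost laptops, malware detections, etc.)
noncompliance_rate = 0.10     # fraction of the fleet below baseline

# Expected number of incidents per year that land on a non-compliant host.
expected_noncompliant_hits = incidents_per_year * noncompliance_rate

# A crude severity weight: how much worse the consequences are when the
# involved endpoint is out of compliance (another judgment call).
consequence_multiplier = 4
risk_weight = expected_noncompliant_hits * consequence_multiplier

print(f"~{expected_noncompliant_hits:.1f} incidents/year expected on non-compliant hosts")
print(f"relative risk weight: {risk_weight:.1f}")
```

If that expected number feels tolerable against your other risks, the threshold can stay where it is; if not, that’s your argument for lowering it.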
If the observable number is below this threshold, work on other stuff. Start opening projects to get things in order if the number gets close to the threshold. If it exceeds, then escalate as you’ve agreed on earlier.
Said differently, you can focus on optimizing other endpoint platforms so long as non-compliance stays under the agreed threshold, say 10%. If it exceeds that, you have a standing agreement to pull the head of IT and the CEO into a meeting.
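That standing agreement can be expressed as a simple decision rule. The threshold, the warning margin, and the example counts here are all hypothetical policy choices, not universal numbers:

```python
def coverage_status(noncompliant: int, denominator: int,
                    threshold: float = 0.10, warn_margin: float = 0.02) -> str:
    """Map measured non-compliance against an agreed threshold.

    threshold and warn_margin are policy decisions made in advance,
    ideally with leadership, so escalation isn't improvised mid-crisis.
    """
    rate = noncompliant / denominator
    if rate > threshold:
        return "escalate"        # standing agreement: pull in the IT lead and CEO
    if rate > threshold - warn_margin:
        return "open-project"    # getting close; start remediation work now
    return "focus-elsewhere"     # under threshold; spend resources on other risks

print(coverage_status(400, 5200))   # ~7.7% -> focus-elsewhere
print(coverage_status(480, 5200))   # ~9.2% -> open-project
print(coverage_status(600, 5200))   # ~11.5% -> escalate
```

The value of writing it down, even this crudely, is that the escalation path is agreed on before the number drifts upward.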
Of course, if leveraged opportunities come around to improve these figures, take them on. But remember, you’re working with a whole organization’s risks, not just endpoints.
My background is full of incident response work, and I can discuss how endpoint technology is helpful during an actual incident.
First, device encryption and remote wiping offer a massive sigh of relief in theft and loss scenarios.
Second, remote forensic and acquisition technology is rarely available but proves invaluable in a crisis, significantly speeding up an incident. Having products whose third-party vendors can assist with incident response lifts a significant weight off your shoulders.
Backups have a night and day effect on ransomware and wiper incidents and investigations into insiders or other general incidents.