Light Roast 112: How to Avoid Vulnerability Whack-a-Mole

Two approaches for evaluating real risk beyond CVSS score, and thoughts on automation, asset management, and defense-in-depth.

Kamyar Kojouri
Dark Roast Security
12 min read · Feb 10, 2022


Edvard Munch, The Scream, detail of lithograph, 1895. CC BY 4.0, The Munch Museum.

It’s 9pm on [insert your favorite holiday here] eve, and you’re having dinner with friends and family. Great food, great company… a bit of a sensory overload actually now that I’m thinking about this, but oh well. You’re feeling a nice buzz from the wine, and you have somehow managed to mentally block out the incessant yapping of your aunt (there, I fixed it). It’s all good now, nothing can harsh your mellow. And then BAM!

You get a call.

A new, pre-authentication, remote code execution (RCE), zero-day vulnerability has been discovered. It’s all over the news now, with headlines ranging from “This Is Bad…” to “The Internet Is Going To Melt!”

NVD has assigned it a CVSS score of 9.8, Twitter and Reddit are exploding, your clients are bombarding your mailbox with questions, and your colleagues are asking for your guidance. What would you do?

SIDEBAR: Okay, all this might be a bit extreme, but it’s not that far from reality. The dust from Log4shell hasn’t settled yet after all, and a little exaggeration never killed anyone.

In situations like this, I generally try (let me reiterate, try) to pause and ask myself, “What Would Pete the Cat Do?”

SIDEBAR AGAIN: For those of you who aren’t millennial parents, Pete the Cat is, in my opinion, the coolest character in the history of children’s literature. He’s a chill skater dude who keeps running into all sorts of obstacles, like losing the buttons on his shirt or ruining his brand new shoes, but instead of freaking out, he goes “meh” and continues singing, or walking, or whatever it is hippie cartoon cats do.

Snippet from Pete the Cat and His Four Groovy Buttons, art by James Dean, story by Eric Litwin

I’ll tell you what Pete would do. He would take a deep breath, ask a few simple, yet important questions, put the situation into context, and make a decision. Well, actually I’m not sure what Pete would do, but would Pete cry? Goodness, no! Vulns come and vulns go!

Prioritize Your Vulnerability Response

Just like with everything else in life, when it comes to patching software vulnerabilities, prioritization is key.

According to Security Magazine, in 2020, an average of 50 new CVEs were disclosed per day! That’s plenty of vulnerabilities to go around for everyone. Even if you treated only 2–3 of these as fire drills on a monthly basis, you’d end up with a team that is burned out and numb to the e-mails from a boss who’s crying wolf 24x7.

When it comes to prioritizing software patches, there are two approaches I usually take: a quick and dirty one, and a more methodical one. I believe both of these approaches can be valid and useful depending on your circumstances.

The Quick and Dirty Way — for your dinner table needs

In situations where you’re in a bind, don’t have the time or the tools to do a thorough investigation, and need to give an answer right away, simply ask yourself these three questions:

  • Is the CVSS score 7.0 or higher (high or critical)?
  • Is there a public exploit available?
  • Is the affected system Internet-facing?

If your answer to all three questions was ‘yes’, you must act immediately by either patching the vulnerable system or taking it offline. I would even take this a step further by saying that if it’s a pre-auth RCE vulnerability, your deadline was yesterday!

Sorry, dinner is ruined. That’s just how it is sometimes when you work in Cybersecurity. I hope you knew that before you got into this line of business ¯\_(ツ)_/¯. And yes, in this situation, it is okay to panic for a few minutes, maybe even scream a bit (but only on the inside).

If you answered ‘no’ to any of these questions, I would argue that it’s usually safe to let things sit until the next day and re-evaluate the situation more methodically. And that’s what we are going to discuss next.
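
If you want to make that gut check consistent across your team, the three questions translate directly into a few lines of code. Here is a minimal sketch, with made-up parameter names; it’s a rule of thumb, not a substitute for judgment:

```python
# A minimal sketch of the "quick and dirty" triage above.
# The parameter names (cvss, public_exploit, internet_facing, pre_auth_rce)
# are made up for illustration.

def quick_triage(cvss: float, public_exploit: bool, internet_facing: bool,
                 pre_auth_rce: bool = False) -> str:
    """Return a rough verdict based on the three dinner-table questions."""
    all_yes = cvss >= 7.0 and public_exploit and internet_facing
    if all_yes and pre_auth_rce:
        return "Your deadline was yesterday: patch it or pull it offline now."
    if all_yes:
        return "Act immediately: patch the system or take it offline."
    return "Let it sit until tomorrow and re-evaluate methodically."

print(quick_triage(cvss=9.8, public_exploit=True, internet_facing=True,
                   pre_auth_rce=True))
```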

The Elegant Way — the next day

Instead of reinventing the wheel and coming up with yet another vulnerability and patch management matrix, I decided to review the existing body of knowledge on this subject and combine it with my personal experience from the field.

The guidelines that I reviewed were Guide to Enterprise Patch Management Planning by NIST, Patch Management Standard by CIS, The Top 7 Operational Technology Patch Management Best Practices by ISA, Assessing Security Vulnerabilities and Applying Patches by the Australian Government, and Recommended Practice for Patch Management of Control Systems by CISA.

From these, I found CISA’s approach to be the most practical one. The guideline was written with Industrial Control Systems in mind, but I would argue that it can be applied to anything running software, from a smart toaster oven to your fleet of web servers running Apache Struts <cough> Equifax </cough>.

In a nutshell, what CISA recommends is to:
a) use a diamond-shaped “vulnerability footprint” to establish the criticality of the vulnerability in your environment; then
b) use a “patch urgency decision tree” to determine whether to treat it as a fire drill or leave it for your patch management system to handle.

Here is a sample vulnerability footprint analysis that CISA has included in the document.

Example of a vulnerability footprint — source: “Recommended Practice for Patch Management of Control Systems” by CISA, Department of Homeland Security

It’s not too hard to understand: you do a qualitative analysis of the potential impact (DoS, RCE, XSS, etc.), attack surface exposure, simplicity of exploitation, and the deployment size in the organization. You assign each of these four properties a value of low, medium, or high, connect them together, then take a step back and look at the resulting diamond shape. The larger the diamond, the higher the risk to the organization.
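
CISA’s footprint is a qualitative, visual exercise, but if you want something you can drop into a ticket or a script, one unofficial shortcut is to map low/medium/high onto numbers and add up the four axes. This is my own simplification, not part of the CISA document:

```python
# An unofficial, numeric take on CISA's vulnerability footprint.
# CISA's method is qualitative; mapping the levels to 1/2/3 and summing
# them is my own simplification.

LEVELS = {"low": 1, "medium": 2, "high": 3}

def footprint_size(impact: str, exposure: str, ease_of_exploitation: str,
                   deployment: str) -> int:
    """Sum the four axes of the 'diamond'; a bigger total means a bigger diamond."""
    return sum(LEVELS[axis.lower()] for axis in
               (impact, exposure, ease_of_exploitation, deployment))

# Example: RCE-level impact, Internet-exposed, trivially exploitable,
# and deployed everywhere adds up to the maximum footprint of 12.
print(footprint_size("high", "high", "high", "high"))
```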

Once you’ve determined the size of your diamond, you would apply it as one of the inputs to the patch urgency decision tree depicted below.

Patch urgency decision tree — source: “Recommended Practice for Patch Management of Control Systems” by CISA, Department of Homeland Security

The images may be self-explanatory, but I would highly recommend reading the original document, since it also makes recommendations on patch testing, backups, incident response plans, disaster recovery, and other related topics. The full document is available here.

If you need more specific timetables for your immediate or routine patching schedule, I would recommend the guidelines provided by the Center for Internet Security (CIS) in this article.

“IT Standard: Patch Management”, by Center for Internet Security (CIS)

While CIS recommends deriving severity ratings directly from the CVSS score, this is something that I don’t necessarily agree with, since it doesn’t take into account any of the other factors discussed earlier (e.g. exposure at your organization, exploitability).

My recommendation is to use the CISA model for decision-making and the CIS model for timelines, so it would look something like this:

  1. Calculate severity based on Vulnerability Footprint Analysis by CISA.
  2. Feed the severity rating into the Patch Urgency Decision Tree by CISA to determine whether to apply a workaround, patch immediately, or use routine scheduled patching.
  3. If immediate patching is required, start patching within 24 hours and complete patching within 1 week (CIS severity: high).
  4. If routine patching would suffice, start patching within 1 to 4 weeks, and complete patching within a 4-8 week time window (CIS severity: medium or low).

Of course, none of this is set in stone, and you can come up with your own patching schedule. My main goal here is to demonstrate the thought process that should go into the prioritization aspect of this problem.
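
To make that thought process concrete, here is a rough sketch of the four steps in code, continuing from the footprint sketch earlier. The threshold and the branching are my simplification of the CISA decision tree, and the timelines are the CIS-style windows from the list above, so treat every number as a placeholder for your own policy:

```python
# A rough sketch of the four-step workflow above. The footprint threshold
# and the branching are my simplification of the CISA decision tree; the
# timelines mirror the CIS-style windows listed above.

from dataclasses import dataclass

@dataclass
class PatchPlan:
    action: str
    start_within: str
    complete_within: str

def decide(footprint: int, patch_available: bool,
           workaround_available: bool) -> PatchPlan:
    """Map a footprint score (4-12) to an action and a rough timeline."""
    high_risk = footprint >= 9                 # placeholder threshold, tune to taste
    if high_risk and patch_available:
        return PatchPlan("patch immediately", "24 hours", "1 week")
    if high_risk and workaround_available:
        return PatchPlan("apply workaround", "24 hours", "until a patch is released")
    if high_risk:
        return PatchPlan("mitigate or isolate", "24 hours", "until a fix exists")
    return PatchPlan("routine patching", "1-4 weeks", "4-8 weeks")

print(decide(footprint=11, patch_available=True, workaround_available=False))
```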

Automate Your Patch Management

Automate, automate, automate!

The most important assumption behind everything in the previous section was that you already have a reliable, fully automated patch management solution.

Most organizations have this all sorted out for their end-user devices. If you’re running a Windows shop and haven’t done this for your endpoints, stop reading this article and go set up WSUS (Windows Server Update Services), SCCM (System Center Configuration Manager), Intune, or some other patch management or RMM (Remote Monitoring and Management) tool instead.

Unfortunately, when it comes to servers and network equipment, a lot of organizations still do their patching manually. Yes, I understand that you might create a major outage by automatically updating your perimeter firewalls or the payroll database server on a Wednesday night, but hear me out.

Infrequent patching = higher potential for disruptions

A network firewall that hasn’t been patched in three years is prone to running into major issues the next time you patch it. If you’re a few major versions behind, you’ll also have to deal with incremental upgrades, which further increase the potential for issues and downtime.

A firewall that is being patched every other month may have a blip once or twice a year, but overall things should go without a hitch.

Routine patching is one of the best ways to test high availability

What better way to test the resilience of your systems and applications than making sure you can patch them in the middle of the day (ok, maybe after hours) without anyone noticing?

Can’t patch the MS-SQL server because you have to take it down for an hour? Maybe you should have created an Always On Availability Group.

Can’t touch the firewall with a ten-foot pole? Maybe you should have set up and tested HA (High Availability) between the firewall pair.

Windows file server can’t be offline? Try DFS-R (Distributed File System Replication).

Hosting applications in containers? Try Docker Swarm or Kubernetes.

I can go on and on with the examples, but we all know that a production-grade application should not have a single point of failure, and if you’re worried that your software updates can take a whole system down, then you have bigger problems.

You might argue that even when the system is resilient enough, the patch can still introduce bugs or logic errors that can spread across nodes over time. True, but that’s why you should always have a rollback plan and a “test, pilot, production” pipeline when patching your mission-critical applications.

And yes, you should automate that as well! I will delve deeper into this topic in a future post.
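
In the meantime, here is what the skeleton of such a pipeline can look like: a loop over rings with a health check and a rollback hook. The ring names and the three helper functions are placeholders for whatever your tooling exposes; the point is that the sequencing itself is easy to script.

```python
# A skeleton of a "test, pilot, production" rollout with rollback.
# apply_patch, health_check, and rollback are placeholders for whatever your
# patch-management or configuration-management tooling actually exposes.

RINGS = ["test", "pilot", "production"]

def apply_patch(ring: str) -> None:
    print(f"patching {ring} ...")              # call your tooling here

def health_check(ring: str) -> bool:
    return True                                # query monitoring/synthetic checks here

def rollback(ring: str) -> None:
    print(f"rolling back {ring} ...")          # restore snapshot or previous version

def staged_rollout() -> bool:
    for ring in RINGS:
        apply_patch(ring)
        if not health_check(ring):
            rollback(ring)
            return False                       # stop before touching the next ring
    return True

if __name__ == "__main__":
    print("rollout succeeded" if staged_rollout() else "rollout halted")
```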

Users’ attitudes toward software maintenance have changed

I believe that we, as a society, have become more tolerant toward minor inconveniences caused by security patches. This might sound strange at first, but let’s pause for a minute and try to remember the last time your iPhone or Windows laptop got an update. Did it just notify you once and leave you alone, or did it keep harassing you over and over again, and finally go cowboy and apply the patch anyway?

Security threats are in the headlines of major newspapers — and we’re not talking about Wired or PC Magazine. They’re in The New York Times and The Washington Post too. As users become more aware of the implications of security threats and get used to their consumer electronics forcing updates on them, they’ll also get more tolerant toward minor downtime on their business systems and applications.

This is all coming from my first-hand experience working in IT for almost 20 years. It used to be that you’d have to schedule a maintenance window, communicate it weeks in advance, get approval, then apply the patches.

Nowadays, you can either:
a) patch at a predetermined maintenance window without any further communication; or
b) for emergency maintenance, send out the communication and move forward with patching, but give the user the opportunity to opt out (and always get this in writing).

Embrace Your Inner Onion!

Ultimately, no matter how quickly you respond, or how robust a patch management solution you may have, you are still racing against the clock when it comes to major security vulnerabilities.

To make matters worse, with some zero-day attacks, you may not even know you’ve been exploited until months later when the vulnerability is discovered. A prime example of this was Sunburst, where the attackers were exfiltrating data for three months before the story broke.

I hate to state the obvious, but time and again, this goes to show how important it is to take a defense-in-depth approach (aka the onion model) to cybersecurity.

Let’s take Sunburst again as an example.

Is there any reason why a SolarWinds network performance monitoring (NPM) or network configuration management (NCM) appliance needs to have access to the Internet?

Yes, maybe it needs to be able to reach out to the vendor’s website to download its updates (which in this case were contaminated). And yes, it would also need DNS (which in this case let the attackers send and receive command-and-control traffic using a Domain Generation Algorithm (DGA) over a series of DNS queries and responses). But if we don’t allow unrestricted outbound traffic, we can at least prevent data from being exfiltrated to unknown IP addresses.

The same goes for Domain Controllers, database servers, or really most servers. Do they need full outbound access to the Internet or do they just need to be able to talk to the OS vendor for updates? In fact, does any server need carte blanche outbound access to the Internet? Probably not.
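
One way to sanity-check that principle is to take an export of your firewall’s outbound connection logs and flag anything leaving your server subnets that isn’t on a short allowlist (vendor update servers, internal DNS, NTP). A toy sketch, with hypothetical column names, subnets, and addresses:

```python
# A toy egress audit: flag outbound server traffic that isn't on a short
# allowlist. The CSV columns (src, dst, dport), the subnet, and the allowed
# destinations are hypothetical; substitute your own firewall export.

import csv
import ipaddress

SERVER_NETS = [ipaddress.ip_network("10.20.0.0/16")]   # your server VLANs
ALLOWED_DST = {"203.0.113.10", "203.0.113.11"}         # e.g. vendor update servers
ALLOWED_PORTS = {53, 123}                              # DNS/NTP (ideally internal resolvers only)

def is_server(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in SERVER_NETS)

with open("outbound_connections.csv", newline="") as f:
    for row in csv.DictReader(f):                      # columns: src,dst,dport
        if not is_server(row["src"]):
            continue
        if row["dst"] in ALLOWED_DST or int(row["dport"]) in ALLOWED_PORTS:
            continue
        print(f"review: {row['src']} -> {row['dst']}:{row['dport']}")
```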

How about your EDR? Are you getting alerts on suspicious process chains, the use of 7-Zip to create password-protected, compressed files, or things running from paths they don’t normally run from?

What about your SIEM or NTA? Do you have rules to alert you on large data transfers to IP addresses that your ML models deem unusual?
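
You don’t need a full ML pipeline to get a first approximation of that last one: a per-host baseline of outbound volume with a crude three-sigma threshold will already surface the worst outliers. A sketch over a hypothetical flow export, as a stand-in for whatever analytics your SIEM or NTA actually provides:

```python
# A crude stand-in for "alert on unusually large outbound transfers":
# baseline each host's outbound volume and flag three-sigma outliers.
# The flow-export format (host, dst_ip, bytes_out) is hypothetical.

import csv
from collections import defaultdict
from statistics import mean, pstdev

per_host = defaultdict(list)                   # host -> list of (dst_ip, bytes_out)
with open("flows.csv", newline="") as f:
    for row in csv.DictReader(f):              # columns: host,dst_ip,bytes_out
        per_host[row["host"]].append((row["dst_ip"], int(row["bytes_out"])))

for host, flows in per_host.items():
    sizes = [size for _, size in flows]
    if len(sizes) < 10:                        # not enough history to build a baseline
        continue
    threshold = mean(sizes) + 3 * pstdev(sizes)
    for dst, size in flows:
        if size > threshold:
            print(f"review: {host} sent {size} bytes to {dst} (threshold {threshold:.0f})")
```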

I can keep going on and on about security controls that would have helped with these scenarios, but we can do that in another post.

Maintain a Dynamic Inventory of Software Assets

I should have put this one on top, but I would have probably lost half of the audience. While asset management is not as cool and exciting as some of the other security controls, it is absolutely critical for all organizations. It is in fact the foundation on which all other technical security controls sit.

After all, if you don’t know what software assets you have in the environment, how can you tell if you’re vulnerable? Or even worse, how can you tell if you’re not vulnerable?

  • At a bare minimum, you need a tool or set of tools that can inventory your assets using active scans via WMI (Windows), SSH (Linux), and SNMP (network devices).
  • If you can add an agent-based solution that can get more details from the operating systems, even better.
  • If you can also add a passive scanning solution that can identify assets by sniffing network traffic, even better-er.
  • If you can (I know I’m really pushing it now) add an EDR or SIEM solution that also records all running processes, their file versions, and associated command-line arguments, on all endpoints, you’re golden!

It is worth mentioning that the last method, getting process command-line parameters from EDR/SIEM, was actually how I was able to identify all hosts running vulnerable versions of Log4j within minutes, since none of the other methods go that deep into software packages.
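
I can’t share the exact query, but the idea is easy to reproduce: export process command lines from your EDR or SIEM and search them for log4j-core JAR versions below the fixed release. A sketch over a hypothetical CSV export, using 2.17.1 as the cut-off for the 2.x branch (check the advisory that applies to your environment):

```python
# A sketch of the Log4j hunt: scan EDR/SIEM process command lines for
# log4j-core JARs below a cut-off version. The CSV layout (hostname, cmdline)
# is hypothetical, and 2.17.1 is used as the "fixed" version for the 2.x
# branch; this also flags end-of-life 1.x JARs, which are worth a look anyway.

import csv
import re

FIXED = (2, 17, 1)
PATTERN = re.compile(r"log4j-core-(\d+)\.(\d+)\.(\d+)\.jar", re.IGNORECASE)

with open("process_cmdlines.csv", newline="") as f:
    for row in csv.DictReader(f):              # columns: hostname,cmdline
        for match in PATTERN.finditer(row["cmdline"]):
            version = tuple(int(part) for part in match.groups())
            if version < FIXED:
                print(f"{row['hostname']}: log4j-core {'.'.join(match.groups())}")
```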

The conclusion: Yes, it is important to patch all vulnerabilities in your environment, and CVSS scoring provides great value in that regard, but that’s not enough on its own.

Vulnerabilities like Log4shell, PrintNightmare, Sunburst, Shellshock, and Heartbleed all required immediate action, but we have to remember that the urgency was not just due to their high CVSS scores. They also posed real risks to most organizations. You can verify this by putting them through the two models above and looking at the results.

Ultimately, none of those vulnerabilities ended up causing the major global catastrophes we predicted, but that’s mainly because of all the hard work done by the IT and security community. It’s thanks to all the sleepless nights and holidays we spent banging on our keyboards and patching things before the bad guys/gals got to them.

But is this really the most effective and healthy way to deal with the problem?

Three things are certain: death, taxes, and security vulnerabilities.

While there is no way we can stop the flow of major vulnerabilities, my hope is that we can reduce the amount of time and effort we spend in this area, while maintaining a secure posture for our organizations and our communities. In order to do that, we need to cut some of the red tape, heavily automate our patching, and be more assertive with our users when it comes to protecting their own safety.

I hope this article helped you take another look at a few areas within your organization, identify the gaps, and make some improvements. I’m sure readers can think of some other methods that would help maintain a secure environment without treating each critical vulnerability as a fire drill, and I would be happy to hear your thoughts and suggestions in the comments section below.

Cheers!
Kam
