Business-friendly vulnerability management metrics

Uber Privacy & Security
May 10, 2021

Serge Pastukhov & Martin Georgiev, Security Engineering

Abstract

Understanding the health of a vulnerability management program is key to managing risk in a company. Metrics need to address the needs of various audiences: engineering teams, security leadership, and executive leadership.

Vulnerability scanners are used by many companies to identify vulnerabilities on company assets. Usually these scanners provide metrics in the form of time-series charts of the total number of vulnerabilities over time, total risk over time, or something similar.

A typical chart of this kind shows the number of assets fluctuating around 10K and the risk as some large value that is specific to the scanner.

We often find such metrics confusing, hard to reuse or aggregate across different scanners (because they use different risk definitions), and not high-level enough to serve different audiences.

Vulnerability management is a process that includes vulnerability discovery via scans or some other method, mitigation activities if needed, and subsequent remediation, which usually consists of applying a vendor-provided patch for a vulnerable component. Many vulnerability management standards and policies define guidelines or even requirements on target mitigation and remediation times. Those usually vary by vulnerability severity, e.g. a vulnerability of CRITICAL severity should be remediated within 7 days, one of HIGH severity within 30 days, and so on. Severities can be determined by the company itself or by using standard frameworks like CVSSv3.

In this post we discuss a methodology and metrics that allow you to track the health of the vulnerability management process in your organization. The methodology lets you take into account the specifics of your environment, the target remediation times applicable to your organization, and many other aspects of vulnerability management.

Vulnerability management over time

We believe that vulnerability management is a process and should be measured as such, i.e. over time and not at a single point in time. Knowing how many vulnerabilities exist in your environment right now is very important for tactical decisions but doesn’t give you the big picture. Similarly to service availability, vulnerability exposure (or, better, the lack of it) can be calculated over some period of time (weekly, monthly, quarterly) using the same idea: we need to measure what percentage of that time period our assets were not vulnerable.

For the sake of further discussion let’s assume that we are measuring our metrics over a 30 day period.

For service availability measurement we would assume that we have 1 service that should be up and running for a month, so if it is down for 1 day during that 30 day period, then availability becomes 1 − 1/30 = 29/30 = 96.7%. The denominator here is the number of days that the service should have been up and running, i.e. 30 service-days. Similarly, for our vulnerability management metric the denominator could be the number of days in our period multiplied by the number of assets we have, e.g. if we have 1,000 servers, then over 30 days it will be 30,000 asset-days. We will call this asset capacity (AC) going forward. Vulnerable asset capacity (VAC) is the total number of days your assets are vulnerable. Once we have these two numbers, we can calculate the vulnerability management health (VMH) the same way availability is calculated: one minus VAC divided by AC.
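Written out with the abbreviations above, and by direct analogy with the availability calculation, the relationship can be sketched as follows. VMH stands for vulnerability management health, the abbreviation used later in this post. The 600 vulnerable asset-days in the worked part are a made-up figure for the 1,000-server fleet, used only for illustration.

```latex
\mathrm{VMH} = 1 - \frac{\mathrm{VAC}}{\mathrm{AC}}
\qquad\text{e.g.}\qquad
1 - \frac{600}{30{,}000} = 0.98 = 98\%
```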

The assumption we will use for the rest of this article is that asset information comes from some kind of asset inventory management system and vulnerability data is provided by vulnerability scanners. There is not much to Asset Capacity: if an asset is in scope of your program, then every day it contributes to the total AC value.
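A minimal sketch of the AC calculation, assuming the inventory can tell us the first and last day each asset was in scope. The function name, the inventory layout, and the server names and dates are all hypothetical and not taken from any particular inventory system.

```python
from datetime import date, timedelta

def asset_capacity(in_scope, period_start, period_days):
    """Sum, over assets, the days each asset was in scope during the period.

    in_scope maps an asset name to (first_day, last_day), both inclusive;
    last_day may be None for an asset that is still in the fleet.
    """
    period_end = period_start + timedelta(days=period_days - 1)
    total = 0
    for first_day, last_day in in_scope.values():
        start = max(first_day, period_start)
        end = period_end if last_day is None else min(last_day, period_end)
        # An asset that never overlaps the period contributes nothing.
        total += max(0, (end - start).days + 1)
    return total

# Hypothetical inventory: two servers present all month, one added on day 21.
inventory = {
    "server-1": (date(2021, 4, 1), None),
    "server-2": (date(2021, 4, 1), None),
    "server-3": (date(2021, 4, 21), None),
}
ac = asset_capacity(inventory, date(2021, 4, 1), 30)
print(ac)  # 30 + 30 + 10 = 70 asset-days
```

Counting in-scope days per asset, rather than multiplying a fixed fleet size by the period length, keeps AC meaningful when assets join or leave the fleet mid-period.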

Vulnerable Asset Capacity, on the other hand, is trickier. We will start with a very naive approach and progress to more advanced ways of calculating it below.

Naive approach

Let’s assume that we have 5 servers and they have been in our fleet the whole month (30 days). On the 15th day a vulnerability was detected on 2 of them (server 2 and server 5), and it was remediated 5 days later, so each of the two servers was vulnerable for 5 days.

The easiest way to calculate the VAC is just to count all the days that servers were vulnerable: 10 in this case (2 servers × 5 days). With an AC of 5 × 30 = 150 asset-days, VMH is 1 − 10/150 ≈ 93.3%.

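The generic formula for this case can be written as

$$\mathrm{VMH} = 1 - \frac{\mathrm{VAC}}{\mathrm{AC}} = 1 - \frac{\sum_{i=1}^{N} d_i}{N \cdot D}$$

where N is the number of servers, d_i is the number of days server i was vulnerable during the period, and D is the number of days in the period (30 here).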
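The naive calculation for the 5-server example can be sketched in a few lines of Python. The function name and the list-of-days input are illustrative choices, not part of the original methodology.

```python
def naive_vmh(vulnerable_days, period_days):
    """vulnerable_days holds one entry per server: days it was vulnerable."""
    asset_capacity = len(vulnerable_days) * period_days
    vulnerable_asset_capacity = sum(vulnerable_days)
    return 1 - vulnerable_asset_capacity / asset_capacity

# Servers 2 and 5 were vulnerable for 5 days each; the other three never were.
vmh = naive_vmh([0, 5, 0, 0, 5], period_days=30)
print(f"{vmh:.1%}")  # 93.3%
```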

Multiple vulnerabilities on an asset

The naive approach above doesn’t take into account that there could be multiple vulnerabilities detected on an asset. An asset with 10 vulnerabilities could pose more security risk than an asset with just 1 vulnerability. The easiest way to account for this is to treat an asset with multiple vulnerabilities as multiple virtual assets, each with only one vulnerability. After this, the naive approach above can be applied.

Building upon the 5 servers example from the naive approach, let’s assume that there were 3 vulnerabilities detected on one of the servers and that they were remediated a few days later.

Here we “exploded” the server into 3 virtual servers with 1 vulnerability each, and the naive approach is then applied to the resulting set of servers.

In the updated formula, VAC is the sum of vulnerable days over every vulnerability on every server, rather than one count per server.
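One way to sketch this in Python is shown below. Two things here are assumptions: the durations for server 3 (4, 6 and 3 days) are invented for illustration, and the text does not settle whether virtual servers also count towards AC. The hypothetical `count_virtual_assets` flag lets you compare both readings.

```python
def vmh_multi(open_days_by_server, period_days, count_virtual_assets=True):
    """open_days_by_server: one list per server, one entry per vulnerability,
    giving the days that vulnerability was open during the period."""
    vac = sum(sum(days) for days in open_days_by_server)
    if count_virtual_assets:
        # Assumption: a server with k vulnerabilities becomes k virtual
        # servers, and a clean server still counts once.
        assets = sum(max(1, len(days)) for days in open_days_by_server)
    else:
        # Alternative assumption: only real servers count towards AC.
        assets = len(open_days_by_server)
    return 1 - vac / (assets * period_days)

# Hypothetical durations: server 3 has vulnerabilities open 4, 6 and 3 days.
fleet = [[], [5], [4, 6, 3], [], [5]]
print(round(vmh_multi(fleet, 30), 4))                              # 0.8905
print(round(vmh_multi(fleet, 30, count_virtual_assets=False), 4))  # 0.8467
```

With only real servers in the denominator, VAC can in principle exceed AC on a heavily vulnerable fleet, so whichever convention you pick should be applied consistently across reporting periods.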

Target remediation times

Vulnerability management standards and policies usually allow for some time to remediate detected vulnerabilities. So far our methodology would deem an asset vulnerable as soon as a vulnerability is detected. Unless your remediation is proactive you will always see VMH less than 100%. To give us some time to remediate a vulnerability without negatively affecting the VMH value, we could say that a vulnerability only starts contributing to Vulnerable Asset Capacity if it has not been remediated within the target remediation period.

Let’s build upon the previous example of 5 servers and say that our target remediation time for every vulnerability is 2 days.

Orange marks the days on which a server is vulnerable but still within the target remediation time window.

Now the first two days of a vulnerability present on an asset don’t contribute to VAC anymore. VMH calculated using this approach is therefore at least as high as before, because VAC can only shrink.

Intuitively, if you manage to remediate all the vulnerabilities within their target remediation times, then VMH will be 100%, meaning your vulnerability management program execution is excellent.

In the updated VMH formula, each vulnerability contributes to VAC only the days it stayed open beyond its target remediation time.
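A minimal sketch of the grace-period rule, assuming AC is computed separately and passed in. The function names are hypothetical, and the input reuses the two 5-day vulnerabilities from the naive example rather than the multi-vulnerability one.

```python
def overdue_days(open_days, target_days):
    """Days a vulnerability stayed open beyond its target remediation time."""
    return max(0, open_days - target_days)

def vmh_with_targets(open_days_per_vuln, target_days, asset_capacity):
    vac = sum(overdue_days(d, target_days) for d in open_days_per_vuln)
    return 1 - vac / asset_capacity

# Two vulnerabilities open for 5 days each, a 2-day target, 5 servers x 30 days.
vmh = vmh_with_targets([5, 5], target_days=2, asset_capacity=5 * 30)
print(f"{vmh:.1%}")  # 96.0%
```

Each vulnerability now contributes 3 days instead of 5, so VAC drops from 10 to 6 and VMH rises from 93.3% to 96.0%.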

Vulnerabilities are not all equal

Some vulnerabilities are more serious than others and have a larger impact if successfully exploited. There are a few ways to address this by adjusting how we calculate the VMH value. First, your vulnerability management program likely defines different target remediation times for different severities. More severe vulnerabilities have shorter target remediation times, so they start contributing to VAC sooner than vulnerabilities of low severity.

Another (not mutually exclusive) way to account for severity is to adjust the contribution of a vulnerability to VAC depending on its severity. If a vulnerability is still not remediated after its target remediation time, we can say, for example, that only 95% of each overdue day goes to VAC. The adjustment factor can be defined per severity level or derived from the vulnerability’s CVSSv3 score: the score divided by 10 falls in the range [0, 1] and can be used directly to scale the vulnerability’s contribution to VAC.

Let’s imagine that the Server 3 vulnerabilities from the example above have CVSSv3 scores of 9.5, 8.8 and 7.1 respectively.

Applying all the rules discussed above to this example, each overdue day of these vulnerabilities adds 0.95, 0.88 and 0.71 to VAC respectively, instead of a full day.

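In the final formula, each vulnerability contributes to VAC the days it stayed open beyond its target remediation time, multiplied by its severity adjustment factor.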
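Putting the rules together, a rough sketch might look like the following. The CVSSv3 scores are the ones given for Server 3. The open durations (4, 6 and 3 days), the flat 2-day target and the choice of 5 × 30 real-server days as AC are assumptions made only for this illustration, and `weighted_vac` is a hypothetical helper name.

```python
def weighted_vac(vulnerabilities, target_days):
    """vulnerabilities: (open_days, cvss_score) pairs for one reporting period."""
    total = 0.0
    for open_days, cvss in vulnerabilities:
        overdue = max(0, open_days - target_days)  # grace period first
        total += overdue * (cvss / 10)             # then scale by severity
    return total

# Server 3: assumed open durations paired with the CVSSv3 scores from the text.
server_3 = [(4, 9.5), (6, 8.8), (3, 7.1)]
vac = weighted_vac(server_3, target_days=2)
vmh = 1 - vac / (5 * 30)
print(round(vac, 2), f"{vmh:.1%}")  # 6.13 95.9%
```

If your policy sets different targets per severity, `target_days` could become a lookup keyed on the score instead of a single number.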

Conclusion

In this post we described a methodology and a metric to track the health of the vulnerability management program in your organization. We believe the methodology to be very extensible, and you should be able to apply your own approaches to account for multiple vulnerabilities on the same asset, vulnerabilities having different severities, and the other parameters discussed in this post.
