Cybersecurity and Machine Learning: Risk Measures
Several different things may leap to mind when one hears the terms “cybersecurity” and “machine learning” in the same sentence: intrusion detection, malware classification, and so on. Just to set the stage, this post is not about any of those things, but rather about preventative measures.
Discussions of cybersecurity are often loaded with complexity and technical jargon. This post aims to simplify the problem setting and unwind some of the jargon. Discussions about machine learning are also loaded with complexity and jargon; that part will be saved for a later post.
Vulnerabilities and exploits
A network intrusion is a scenario where an attacker gains remote access to a host on a network in a way that allows them to run system commands, extract files, access other hosts, or simply interfere with the system or services running. A “host” on a network is often a server, but could be a desktop station, a router, a video camera, or any imaginable device that can be connected to a network. Remote access is the important part: think of it as an interactive shell in which an adversary can carry out attacks; like an SSH shell, but not necessarily using the SSH protocol.
From the perspective of building a defensive security system, there are a few main ways that an intrusion can happen:
- Someone within an organization hands out credentials — usually unknowingly, such as in a phishing attack.
- A misconfiguration, where the software is running correctly but is accidentally set up to allow unintended access from a remote host. Good examples of this are default passwords on internet-connected devices that are never changed.
- A software vulnerability is exploited.
Estimates vary as to which of these three attack vectors is the most prominent or leads to the most damage. Experts agree that the first one presents the largest attack surface, circumscribed only by the limits of human error, gullibility, or malicious intent.
The focus here is the third item: vulnerabilities and their exploits. Simply put, a vulnerability is a flaw or bug in a piece of software that causes it to behave in a way that is unintended or unknown to the software’s authors. An exploit is a small piece of software or specially crafted input that leverages the vulnerability and triggers the unintended behavior. Not all vulnerabilities are created equal in terms of the degree of danger they present, and not all vulnerabilities are effectively exploitable in the same fashion. More details will follow shortly.
Focusing on vulnerabilities is an operationally tractable problem for two specific reasons. First, all vulnerabilities are essentially public. On a daily basis, new vulnerabilities are discovered and reported by software vendors or independent security researchers. The public disclosure of a vulnerability is typically coordinated with the near-immediate release of a software patch or update that remediates it. All public disclosures are aggregated in the National Vulnerability Database (NVD), which is updated daily. Second, there are many different open and commercial scanners that help IT teams identify and categorize the active vulnerabilities on all devices in a network range.
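Because the NVD is updated daily, keeping a local view current can be as simple as polling its public REST API. Below is a minimal sketch, assuming the NVD API v2.0 endpoint and its lastModStartDate/lastModEndDate parameters; verify both against the current NVD developer documentation before relying on them.

```python
# Sketch: pull vulnerabilities modified in the last day from the NVD.
# Endpoint and parameter names assume the NVD REST API v2.0.
from datetime import datetime, timedelta, timezone

import requests

NVD_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def fetch_recent_cves(hours: int = 24) -> list:
    """Return CVE records modified within the last `hours` hours."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    params = {
        "lastModStartDate": start.isoformat(),
        "lastModEndDate": end.isoformat(),
        "resultsPerPage": 2000,  # API maximum per page, at time of writing
    }
    resp = requests.get(NVD_URL, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json().get("vulnerabilities", [])

if __name__ == "__main__":
    for item in fetch_recent_cves()[:5]:
        cve = item["cve"]
        print(cve["id"], cve["descriptions"][0]["value"][:80])
```

A production ingest job would also page through results with startIndex and respect the API's rate limits, but the core loop is no more complicated than this.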
Prioritizing Remediation
A good scanner can determine where and what the problems are, and software patches — or other remediations — will make the problems go away. A healthy cycle of scan-and-remediate removes one of the three main vectors of network intrusions. Problem solved, right?
The problem with this reasoning is that, while a healthy cycle of scanning is straightforward to achieve, remediation often entails complexity on another order of magnitude. For instance, patches that impact the JVM, low-level system libraries, or the operating system often require a maintenance window for the entire stack of software running on those systems. For an organization’s mission critical services, this is something not undertaken lightly.
Another limiting factor to a simple scan-and-remediate cycle is the volume of data involved. For a small network range of, say, a thousand hosts or “assets,” a scanner may report a volume of vulnerabilities on the order of ten thousand. (Many of these are benign, or may be the same vulnerability repeated across different hosts.) Each of the categorized vulnerabilities needs to be manually reviewed by a member of the IT team, and any remediation steps need to be addressed or scheduled. For a small IT team with many responsibilities in addition to patch maintenance, this is patently infeasible.
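To see the gap between raw findings and actionable work items, consider deduplicating a scan export. The sketch below assumes a hypothetical CSV with host and cve_id columns; real scanners each have their own export schema.

```python
# Sketch: collapse raw scanner findings into one row per distinct CVE.
# "scan_export.csv" and its column names are hypothetical placeholders.
import pandas as pd

findings = pd.read_csv("scan_export.csv")  # one row per (host, finding)

per_cve = (
    findings.groupby("cve_id")
    .agg(hosts_affected=("host", "nunique"))
    .sort_values("hosts_affected", ascending=False)
)

print(f"{len(findings)} findings -> {len(per_cve)} distinct CVEs")
print(per_cve.head(10))  # the most widespread vulnerabilities first
```

Deduplication shrinks the list, but the distinct CVEs that remain still need review, which is where prioritization comes in.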
There is a way to prioritize the allocation of resources in remediating vulnerabilities. As mentioned earlier, not all vulnerabilities are created equal, and not all are effectively exploitable. The standard information security risk model categorizes risk by the degree to which it impacts the confidentiality, integrity, and availability (CIA) of the underlying resource or data. For software vulnerabilities, the industry has largely adopted a deterministic scoring system, called the Common Vulnerability Scoring System (CVSS), that assigns a nominal risk score on a 0–10 scale to each vulnerability. It bears repeating that this is a deterministic formula reached by consensus of industry experts.
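To make “deterministic” concrete, here is a sketch of the CVSS v2 base-score computation. The metric weights and formula follow the published CVSS v2 specification; only the function and variable names are mine.

```python
# CVSS v2 base score, following the published v2 specification.
# Each qualitative metric maps to a fixed weight, so the same
# inputs always produce the same score.

ACCESS_VECTOR = {"local": 0.395, "adjacent": 0.646, "network": 1.0}
ACCESS_COMPLEXITY = {"high": 0.35, "medium": 0.61, "low": 0.71}
AUTHENTICATION = {"multiple": 0.45, "single": 0.56, "none": 0.704}
CIA_IMPACT = {"none": 0.0, "partial": 0.275, "complete": 0.660}

def cvss_v2_base(av: str, ac: str, au: str, c: str, i: str, a: str) -> float:
    impact = 10.41 * (1 - (1 - CIA_IMPACT[c]) * (1 - CIA_IMPACT[i]) * (1 - CIA_IMPACT[a]))
    exploitability = 20 * ACCESS_VECTOR[av] * ACCESS_COMPLEXITY[ac] * AUTHENTICATION[au]
    f_impact = 0.0 if impact == 0 else 1.176
    return round((0.6 * impact + 0.4 * exploitability - 1.5) * f_impact, 1)

# A remotely reachable, low-complexity, unauthenticated flaw with complete
# loss of confidentiality, integrity, and availability scores the maximum:
print(cvss_v2_base("network", "low", "none", "complete", "complete", "complete"))  # 10.0
```

Note what is absent from the inputs: nothing about whether anyone has actually written an exploit.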
Other measures of risk
One of the seldom-cited drawbacks of the CVSS score is that its formula includes only a few theoretical “exploitability metrics” with low/medium/high attributes; it does not take into consideration the empirical likelihood that an exploit exists. This last point is arguably the most important for establishing a realistic risk score.
When a security vulnerability is reported, its verification typically includes a proof-of-concept exploit. Not all of these are robust or reliable enough to be used “in the wild” to carry out a remote attack. A smaller subset, those reliable enough to be included in penetration testing tools, will make their way from research channels into public repositories of exploits. Exploit-DB and Rapid7 are among the most active repositories.
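These repositories can be queried programmatically to separate the exploitable minority from the rest. The sketch below assumes Exploit-DB's CSV index on its public GitLab mirror and a codes column holding associated CVE IDs; both the URL and the column layout are assumptions worth verifying before use.

```python
# Sketch: flag CVEs that appear in Exploit-DB's public index.
# The CSV location and the "codes" column (semicolon-separated CVE IDs)
# reflect Exploit-DB's GitLab mirror at the time of writing; verify both.
import pandas as pd

EDB_CSV = "https://gitlab.com/exploit-database/exploitdb/-/raw/main/files_exploits.csv"

def cves_with_public_exploits() -> set:
    index = pd.read_csv(EDB_CSV)
    cves = set()
    for codes in index["codes"].dropna():
        cves.update(c for c in str(codes).split(";") if c.startswith("CVE-"))
    return cves

exploited = cves_with_public_exploits()
# EternalBlue, for example, should show up:
print("CVE-2017-0144:", "CVE-2017-0144" in exploited)
```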
It would be remiss to leave zero-day exploits unmentioned. These are exploits targeting a vulnerability that is not yet known to the vendor, or for which no patch is yet available. The reality is that zero-days occupy a far larger share of the discussion around cybersecurity than their share of active exploits would warrant. The reasons are essentially economic: zero-days are prohibitively resource intensive to develop and have no guaranteed lifespan, which makes them scarce. From the perspective of an adversary, deploying a reliable public exploit is much more viable, especially at scale, leveraging the fact that some portion of the systems hit will be unpatched. Most malware campaigns that deploy “exploit kits” rely on mass drive-by delivery tactics.
To put all of these indicators of risk into perspective, let’s look at some numbers, taken from a snapshot of NVD data in January 2018:

| Subset of NVD vulnerabilities (January 2018) | Share |
| --- | --- |
| With a CVSS v2 score | ~94,000 (100%) |
| Rated “High” or “Critical” | almost 40% |
| With a publicly available exploit | around 2% |

The NVD contains about 94k vulnerabilities that have a CVSS v2 score. Of those, almost 40 percent carry a nominal risk evaluation of either “High” or “Critical.” But when restricting to vulnerabilities that have a publicly available exploit, the rate drops to around 2 percent.
The takeaway from the second line of the table is a common issue familiar to anyone responsible for operationalizing an IT security program: thresholds that are manually tuned, regardless of the underpinning expertise, still yield a volume of data exceeding what can be manually processed. (Think of the scenario where a scan yields ten thousand results, but only 1,400 are critical and need to be addressed immediately.)
The last line of the table captures only a stochastic relationship, whether or not a vulnerability met the criteria to earn it a public exploit, but it points toward a more principled approach to risk prioritization:

risk(V) = P(Exploit | V)

The right-hand side expresses the conditional probability that a vulnerability V has an exploit. It is a number between 0 and 1, where P(Exploit|V) = 1 reflects absolute certainty that the condition is true, and P(Exploit|V) = 0 certainty that it is false.
Because this score is a probability, decisions can be based on probability thresholds instead of hand-tuned severity cutoffs. For instance, vulnerabilities with a score of at least 0.95 constitute the subset where the model is at least 95% confident that they have (or will have) a public exploit. Such a model would give an IT team finer-grained control over how it prioritizes work, and would better highlight what an adversary regards as low-hanging fruit.
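As a minimal sketch of what such a model could look like, the following trains a logistic regression that outputs P(Exploit|V) and applies a 0.95 cutoff. The features and labels here are synthetic placeholders; a real pipeline would derive features from NVD records, vendor and product metadata, and disclosure timing.

```python
# Sketch: a probabilistic exploit-prediction model with a confidence-based
# cutoff instead of a hand-tuned severity threshold. All data is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 8))  # stand-in vulnerability features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=5000)) > 1.5  # stand-in labels

model = LogisticRegression().fit(X, y)

# P(Exploit | V) for each vulnerability:
p_exploit = model.predict_proba(X)[:, 1]

# Work the remediation queue from the most confident predictions down:
high_confidence = p_exploit >= 0.95
print(f"{high_confidence.sum()} of {len(p_exploit)} vulns scored >= 0.95")
```

The cutoff is a policy knob, not a magic number: lowering it trades precision for coverage, and a team can set it according to how much remediation capacity it actually has.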