How Tenable creates Exposure Scores for Identities

The Tenable One Exposure Management Platform revolutionizes cybersecurity by providing a prioritized view of an organization’s attack surface.

Imagine having a crystal-clear view of your organization’s cyber battleground, with every potential vulnerability pinpointed and prioritized. By seamlessly integrating vulnerability data from both on-prem and cloud devices, and adding the crucial element of the user personas associated with those devices, including their titles, access rights and habits, you can say goodbye to your blind spots and hello to proactive defense.

It is no secret that security professionals have more threats to deal with than they have time on their hands. Patching existing machines, scanning for new vulnerabilities, ensuring MFA is enabled, managing over-privileged users, and the list goes on… These professionals need effective tools to assess and prioritize their work, and Tenable delivers an objective view of the risk associated with each asset via the Asset Exposure Score.

The Asset Exposure Score (also known as AES) is a measure of the relative exposure of an asset. An asset may refer to various entities such as a device, a web application, a cloud resource, or, in the context of this article, an identity. It is another tool in the arsenal of our customers to help them prioritize their remediation efforts and it is powered by our data platform built on Snowflake.

This is a two-part blog series:

  • In this first blog, we’ll walk through the design process for measuring the exposure of an asset, giving insight into the reasoning behind the decisions taken and sharing some implementation details.
  • In part 2 of this series, we’ll dive into how the scores are computed in our data platform powered by Snowflake.

Let’s delve into how this groundbreaking platform is rewriting the rules of cybersecurity for thousands of organisations by productionizing advanced machine learning models at scale!

Table of Contents

  1. What is the Asset Exposure Score?
  2. Entitlement Component
  3. Hierarchy Component
  4. Combining the Components

What is the Asset Exposure Score?

First things first, what is an asset? An asset is an abstract representation of a real-world object. For example, Tenable Vulnerability Management mainly captures computing assets (a laptop or an EC2 instance), while Tenable Identity Exposure focuses on account and identity assets (a person or their account with some provider).

Regardless of the source or type, we look at intrinsic properties of the asset and its vulnerabilities to calculate its risk score. The properties of the asset make up its Asset Criticality Rating (ACR) while the vulnerabilities compose the Vulnerability Density score.

Putting the two together, we define the AES as the geometric mean of ACR and Vulnerability Density.
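
To make this concrete, here is a minimal sketch (the scaling of the final score to its published range is omitted, and the example inputs are made up):

import math

def asset_exposure_score(acr: float, vuln_density: float) -> float:
    """Illustrative only: AES as the geometric mean of ACR and
    Vulnerability Density, both assumed here to be on a [0, 1] scale."""
    return math.sqrt(acr * vuln_density)

# Example: a fairly critical asset with moderate vulnerability density.
print(asset_exposure_score(0.8, 0.5))  # ~0.63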

The rest of the blog will dive into the details of quantifying the ACR for identity assets. Note that we’ll refer to identity and account assets interchangeably throughout the text.

ACR: The Asset Criticality Rating is a measure of an individual’s importance (from an identity point of view) within an organization. It is calculated based on properties of the asset, and can change over time. For identities, it is defined as a function of two components: the user’s position in the organization’s hierarchy (the hierarchy component) and the level of access they have (the entitlement component).

Entitlement Component

The entitlement component captures the level of access that an account has over other assets in the environment. Accounts with high levels of privilege tend to control many resources in the environment and can perform more critical actions, such as updating, deleting or changing existing objects.

Assumptions

We make the following assumptions about entitlements:

  • Each entitlement has an action (e.g. read, write, manage, update)
  • There exists some ordering of entitlement “severity”, for example:
    write >= read
  • This ordering may be fuzzy (for example, the relationship between update and manage isn’t as clear)
  • Common entitlements (shared across many users) have low “severity” (e.g. read)
  • High “severity” entitlements are uncommon (low frequency) (e.g. enableDirectoryFeature, managePasswordSingleSignOnCredentials)
  • The more entitlements a user has, the higher their score should be

Defining the entitlement score

Given the last assumption (more entitlements means a higher score), we can think of the entitlement score as a linear model, and since we have different levels of entitlements, it becomes a weighted linear model. Finally, since there may be some similarity in the ‘severity’ of entitlements, we can group them into clusters and give each cluster a single weight. Putting this together, we get a weighted sum over entitlement clusters (sketched in code further below), where:

  • C is the set of clusters of entitlements
  • β_k is the cluster coefficient for cluster k
  • n_k is the number of entitlements in cluster k for a given user
  • t_k is the number of unique entitlements in cluster k

One additional requirement is that scores have an upper and lower bound. To achieve this, we pass the score above through a squashing function that normalizes scores to between 0 and 1.
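
As a rough sketch of both steps (the exact linear combination and squashing function used in production may differ; this is one plausible form consistent with the symbol definitions above):

import math

def entitlement_score(n, t, beta):
    """Illustrative sketch only. For each cluster k in C, the cluster
    coefficient beta[k] is weighted by the share of that cluster's
    entitlements the user holds (n[k] / t[k]); a simple exponential
    squashing then bounds the result between 0 and 1."""
    linear = sum(beta[k] * n.get(k, 0) / t[k] for k in t)  # weighted linear model
    return 1 - math.exp(-linear)  # normalized to [0, 1)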

There’s a lot in there; let’s break it down. In the next section, we’ll dive into how the clusters and coefficients are derived.

Generating clusters

We use clustering to group entitlements with similar actions. For example, each branch of the same colour in the dendrogram below consists of a cluster. Each cluster groups actions of similar severity together.

Code to generate the dendrogram above, given the embeddings of the actions.

from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram
from sklearn.cluster import AgglomerativeClustering
import numpy as np

# Function below adapted from: https://scikit-learn.org/stable/auto_examples/cluster/plot_agglomerative_dendrogram.html
def plot_dendrogram(model, **kwargs):
    """Create linkage matrix and then plot the dendrogram"""

    # create the counts of samples under each node
    counts = np.zeros(model.children_.shape[0])
    n_samples = len(model.labels_)
    for i, merge in enumerate(model.children_):
        current_count = 0
        for child_idx in merge:
            if child_idx < n_samples:
                current_count += 1  # leaf node
            else:
                current_count += counts[child_idx - n_samples]
        counts[i] = current_count

    linkage_matrix = np.column_stack(
        [model.children_, model.distances_, counts]
    ).astype(float)

    return dendrogram(linkage_matrix, **kwargs)

# setting distance_threshold=0 ensures we compute the full tree.
cluster_model = AgglomerativeClustering(
    distance_threshold=0, n_clusters=None, linkage="ward"
)
cluster_model.fit(embeddings)

dendo_dict = plot_dendrogram(cluster_model, labels=actions, orientation="right")

To produce the clusters, we generate embeddings of each action available in our dataset, and use that as input features to a series of clustering algorithms to determine (a) the best clustering for this data and (b) the optimal number of clusters (defined as the value that maximizes the silhouette score).
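
As a minimal sketch of that selection step (restricting, for illustration, to agglomerative clustering and scikit-learn’s silhouette_score):

from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

def best_n_clusters(embeddings, candidates=range(2, 15)):
    """Sketch: fit one clustering per candidate cluster count and keep
    the count that maximizes the silhouette score."""
    best_n, best_score = None, -1.0
    for n in candidates:
        labels = AgglomerativeClustering(n_clusters=n, linkage="ward").fit_predict(embeddings)
        score = silhouette_score(embeddings, labels)
        if score > best_score:
            best_n, best_score = n, score
    return best_n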

In Snowflake, this is all done within a stored procedure that calls a Python UDF. Simple and efficient! Make sure to read part 2 for more on this.

Assigning cluster coefficients

The next step is to assign a coefficient to each cluster. The coefficients should reflect the relative importance of the cluster within an environment. After experimenting with various options, we found that calculating β for each cluster as the total number of entitlements in the cluster divided by the number of accounts holding those entitlements yields the best results.
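
As a simple worked illustration of that calculation (the numbers are made up):

def cluster_coefficient(total_entitlements_in_cluster: int, accounts_with_entitlements: int) -> float:
    """Sketch: beta for a cluster = total entitlements in the cluster divided
    by the number of accounts holding any of them, so entitlements concentrated
    in few accounts receive larger coefficients."""
    return total_entitlements_in_cluster / accounts_with_entitlements

# e.g. a cluster of 40 entitlements held by only 8 accounts -> beta = 5.0,
# while the same 40 entitlements spread over 400 accounts -> beta = 0.1
print(cluster_coefficient(40, 8))
print(cluster_coefficient(40, 400))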

At this stage, we have assigned each action (and entitlement) to a cluster and calculated the coefficients for each cluster. The next step is to count for each user:

  • The number of entitlements in each cluster (n_k)
  • The number of unique entitlements in the cluster (t_k)

Putting it all together we compute the Linear Component for each user and finally compute the Entitlement Score.
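
Reusing the entitlement_score sketch from earlier, a minimal per-user aggregation might look like this (the DataFrame, column names and coefficients are all hypothetical):

import pandas as pd

# Hypothetical input: one row per (user, entitlement) grant, with the
# cluster each entitlement was assigned to in the previous step.
grants = pd.DataFrame({
    "user": ["alice", "alice", "bob"],
    "entitlement": ["manage", "write", "read"],
    "cluster": [3, 2, 0],
})

# n_k: number of entitlements each user holds in each cluster
n_k = grants.groupby(["user", "cluster"]).size()

# t_k: number of unique entitlements in each cluster across the environment
t_k = grants.groupby("cluster")["entitlement"].nunique().to_dict()

betas = {0: 0.5, 2: 3.0, 3: 5.0}  # hypothetical cluster coefficients

for user, counts in n_k.groupby(level="user"):
    user_counts = counts.droplevel("user").to_dict()
    print(user, round(entitlement_score(user_counts, t_k, betas), 3))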

Hierarchy Component

Simply put, it is a component that calculates the criticality of a user given their job title and their placement in the company’s hierarchy.

While access level should carry most of the weight of the asset criticality, there are other aspects of an identity that also carry risk but are not easy to quantify, for example the type of information that may be accessible (a CFO is likely to have more confidential data on their machine than a developer). We use the hierarchy as a proxy for this.

The diagram below shows the flow for computing the hierarchy score. Each part is explained in more detail below.

Diagram with steps to compute the Hierarchy Component of ACR

Generating job title and salary scores

At a high level, these are generated by taking an input job title (e.g. Data Scientist), computing its embedding, finding the five closest embeddings to this job title and taking their average score.

We use a Sentence Transformer model to generate an embedding for a given title and compare it to a reference set of embeddings. We load the model and the reference embeddings from a stage, but we’ll dive into the details in part 2.
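
A minimal sketch of this lookup, assuming the widely used all-MiniLM-L6-v2 model and a tiny made-up reference set (the model, reference titles and scores used in production aren’t specified here):

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model choice

# Hypothetical reference set: job titles with previously assigned scores.
reference_titles = ["Chief Financial Officer", "Data Scientist", "Intern", "IT Administrator"]
reference_scores = np.array([0.95, 0.55, 0.10, 0.70])
reference_embeddings = model.encode(reference_titles, normalize_embeddings=True)

def title_score(title: str, k: int = 5) -> float:
    """Embed the title and average the scores of its k nearest reference titles."""
    emb = model.encode([title], normalize_embeddings=True)[0]
    sims = reference_embeddings @ emb  # cosine similarity (embeddings are normalized)
    top_k = np.argsort(sims)[-k:]
    return float(reference_scores[top_k].mean())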

User, manager and department scores

Using the scores obtained from the previous step, the user score u is computed as the product of the job title score and the job salary score. The manager score m for a user x with manager y is simply the user score obtained for the manager y. The department score d is defined as the mean of the user scores (u) of all users who are subordinates (if any).

At this stage, each user should have a user score u, a manager score m and a department score d. These are combined into the hierarchy score using a weighted sum, which is then scaled to the range [0.1, 1].
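
A sketch of that combination, with placeholder weights (the production weights aren’t reproduced here) and assuming u, m and d are already in [0, 1]:

def hierarchy_score(u: float, m: float, d: float,
                    w_u: float = 0.6, w_m: float = 0.2, w_d: float = 0.2) -> float:
    """Sketch: weighted sum of user, manager and department scores,
    rescaled into [0.1, 1]. The weights shown are placeholders."""
    h = w_u * u + w_m * m + w_d * d
    return 0.1 + 0.9 * h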

Combining the components

We combine the hierarchy component (h) with the entitlement score (e) to generate the final ACR using a weighted combining function.

The weights were chosen so that the entitlement component has more influence than the hierarchy component. Since both components are in the range [0, 1], the final result is in the same range; scores are clipped at a minimum of 0.1 to prevent zeroes.
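
As an illustrative sketch only (the exact combining function and weights aren’t reproduced here), a weighted sum favoring the entitlement component, clipped at 0.1, behaves as described:

def asset_criticality_rating(e: float, h: float,
                             w_e: float = 0.7, w_h: float = 0.3) -> float:
    """Sketch: combine entitlement (e) and hierarchy (h) components with
    placeholder weights favoring entitlements, clipping at 0.1 to avoid
    zero scores. Both inputs are assumed to be in [0, 1]."""
    return max(0.1, w_e * e + w_h * h)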

Intuition

To combine the rule-based (entitlement) and hierarchy scores in a meaningful way, we designed a heat map that highlights the desired ACR severity for each combination of component severity levels.

The intuition is that if a user has a high entitlement component, its ACR should be high, regardless of the job title.

For example, we would like interns with admin access to have a high ACR, even if their job title score is low. On the other hand, we don’t expect users with high job-title scores to have high privileges (it is not common for a CEO to hold many critical entitlements). Below is a worked example for a highly privileged user with the job title “Customer Markets Analyst”.

The entitlement score e falls into the critical category, while the hierarchy score h gets a medium categorization. The resulting ACR combines the two and lands in the high severity category.

The resulting distribution of ACR scores is right-skewed, i.e. only a small percentage of assets have a high or critical score. The histogram below shows the distribution on a random sample of 20,000 assets.

When you have 20,000 assets to manage it can be daunting to know even where to start. That’s where the ACR comes in — it helps our customers better understand where their risk lies and prioritize their efforts accordingly.

Knowing what your most important assets are is only half of the story. As we saw above, the AES combines the ACR with the Vulnerability Density score. That, however, is a story for another day. For now, stay tuned for part 2 of this post, which will dive into the algorithm implementation in Snowflake.
