Natural Language to Risk Score for CIS Benchmarks using Deep Neural Networks

Fatih Bulut
Published in IBM Cloud, Nov 20, 2019


Risk management is a fundamental part of Cloud Service Management. Understanding the up-to-date risk posture of a cloud environment is a desirable capability in today’s complex IT infrastructure.

In today’s IT world, different security processes are used to make sure that the cloud environment is safe, secure and compliant. Patch Management and Health Checks are two major examples of such compliance processes. Quantifying the risk for these compliance processes is critical to understanding the current risk posture of the IT environment. While risk quantification is well-defined and standardized in certain compliance domains, it has not been standardized across domains because there is no common process.

This blog post explains how an AI-based approach can standardize risk quantification across different domains using natural language processing (NLP). In particular, the Center for Internet Security (CIS) Benchmarks are used as the case study, and the approach is applied in an IBM product, IBM Cloud Pak for Multicloud Management.

Background: Common Vulnerabilities and Exposures

Since 1998, system and software vulnerabilities have been collected in one place, the National Vulnerability Database (NVD), hosted by the National Institute of Standards and Technology (NIST). Software vendors release patches based on the vulnerabilities found. Patches often have one or more Common Vulnerabilities and Exposures (CVE) entries assigned to them. Each CVE is assigned a vulnerability score in the range of 0–10 by a security analyst, based on the Common Vulnerability Scoring System (CVSS). The figure below shows an example of a vulnerability description and assigned scores based on CVSS.

Example vulnerability description and assigned scores.
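To make the 0–10 score concrete, here is a minimal sketch of how a base score can be derived from the individual CVSS metrics. It assumes CVSS v3.1 and its published base-score equations; the post does not say which CVSS version was used, and the function names `base_score` and `severity` are ours, chosen for illustration.

```python
import math

# Metric weights from the CVSS v3.1 specification (assumed version).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required is weighted differently when Scope is changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value):
    """Round up to one decimal place, as defined in CVSS v3.1."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a vector such as 'CVSS:3.1/AV:N/AC:L/...'."""
    m = dict(part.split(":") for part in vector.split("/"))
    changed = m["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]

    # Impact sub-score from the confidentiality, integrity, availability metrics.
    iss = 1 - ((1 - WEIGHTS["C"][m["C"]])
               * (1 - WEIGHTS["I"][m["I"]])
               * (1 - WEIGHTS["A"][m["A"]]))
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    exploitability = (8.22 * WEIGHTS["AV"][m["AV"]] * WEIGHTS["AC"][m["AC"]]
                      * pr * WEIGHTS["UI"][m["UI"]])
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


def severity(score):
    """Map a base score to the CVSS v3.x qualitative severity rating."""
    if score == 0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))
```

Because the score is a deterministic function of the metrics, a model only needs to predict the individual metric values from text; the numeric score then follows from the equations.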

Patch management is one example of the security processes in today’s Cloud Service Management. Another is Health Checking. Health Checks are usually performed against standards, and the Center for Internet Security (CIS) Benchmarks are a widely used example of such standards. CIS Benchmarks aim to make sure that managed systems are secure, safe and compliant. However, there is currently no standard way of assigning a risk score to each failed health check, as there is for each missing patch. Below we explain our approach to filling this gap using AI.

Classification with Deep Neural Networks

Deep Learning is considered to be a subfield of machine learning, which itself is a subfield of Artificial Intelligence. Deep Learning (a.k.a. Deep Neural Networks) has been shown to be an effective method of machine learning when a large amount of training data is available. Across domains and applications that use different types of data, including text and images, Deep Learning has proven to outperform traditional machine learning techniques. Moreover, the recent so-called “ImageNet Moment” of NLP, driven by novel language modeling techniques, has paved the way for better text (and language) understanding. In our case, we are particularly interested in classifying a given text description into a Common Vulnerability Scoring System (CVSS) score.

In the following sections, we first explain the data used in our AI model, then the model we developed, and finally how we involve human experts in the loop.

Data

Our primary source of data is the set of vulnerability descriptions and their associated CVSS scores from the National Vulnerability Database (NVD). As of Nov 2019, more than 123K vulnerabilities have been reported in NVD, which constitutes a considerable amount of data. Deep Learning has become a de facto method of machine learning when there is enough labeled data, so we saw an opportunity to use state-of-the-art Deep Learning techniques to assign a score to a given vulnerability description. Our second source of data is IBM Technical Specification Standard Documents. These documents are very similar to CIS Benchmarks and capture the knowledge and experience IBM has accumulated over the years while managing customers’ IT environments.
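As a rough illustration of how such data could be prepared, the sketch below turns vulnerability records into (description, labels) training pairs. The record shape, the field names and the sample entries are hypothetical and simplified; they are not the actual NVD feed schema, and the post does not describe its preprocessing.

```python
# Hypothetical, simplified records: each has a free-text description and,
# when it has been scored, one label per CVSS dimension.
SAMPLE_RECORDS = [
    {
        "id": "EXAMPLE-0001",
        "description": "Buffer overflow in the example parser allows remote "
                       "attackers to execute arbitrary code.",
        "cvss": {"attack_vector": "network", "attack_complexity": "low",
                 "confidentiality": "high", "integrity": "high",
                 "availability": "high"},
    },
    {
        "id": "EXAMPLE-0002",
        "description": "Local users can read a world-readable log file "
                       "containing session tokens.",
        "cvss": {"attack_vector": "local", "attack_complexity": "low",
                 "confidentiality": "high", "integrity": "none",
                 "availability": "none"},
    },
    # Records without a score cannot be used as labeled training data.
    {"id": "EXAMPLE-0003", "description": "Awaiting analysis.", "cvss": None},
]

DIMENSIONS = ["attack_vector", "attack_complexity",
              "confidentiality", "integrity", "availability"]


def to_training_example(record):
    """Return (text, labels) for a scored record, or None if unusable."""
    text = (record.get("description") or "").strip()
    cvss = record.get("cvss")
    if not text or not cvss:
        return None
    if any(dim not in cvss for dim in DIMENSIONS):
        return None
    return text, {dim: cvss[dim] for dim in DIMENSIONS}


def build_dataset(records):
    """Keep only the records that yield a complete training example."""
    examples = (to_training_example(r) for r in records)
    return [e for e in examples if e is not None]


dataset = build_dataset(SAMPLE_RECORDS)
print(len(dataset), "usable examples")
```

Keeping one label per CVSS dimension, rather than only the final numeric score, is what lets a model predict each dimension separately.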

Model

As stated earlier, our approach is based on Deep Learning. The figure below shows a high-level overview of the AI model architecture that we use today. The first layer is the input layer, which generates an embedding from the given text. We use state-of-the-art embedding techniques (including language modeling) to achieve the best performance. Next come the hidden layers, where we use CNN, LSTM and other state-of-the-art sequence-to-sequence models. Finally, the output layer, as can be seen from the figure, tries to capture the different dimensions of CVSS. Our approach is to build an initial model with NVD data (Vanilla NVD Model), and later apply transfer learning with the data from IBM Technical Specification Standard Documents and CIS Benchmarks.

High level overview of the approach
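The shape of such an architecture, one shared text representation feeding a separate classification head per CVSS dimension, can be sketched in a few lines. This is a toy, untrained stand-in written in plain Python: the averaged random token embeddings replace the real embedding and CNN/LSTM layers, the dimension names assume CVSS v3, and `ToyMultiHeadClassifier` is a hypothetical name, not the authors' model.

```python
import math
import random

# Assumed CVSS v3 dimensions and their classes; each gets its own output head.
CVSS_DIMENSIONS = {
    "attack_vector": ["network", "adjacent", "local", "physical"],
    "attack_complexity": ["low", "high"],
    "privileges_required": ["none", "low", "high"],
    "user_interaction": ["none", "required"],
    "scope": ["unchanged", "changed"],
    "confidentiality": ["none", "low", "high"],
    "integrity": ["none", "low", "high"],
    "availability": ["none", "low", "high"],
}
EMBED_DIM = 8


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyMultiHeadClassifier:
    """Shared encoder plus one softmax head per CVSS dimension (untrained)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.embeddings = {}  # token -> vector, created on first use
        # One weight matrix (classes x EMBED_DIM) per dimension.
        self.heads = {
            name: [[self.rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
                   for _ in classes]
            for name, classes in CVSS_DIMENSIONS.items()
        }

    def _embed(self, token):
        if token not in self.embeddings:
            self.embeddings[token] = [self.rng.uniform(-1, 1)
                                      for _ in range(EMBED_DIM)]
        return self.embeddings[token]

    def encode(self, text):
        """Input + hidden layers, reduced here to a mean of token embeddings."""
        tokens = text.lower().split()
        if not tokens:
            return [0.0] * EMBED_DIM
        vectors = [self._embed(t) for t in tokens]
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    def predict(self, text):
        """Return a class-probability distribution for every CVSS dimension."""
        hidden = self.encode(text)
        result = {}
        for name, classes in CVSS_DIMENSIONS.items():
            logits = [sum(w * h for w, h in zip(row, hidden))
                      for row in self.heads[name]]
            result[name] = dict(zip(classes, softmax(logits)))
        return result


model = ToyMultiHeadClassifier()
prediction = model.predict("Ensure that the --anonymous-auth argument is set to false")
print(sorted(prediction))
```

With random weights the probabilities are meaningless; the point is only the output structure, in which every dimension yields its own distribution that can be inspected and corrected independently.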

Feedback

AI models can learn and capture important features and output the information we want. However, AI models rarely achieve 100% accuracy. Precision is especially important for mission-critical models, where mistakes have a higher impact. Hence, we developed a UI where Subject Matter Experts (SMEs) can easily verify (or change) the mapping assigned by the AI model. The figure below shows a screenshot of the feedback UI. As can be seen from the UI, a given description can be transformed into a CVSS score with accumulated and separate confidences for each level of CVSS. This helps SMEs assign the CVSS dimensions.

AI-assisted user interface to help Subject Matter Experts to give feedback
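One way the separate and accumulated confidences could be derived from per-dimension model output is sketched below. It assumes the accumulated confidence is the product of the per-dimension confidences and uses an arbitrary review threshold of 0.6; both choices, and the name `summarize_prediction`, are illustrative assumptions rather than the product's actual logic.

```python
import math


def summarize_prediction(per_dimension_probs, review_threshold=0.6):
    """Pick the top class per dimension and flag low-confidence dimensions."""
    labels = {}
    confidences = {}
    for dimension, probs in per_dimension_probs.items():
        best = max(probs, key=probs.get)
        labels[dimension] = best
        confidences[dimension] = probs[best]

    # Assumption: treat dimensions as independent, so the accumulated
    # confidence is the product of the individual confidences.
    overall = math.prod(confidences.values())
    needs_review = sorted(d for d, c in confidences.items()
                          if c < review_threshold)
    return {
        "labels": labels,
        "confidences": confidences,
        "overall_confidence": overall,
        "needs_review": needs_review,
    }


# Hypothetical model output for one health-check description.
example = {
    "attack_vector": {"network": 0.9, "local": 0.1},
    "attack_complexity": {"low": 0.55, "high": 0.45},
}
summary = summarize_prediction(example)
print(summary["labels"], summary["needs_review"])
```

Surfacing the weakest dimensions first would let an SME spend time only where the model is unsure, while still being able to override any value.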

IBM Cloud Pak for Multicloud Management

IBM Cloud Pak for Multicloud Management, running on Red Hat OpenShift, provides consistent visibility, governance, and automation from on premises to the edge. Enterprises gain capabilities such as multicluster management, event management, application management and infrastructure management. Enterprises can use this IBM Cloud Pak to help increase operational efficiency that is driven by intelligent data, analysis, and predictive golden signals, and gain built-in support for their compliance management. More information on IBM Cloud Pak for Multicloud Management can be found on this link.

CIS Policy Controller and Risk

IBM Cloud Pak for Multicloud Management provides a policy framework for creating custom policy controllers via Kubernetes CustomResourceDefinitions (CRDs). The CIS Policy Controller is one of these policy controllers, and it implements CIS Kubernetes Benchmark 1.4.0. Each control in this benchmark comes with a score that is quantified using the AI risk framework explained above. More information can be found in the official documentation of IBM Cloud Pak for Multicloud Management.

Conclusion

We have shown how AI-based risk quantification can be done in a standard way for a given natural language description. Today, the CIS Policy Controller is one of the policy controllers in IBM Cloud Pak for Multicloud Management, and each failed CIS Kubernetes Benchmark 1.4.0 check comes with a risk score assigned by our AI model and verified by a subject matter expert.

Authors

Muhammed Fatih Bulut, Senior Research Engineer, IBM Research

Jinho Hwang, Research Staff Member, IBM Research

Milton Hernandez, Distinguished Engineer, IBM Research

Special thanks to Suresh Kumar, Jaya Ramanathan and Ali Kanso from IBM for their collaboration on this work.
