Policy Assessment Using ML

Guru Prasad Natarajan
Published in Mindboard
Aug 30, 2019 (4 min read)

Overview

In the 21st century, every person and organization, public or private, is connected in some way. Being able to quickly understand and efficiently analyze whether third-party policy documents meet the standards you set for them, such as NIST 800–171, ISO 27001, or ISO 9001, is therefore critical to the success of your business. Current policy assessment tools are manual, inefficient, and don’t adequately reduce risk.

We at Mindboard developed a platform to solve these problems. Using machine learning, semantic technology, and a repository of model documents that meet each standard, we provide an advanced and efficient methodology for automating the evaluation of policy documents.

Introduction

The scope of the project is to provide a document similarity scoring engine that scores documents against specific standards, allowing the auditor to quickly and efficiently identify non-compliant policy documents.

Surveys and document requests are prone to “false positives” when assessing supplied policy documents, because an organization may incorrectly state that it meets a standard or control. The scoring engine instead digitally analyzes policy documents against model documents that meet each standard on a control-by-control basis, and provides both control-by-control scores and an overall vendor score. Each organization can set its own threshold for accepting or rejecting any or all policy documents.
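The thresholding step can be illustrated with a minimal sketch. The article does not say how control scores are aggregated into the overall vendor score, so the plain average used here is an assumption, as are the function name `score_vendor`, the control identifiers, and the 0-to-1 score range.

```python
from statistics import mean


def score_vendor(control_scores, threshold):
    """Summarize per-control scores for one vendor.

    control_scores: dict mapping a control id to a similarity score in [0, 1].
    threshold: the organization's acceptance threshold (hypothetical value).
    Assumption: the overall score is the unweighted mean of the control scores.
    """
    overall = mean(control_scores.values())
    # A control passes when its score reaches the threshold.
    decisions = {cid: score >= threshold for cid, score in control_scores.items()}
    accepted = overall >= threshold
    return overall, decisions, accepted


if __name__ == "__main__":
    scores = {"control-1": 0.75, "control-2": 0.25}  # made-up example scores
    overall, decisions, accepted = score_vendor(scores, threshold=0.5)
    print(overall, decisions, accepted)
```

A weighted average, or a rule that rejects the vendor when any single control fails, would fit the same interface; which rule is appropriate depends on the organization's risk tolerance.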

In a nutshell, the policy assessment system automates third-party assessment due diligence by:

· Adding an artificial intelligence component that digitally analyzes policy documents against model documents to ensure the organization meets each control of the assessment being evaluated.

· Supporting any type of assessment, whether an ISO standard, a NIST standard, or an internal company standard.

· Comparing, analyzing, and scoring each uploaded document at a control-by-control level.

· Giving the organization access to the model documents so it can identify where its policies fall short and improve them.

Approach

The core of the service is the taxonomy, which is prepared from well-formed samples for each policy family. The diagram below shows the relationship between the control and policy taxonomies:

Taxonomy generation

The policy taxonomy is the combination of the document-level taxonomy and the control-level taxonomy.
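As a rough illustration of that combination, the sketch below treats each taxonomy as a set of key phrases and merges them with a set union. The article does not describe how taxonomies are actually stored, so this representation, the function name `combine_taxonomies`, and the sample phrases are all assumptions.

```python
def combine_taxonomies(document_terms, control_terms):
    """Build a policy taxonomy from its two parts.

    document_terms: iterable of key phrases for the whole policy document.
    control_terms: dict mapping a control id to an iterable of key phrases.
    Assumption: a taxonomy is a flat set of phrases and combining is a union.
    """
    combined = set(document_terms)
    for terms in control_terms.values():
        combined |= set(terms)
    return combined


if __name__ == "__main__":
    document_level = {"access control", "policy"}
    control_level = {
        "control-1": {"password", "policy"},
        "control-2": {"audit log"},
    }
    print(sorted(combine_taxonomies(document_level, control_level)))
```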

Combined taxonomies

Scoring methodology

The current scoring process is based on the policy taxonomy. In the future, documents will also be scored based on the control taxonomies alone to see how well they cover specific controls. We use a variety of preprocessing techniques together with established similarity measures, such as cosine similarity and word mover’s distance, to calculate how far the documents under test deviate from the taxonomies of their well-formed model documents.
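To make the cosine similarity measure concrete, here is a minimal sketch that compares two texts using raw term counts. The tokenizer, the use of unweighted counts, and the function names are assumptions made for illustration; the article does not specify the preprocessing or vectorization actually used. Word mover’s distance is not shown because it additionally requires pretrained word embeddings.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Only terms present in both texts contribute to the dot product.
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    model_text = "access control policy"   # stand-in for a model document
    tested_text = "access control"         # stand-in for a document under test
    print(cosine_similarity(model_text, tested_text))
```

A score of 1 means the two count vectors point in the same direction, and a score of 0 means the texts share no terms.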

Displaying KENI (Key Elements Not Indexed) items

KENI is the list of key phrases that did not appear in the document being verified. KENI helps organizations revise their documents to incorporate the missing key phrases and improve their scores.
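One simple way to produce such a list is sketched below: every taxonomy phrase that does not occur in the document text is reported as a KENI item. The case-insensitive substring match, the function name `find_keni`, and the sample phrases are assumptions; the article does not describe how phrases are actually matched.

```python
def find_keni(taxonomy_phrases, document_text):
    """Return the taxonomy phrases that are absent from the document.

    Assumption: a phrase counts as present when it appears as a
    case-insensitive substring after whitespace is normalized.
    """
    normalized = " ".join(document_text.lower().split())
    return sorted(
        phrase for phrase in taxonomy_phrases
        if " ".join(phrase.lower().split()) not in normalized
    )


if __name__ == "__main__":
    taxonomy = ["access control", "incident response", "audit log"]
    document = "Our Access  Control policy covers audit log retention."
    print(find_keni(taxonomy, document))
```

Plain substring matching misses inflected or reworded phrases, so a fuller version would likely add stemming or semantic matching.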

The diagram below shows how the service works:

Solution overview

Conclusion and Future Work

Most organizations require a quick and efficient way to check their policy compliance. Today, this review is done manually and is a tedious process. This solution scores an organization’s documents against custom-built model documents using taxonomies and algorithms, and provides a document-by-document scorecard that lets the organization know where it stands with regard to each policy.

In the future, when more document samples are available for each policy type, Mindboard intends to replace the existing technology with a deep learning solution that eliminates the need for taxonomies, derives context directly from the model documents it is trained on, and adds the ability to classify documents by policy type.

