Data Protection Rule in Watson Knowledge Catalog

Arnajdas
IBM Data Science in Practice
7 min read · Mar 6, 2021

(Co-authored by Arnaj Das and Praveen Devarao)

Outside of a bank vault with locked door
Photo by Jason Tent on Unsplash

With stricter data policies coming into effect globally, it is critical that organizations have a robust and defensible process for handling their data. How handy would it be if one could define data access rules! It would be handier still if those rules were applied automatically whenever data is accessed. In this post we explore one such construct of Watson Knowledge Catalog, called Data Protection Rules (DPR).

Watson Knowledge Catalog (WKC) is a data catalog with integrated data governance capabilities. It provides tools and constructs for a self-service data governance model. WKC helps users discover, curate, categorize, and share data assets, data sets, analytical models, and their relationships with other members of their organization. WKC enforces governance constructs based on the user accessing the system, while giving all users the same look and feel of the platform through common access points. One of the key constructs behind this kind of self-service governance is the Data Protection Rule.

Data protection rules are standing instructions that users author on the platform. Based on these rules, the platform controls whether data can be accessed and in what form (denied or masked), depending on the persona accessing it.

Data protection rules are enforced automatically when a user attempts to view or act on a data asset in a governed catalog. These rules prevent unauthorized users from accessing data or transforming the data. In this way, only the data a user is allowed to see is made available to that user.

Let’s see it in action.

Creating a catalog

To see the Data Protection Rule capability in action, let’s create a new catalog in your IBM Cloud Pak for Data instance and add a data asset to it.

The image below shows the Cloud Pak for Data dashboard with access points to different features. Here, select All catalogs and click the Create Catalog button.

screenshot of dashboard of Cloud Pak for Data with access points to different features
Cloud Pak for Data Dashboard — Accessing catalog

This takes you to the screen shown in the image below, where you key in the catalog name and description. Note the checkbox on the screen. Select it to enforce data protection rules, so that rules are applied automatically to all assets within the catalog.

screenshot of adding a new catalog
Enforcing Data Protection Rules during catalog creation

Add assets into the catalog: Access the Add to catalog menu and select Local files to upload a CSV file. You can also add connection information for data sources such as an RDBMS, Cloudant, or HDFS, and then add connected assets from them. The term ‘connected assets’ here refers to assets whose data is fetched remotely through the specified connection.

Let’s use a CSV file containing information about individuals where one of the columns houses credit card numbers.

As can be seen in the image below, the screen contains a preview of the uploaded asset showing the columns in the CSV file and sample data from the CSV file.

screenshot of a CSV file
Asset preview screen

When the asset is added, a background process is automatically triggered to analyse and understand it. This process identifies what type of data (classification or category) is likely present in each column of the asset, the distribution of data within those columns, and a few other statistics. This background profiling is another key capability of WKC and plays a role in achieving self-service governance.

The screenshot below shows the profile information of the uploaded asset. The profile information contains the auto-classification information of the column data along with other statistical information such as the distribution of data. For the purposes of this post, let’s focus on the column classified as American Express Card.

screenshot of data profile of data in a CSV file
Asset profiling information view
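
To make the auto-classification step more concrete, here is a minimal Python sketch of how a column might be labelled as American Express Card. This is not WKC's actual profiling logic, which this post does not describe. The regular expression (15 digits starting with 34 or 37), the Luhn checksum test, the 80% match threshold, and all function names are assumptions chosen for illustration.

```python
import re
from typing import List, Optional

# Assumption: American Express numbers are 15 digits starting with 34 or 37.
AMEX_PATTERN = re.compile(r"^3[47]\d{13}$")


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk the digits right to left, doubling every second one.
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def looks_like_amex(value: str) -> bool:
    """Check one cell value after stripping spaces and hyphens."""
    digits = re.sub(r"[ -]", "", value)
    return bool(AMEX_PATTERN.match(digits)) and luhn_valid(digits)


def classify_column(values: List[str], threshold: float = 0.8) -> Optional[str]:
    """Label a column when at least `threshold` of its non-empty values match.

    The threshold is a hypothetical tuning knob, not a WKC setting.
    """
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return None
    matches = sum(looks_like_amex(v) for v in non_empty)
    if matches / len(non_empty) >= threshold:
        return "American Express Card"
    return None
```

Calling `classify_column` on the values of each CSV column would yield a label only for columns that are dominated by plausible card numbers, which is the kind of signal a data protection rule can later key on.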

Add a collaborator to the catalog: Access the collaborators page of the catalog and add the user Praveen Devarao as a collaborator.

screenshot of adding a user to the catalog
Adding collaborator to a catalog

With the catalog now ready, let’s move on to author our first Data Protection Rule.

Creating a Data Protection Rule

To access the data protection rule feature, log in to your IBM Cloud Pak for Data instance. From the left-hand navigation bar, access Rules under the Governance section.

screenshot of admin page for Cloud Pak for Data
Cloud Pak for Data Dashboard — Accessing Rules

Once you land on the Rules page, you will see a list of all Published and Draft rules defined in the system. Initially the list is empty; you can create a new rule using `Add Rule` -> `New Rule`.

After clicking on the Data Protection Rule tile, you will be sent to the rule creation page.

The image below shows the new rule creation page, where the user can enter the rule name and definition and then select from options on which the conditions can be based.

screenshot of new data protection rule creation page
New rule creation page

A data protection rule consists of criteria that specify which data to control and an action that specifies how to prevent access to that data.

Criteria

The criteria consist of one or more conditions combined by operators. A condition consists of one or more statements that describe the contents of the data and are themselves combined by operators.

The statements can contain the following terms on the left-hand side:

  • Asset owner: The user who owns the asset.
  • Business term: The business term that has been assigned to an asset or column, for example, Customer.
  • Data class: The classification of a column that describes the contents of the data, for example, e-mail address.
  • Tag: The tag on an asset or column, for example, marketing or client information.
  • User name: The user requesting access to an asset.
  • Classification: The type of sensitive information in the asset, for example, confidential information or PII.

picture of the word “secure” written on a chalkboard with the picture of a padlock drawn next to the word with an actual combination lock lying below where the word is written
Photo by Nicole De Khors from burst.shopify
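
The criteria structure described above (statements combined into conditions, conditions combined into criteria) can be sketched as a small boolean tree in Python. This is an illustrative model only, not WKC's rule format or API: the class names `AccessContext`, `Statement`, and `Group`, the field names, and the "AND"/"OR" operator strings are all hypothetical, and only a subset of the left-hand terms is modelled.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class AccessContext:
    """Facts about one access attempt. All field names are hypothetical."""
    user_name: str
    asset_owner: str
    data_classes: Set[str] = field(default_factory=set)
    tags: Set[str] = field(default_factory=set)


@dataclass
class Statement:
    """A left-hand term compared with a value, such as a user name."""
    term: str
    value: str

    def matches(self, ctx: AccessContext) -> bool:
        if self.term in ("user_name", "asset_owner"):
            return getattr(ctx, self.term) == self.value
        if self.term == "data_class":
            return self.value in ctx.data_classes
        if self.term == "tag":
            return self.value in ctx.tags
        raise ValueError(f"unsupported term: {self.term}")


@dataclass
class Group:
    """Statements or nested groups combined by one operator: 'AND' or 'OR'."""
    parts: List
    operator: str = "AND"

    def matches(self, ctx: AccessContext) -> bool:
        results = [part.matches(ctx) for part in self.parts]
        return all(results) if self.operator == "AND" else any(results)


# Criteria: the asset has a column classified as a credit card number
# AND the accessor is a particular user.
criteria = Group(
    [
        Group([Statement("data_class", "American Express Card")]),
        Group([Statement("user_name", "Praveen Devarao")]),
    ],
    operator="AND",
)
```

Using one `Group` type for both conditions and criteria keeps the sketch short; the nesting depth, not the class, is what distinguishes a condition from the overall criteria.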

Action

The action can be one of two options:

  • Deny access to the entire asset: Selecting this option blocks the accessor from seeing the content of the asset.
  • Mask the data: Selecting this option presents the asset to the accessor in a usable form, while masking the data in columns that the accessor is not authorized to see.

Masking, in turn, can be one of three types:

  • Redact: All the characters in the data are replaced by X. For example, 452-821-1120 is replaced by XXXXXXXXXX.
  • Substitute: The data is replaced by values that don’t match the original format. For example, 452-821-1120 is replaced by 0b23xa2394013.
  • Obfuscate: Data is replaced by similarly formatted values. For example, 452-821-1120 is replaced by 008-219-6240.
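
The three masking types above can be approximated in a few lines of Python. This is a rough sketch, not how WKC implements masking. The function names are hypothetical; `redact` emits a fixed run of ten X characters only because that matches the example output in the list; `substitute` uses a truncated SHA-256 hex digest as a stand-in for a format-breaking replacement; and `obfuscate` simply swaps each digit or ASCII letter for a random one of the same kind.

```python
import hashlib
import random
import string


def redact(value: str) -> str:
    """Replace the value with a run of X characters.

    Assumption: a fixed length of ten, taken from the example in the text.
    """
    return "X" * 10


def substitute(value: str) -> str:
    """Replace the value with a token that does not keep the original format.

    Assumption: a truncated SHA-256 hex digest stands in for the real scheme.
    The same input always yields the same token.
    """
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:13]


def obfuscate(value: str, rng: random.Random) -> str:
    """Replace each digit or ASCII letter with a random one of the same kind.

    Separators such as hyphens are kept, so the output has the same shape
    as the input. A character may, by chance, be replaced by itself.
    """
    out = []
    for ch in value:
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        else:
            out.append(ch)
    return "".join(out)
```

For a phone-style value such as `452-821-1120`, `redact` hides everything, `substitute` returns an opaque 13-character token, and `obfuscate(value, random.Random())` returns another value shaped like `ddd-ddd-dddd`, which keeps the column usable for format-sensitive downstream tools.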

Let’s create a rule that denies access to an asset when it has a column containing credit card information and the accessor is the user Praveen Devarao.

Note: You can create a different user on the Cloud Pak for Data platform and use that user to try out Data Protection Rules in WKC.

screenshot of creating a new data protection rule
Rule to deny access

The image above shows how the rule conditions and the action to deny access to certain users appear in WKC.

Masking of Data: If the action selected in the rule is to mask data, we also need to select the columns that are to be masked. That selection is honoured at access time.

screenshot showing an admin selecting which masking type to use for data protection
Rule to mask data

The image above shows a rule that masks the values of a specific column, with conditions that check the user trying to access the asset and a particular column of the asset.

After you click Create rule, the Data Protection Rule is enforced immediately.

Seeing Data Protection Rule in action

Log in as Praveen Devarao from a different browser and try accessing the asset we added to the catalog.

If the action selected when creating the rule was to deny access, you should see that access is blocked, as shown in the image below. The page also shows which Data Protection Rule is denying access.

screenshot of access denied for data as per a data protection rule
Access denied

If the action selected when creating the rule was to mask data, the asset preview page shows the redacted columns as containing xxxxxx instead of the actual values, as in the image below.

screenshot of redacted data in an uploaded CSV file
Data redacted
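
The two outcomes above, a blocked asset or a masked preview, can be summarised in a short Python sketch of what enforcement at access time might look like. It is a simplified illustration under stated assumptions, not WKC's enforcement code: the `AccessDenied` exception, the `preview_asset` function, the `"deny"`/`"mask"` action strings, and the fixed `XXXXXXXXXX` replacement are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional


class AccessDenied(Exception):
    """Raised when a matching rule denies access to the whole asset."""


def preview_asset(
    rows: List[Dict[str, str]],
    action: Optional[str],
    masked_columns: FrozenSet[str] = frozenset(),
    rule_name: str = "",
) -> List[Dict[str, str]]:
    """Return the rows a user may see, given the action of a matching rule.

    `action` is None when no rule matches the access attempt.
    """
    if action is None:
        return rows
    if action == "deny":
        # Name the rule in the message, mirroring the access-denied page.
        raise AccessDenied(f"Access denied by data protection rule: {rule_name}")
    if action == "mask":
        # Replace only the selected columns; other columns pass through.
        return [
            {col: ("XXXXXXXXXX" if col in masked_columns else val)
             for col, val in row.items()}
            for row in rows
        ]
    raise ValueError(f"unsupported action: {action}")
```

A caller would first evaluate the rule criteria for the current user and asset, then pass the resulting action into `preview_asset` before rendering the preview.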

Conclusion

In this post we learnt what Data Protection Rules are and how they work in Watson Knowledge Catalog. We walked through the criteria and action portions of a Data Protection Rule, and showed a DPR in action, blocking an unauthorized user from a data asset and masking data in its preview. If your organization isn’t already using Watson Knowledge Catalog Data Protection Rules, we encourage you to try them out for yourself.
