Building a data privacy dashboard using IBM Knowledge Catalog

Paul Kilroy
7 min read · Jul 14, 2023

Authors: Paul Kilroy, Pat O'Sullivan, Julie Forgo

This blog describes how to build a Data Privacy dashboard that assesses the privacy classification of a data source and identifies areas where additional Data Protection rules might be warranted.

The process involves use of the following IBM Knowledge Catalog and Cloud Pak for Data features:

· Knowledge Accelerators, specifically the Data Privacy accelerator provided with the IBM Knowledge Catalog 4.7 release

· Metadata Import to import data source metadata, and Metadata Enrichment to associate that metadata with the business terms provided with the Knowledge Accelerator

· Data Protection rules to mask data that is personal / sensitive

· The IBM Knowledge Catalog Reporting Database as the source of metadata information for the dashboard

· IBM Dashboards to create a dashboard to visualize the information. Tip: These steps demonstrate using IBM Dashboards, but you can use another business intelligence tool, such as Cognos Analytics or Tableau, instead.

IBM Dashboard showing data privacy classifications for tables/columns in a catalog.

Importing the Knowledge Accelerator scope for Data Privacy

The Knowledge Accelerators that are provided with IBM Knowledge Catalog now include a Data Privacy scope. The accelerator includes a pre-defined set of 500–600 business terms (depending on industry) that cover all of the main business items relevant to Data Privacy. These business terms are categorized in a specific data privacy taxonomy, so that you can see at a glance which business terms pertain to various areas of Data Privacy such as Financial Information, Health & Biometric, Government IDs, and so on.

The business terms are classified as either:

· Personal Information (PI): any data relating to an identified or identifiable individual. PI includes both identifiers, such as someone’s name or employee serial number, as well as any personal information that can be reasonably associated with an individual, such as age, profession, preferences, net worth or mobile device location.

· Sensitive Personal Information (SPI): personal data consisting of information relating to an individual with regard to racial or ethnic origin; political opinions; religious beliefs or other beliefs of a similar nature; trade union membership; physical or mental health or condition; sexual life; or any criminal or alleged criminal history of a person.

For details about importing a Knowledge Accelerator, see Getting started with the Knowledge Accelerators.

An example of some Health & Biometric business terms from the Knowledge Accelerator.

Governing your data assets using Metadata Import and Metadata Enrichment

After you import the Knowledge Accelerator business terms for Data Privacy, the next step is to associate your data assets with the data classes and business terms. Follow these steps in IBM Knowledge Catalog:

1. Create a connection to a data source.

2. Use the Metadata Import capability to import the metadata on data assets from the data source.

3. Use the Metadata Enrichment capability to enrich the data assets with data classifications and associations to the business terms that were imported from the Knowledge Accelerator.

4. Publish the enriched metadata to a catalog.

These steps are fully described in:

· Connecting to data sources

· Importing metadata

· Managing metadata enrichment

Creating Data Protection rules to mask personal/sensitive data

Optionally, create data protection rules to protect data in columns that are associated with PI or SPI business terms:

1. Navigate to Governance > Rules > Add rule > New data protection rule.

2. Specify a name and definition for the new rule.

3. For a condition, choose ‘Business term’ ‘contains any’ ‘Social Security Number’ (or some other business term that makes sense to protect).

4. Select ‘redact columns’. This substitutes the value with Xs to obscure the sensitive data.

5. Click Create. Your new rule should look like this:

An example Data Protection rule
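
To make the effect of such a rule concrete, here is a minimal Python sketch of the idea: columns whose business term matches the rule condition are replaced with Xs, and everything else passes through. This is an illustration only, not how IBM Knowledge Catalog enforces rules. The function names, the sample row, and the fixed mask width are all assumptions made for the example.

```python
def redact(value, width=10):
    """Return a masked stand-in for a sensitive value.

    Assumption: the mask is a fixed-width run of X characters. The real
    output format is decided by the product, not by this sketch.
    """
    if value is None:
        return None  # keep missing values missing
    return "X" * width


def apply_rule(row, column_terms, protected_terms):
    """Redact every column whose business term is in protected_terms."""
    return {
        col: redact(val) if column_terms.get(col) in protected_terms else val
        for col, val in row.items()
    }


# Hypothetical sample data: one row and its column-to-term assignments.
row = {"CUST_ID": 1001, "SSN": "123-45-6789"}
column_terms = {"SSN": "Social Security Number"}

masked = apply_rule(row, column_terms, {"Social Security Number"})
print(masked)
```

Only the SSN column is masked, because it is the only column associated with the protected business term.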

Setting up reporting for IBM Knowledge Catalog

The metadata from IBM Knowledge Catalog can be sent to a reporting database, where you can report on it using BI tools such as Cognos Analytics, IBM Dashboards, or Tableau. You can also share the metadata using standard SQL queries.

To set up the reporting database:

1. Create a database to store the data. Several database engines are supported. For this example, we created an IBM DB2 database with a page size of 32K:

db2 create database <DATABASE_NAME> PAGESIZE 32 K

2. Create a schema in this database. For example:

db2 connect to <DATABASE_NAME>
db2 create schema WKCREPORT

3. In Cloud Pak for Data, create a platform connection to the database:

  • Navigate to Data > Platform connections > New Connection > IBM DB2
  • Enter the connection information, such as host name, port, username, and password
  • Test the connection to make sure it’s working, then click Create

4. In Cloud Pak for Data, set “allow reporting” on the catalogs and categories you wish to report on:

  • Navigate to Catalogs > Your Catalog > Settings tab
  • In the Reporting on asset metadata section, select Grant Access to allow reporting on this catalog.
  • Navigate to Governance > Categories > Your Category > Access Control tab
  • In the Reporting on governance artifact metadata section, select Allow Reporting to allow reporting on this category.
  • Repeat for any other categories you wish to report on. Make sure to include the categories containing Knowledge Accelerator business terms, and the ‘[uncategorized]’ category, which contains the out-of-the-box classifications and data classes.

5. In Cloud Pak for Data, configure the reporting:

  • Navigate to Administration > Configurations and settings > Reports setup
  • Select the database and schema you created in the previous steps
  • Choose which information to report on. Make sure to include:
    - Catalogs: The catalog containing the data assets
    - Categories: The categories containing Knowledge Accelerator business terms, and the ‘[uncategorized]’ category, which contains the out-of-the-box classifications and data classes
    - Others: any Data protection rules that have been created

6. Start reporting to kick off the reporting process. This might take some time, depending on the amount of data that is in scope.

7. Once reporting has been established, connect to the database and create the data privacy view, V_PRIVACY_REPORT. The view joins data across several tables, following this sequence:

  • The view joins the container catalog table to the data asset table, pulling the columns into the catalog table.
  • A second join to the governance artifacts table pulls in the associated business terms and data classes established in the Metadata Enrichment process. The view also pulls in the classifications of those governance artifacts, such as the PI and SPI classifications that are defined in the Knowledge Accelerator.
  • Finally, the view performs an outer join to the tables storing Data Protection Rules to check if protection rules are in place to protect the data. This is especially important in the case where data has been classified as personal data.
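
The join sequence above can be sketched in a few lines of Python using an in-memory SQLite database. This is a simplified illustration under stated assumptions, not the actual view definition: the table names (catalogs, asset_columns, column_terms, terms, rule_terms), their columns, and the sample rows are all hypothetical stand-ins for the real reporting schema. Only the view name V_PRIVACY_REPORT comes from this article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, heavily simplified stand-ins for the reporting tables.
CREATE TABLE catalogs (catalog_id INTEGER, catalog_name TEXT);
CREATE TABLE asset_columns (catalog_id INTEGER, table_name TEXT, column_name TEXT);
CREATE TABLE column_terms (table_name TEXT, column_name TEXT, term_name TEXT);
CREATE TABLE terms (term_name TEXT, classification TEXT);
CREATE TABLE rule_terms (rule_name TEXT, term_name TEXT);

-- Invented sample data.
INSERT INTO catalogs VALUES (1, 'Demo Catalog');
INSERT INTO asset_columns VALUES
    (1, 'CUSTOMER', 'SSN'),
    (1, 'CUSTOMER', 'CUST_FNAME');
INSERT INTO column_terms VALUES
    ('CUSTOMER', 'SSN', 'Social Security Number'),
    ('CUSTOMER', 'CUST_FNAME', 'First Name');
INSERT INTO terms VALUES
    ('Social Security Number', 'PI'),
    ('First Name', 'PI');
INSERT INTO rule_terms VALUES ('Redact SSN', 'Social Security Number');

CREATE VIEW V_PRIVACY_REPORT AS
SELECT c.catalog_name, col.table_name, col.column_name,
       t.term_name, t.classification, r.rule_name
FROM catalogs c
-- Join 1: catalog to its data asset columns.
JOIN asset_columns col ON col.catalog_id = c.catalog_id
-- Join 2: columns to their business terms and classifications.
JOIN column_terms ct ON ct.table_name = col.table_name
                    AND ct.column_name = col.column_name
JOIN terms t ON t.term_name = ct.term_name
-- Outer join: keep columns that have no protection rule (rule_name is NULL).
LEFT OUTER JOIN rule_terms r ON r.term_name = t.term_name;
""")

rows = conn.execute(
    "SELECT table_name, column_name, classification, rule_name "
    "FROM V_PRIVACY_REPORT ORDER BY column_name"
).fetchall()
for row in rows:
    print(row)
```

The outer join is the important design choice: with an inner join, columns that have no Data Protection rule would disappear from the view, and those are exactly the gaps the dashboard is meant to reveal.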

For more details, see setting up reporting for IBM Knowledge Catalog.

Creating a Data Privacy dashboard

It’s now time for the fun part, creating a dashboard to visualize your key data.

1. Create a new project for the dashboard.

  • Navigate to Projects > New Project > Create an empty project.

2. Add the platform connection created earlier to the project.

  • On the Assets tab, click New Asset > Connection > From platform tab.
  • Select the connection and click Create.

3. Import the view created earlier as a data asset using Metadata Import.

  • On the Assets tab, click New Asset > Metadata Import > Discover.
  • Select target: this project.
  • Scope: select the Database Connection > Schema > V_PRIVACY_REPORT view created earlier.
  • Continue through the process, accepting the defaults.

4. Create the dashboard.

  • On the Assets tab, click New Asset > Dashboard editor
  • Select the V_PRIVACY_REPORT view as the data source
  • Once data preparation is complete, drag and drop fields from the data source to create visualizations on the canvas. Tip: You can also import the pre-defined report. See the next section for steps.

If you want to import the pre-defined report into the dashboard you created, follow these steps.

1. Copy the contents of this file to the clipboard

2. In the dashboard, press the following keys together to bring up the board specification JSON: ctrl + q + /

3. Select all and paste to overwrite with the pre-defined dashboard.

4. Click Update.

5. The dashboard will need to be re-linked to the view in the project.

  • Click the sources icon on the left
  • Click the ellipsis next to V_PRIVACY_REPORT and relink.
  • Choose the data asset and click Select.
  • The report will refresh, and data will be displayed.
  • Select the save icon to save the dashboard.

The pre-defined dashboard contains three tabs:

  • Privacy classifications: provides an overall view of the privacy classifications for tables and columns in the catalog.
  • Data Protection heat map: shows which columns are classified as PI or SPI, and which of these columns have a Data Protection rule in place that protects the visibility of that column’s data.
  • Data protection rule coverage: provides another view of the columns that are covered by Data Protection rules, and where there might be gaps in coverage. For example:
    - Select TABLE_NAME in the filters area, and choose one or two tables.
    - The example screenshot shown below shows 7 privacy columns, of which 5 are covered by data protection rules.
    - Two columns, CUST_FNAME and CUST_LNAME (light color in the chart), are not covered by data protection rules, so it would make sense to add rules to cover these items.

An example of data protection rule coverage analysis. Light color indicates potential gaps in data protection, where a column is classified as PI/SPI but no Data Protection rule is in place.
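
The same coverage check can be reproduced outside the dashboard. The sketch below is a minimal Python illustration that assumes each row has the shape (table, column, classification, rule name), with the rule name empty when no rule applies. The function name, the table name, and all sample rows other than CUST_FNAME and CUST_LNAME are invented for the example.

```python
def coverage(rows):
    """Summarize Data Protection rule coverage for PI/SPI columns.

    rows: iterable of (table, column, classification, rule_name) tuples,
    where rule_name is None when no rule protects the column.
    Returns (privacy_column_count, covered_count, sorted_gaps).
    """
    privacy = {(t, c) for t, c, cls, _ in rows if cls in ("PI", "SPI")}
    covered = {
        (t, c)
        for t, c, cls, rule in rows
        if cls in ("PI", "SPI") and rule is not None
    }
    return len(privacy), len(covered), sorted(privacy - covered)


# Hypothetical rows mirroring the example: 7 privacy columns, 5 covered.
sample = [
    ("CUSTOMER", "CUST_FNAME", "PI", None),
    ("CUSTOMER", "CUST_LNAME", "PI", None),
    ("CUSTOMER", "SSN", "PI", "Redact SSN"),
    ("CUSTOMER", "EMAIL", "PI", "Redact contact details"),
    ("CUSTOMER", "PHONE", "PI", "Redact contact details"),
    ("CUSTOMER", "DOB", "PI", "Redact birth date"),
    ("CUSTOMER", "BLOOD_TYPE", "SPI", "Redact health data"),
    ("CUSTOMER", "ORDER_TOTAL", None, None),  # not a privacy column
]

total, protected, gaps = coverage(sample)
print(f"{protected} of {total} privacy columns are covered")
for table, column in gaps:
    print(f"gap: {table}.{column}")
```

Columns without a PI or SPI classification are ignored, so the result counts only the columns where a missing rule is a potential privacy gap.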

Summary

The Knowledge Accelerators provided with IBM Knowledge Catalog provide a set of governance artifacts that help organizations kickstart their data governance activities in the area of data privacy. Once data assets are associated with the business terminology, it’s easy to build customizable dashboards that visualize the data privacy status of a catalog, and highlight the areas that are covered by Data Protection rules.

Paul Kilroy

Senior Development Manager, Knowledge Accelerators and Industry Models, IBM Data & AI