Improve enterprise governance by integrating logical data modeling with IBM Knowledge Catalog

Karl Hegarty
7 min read · Dec 14, 2023

Authors: Karl Hegarty, Sébastien Marchand, Julie Forgo

Build on the business terminology you need for your enterprise governance framework by using IBM Knowledge Catalog with a data modeling tool to automate and manage lineage assignments.

Organisations that develop enterprise logical data models with a tool like ERwin Data Modeler, including users of IBM Industry Models, can use the integration with IBM Knowledge Catalog to identify process improvements, reduce costs, increase efficiency, and design well-informed applications.

This post describes how to:

- Import logical data models into IBM Knowledge Catalog (IKC) using MANTA Automated Data Lineage to create the assets for effective metadata management of logical data models.

- Import selected Knowledge Accelerators, which provide extensive business vocabulary content covering key industries: Banking, Insurance, Healthcare, and Energy & Utilities.

- Use a new Knowledge Catalog feature available in Cloud Pak for Data 4.8 that creates thousands of assignments between the Knowledge Accelerators vocabularies and logical data models. The number of assignments depends on the size of the data models and Knowledge Accelerators vocabularies you import.

Hundreds of IBM clients have used IBM Industry Models for Banking, Insurance, Healthcare, and Energy & Utilities. This blog describes how clients who have used such models can now import them into IKC and automatically link them to the equivalent Knowledge Accelerators business terms for each industry.

Components for logical data modeling

An integrated data governance framework can include these components:

Data models are enterprise-wide business and design data models used to build and accelerate the development of reporting solutions, such as business intelligence (BI) and standardized reporting. The data models are delivered and customized as logical data models by using entity-relationship (ER) representation.

A Business Data Model (BDM) is a conceptual data model that specifies the third-normal-form data structures that are required to represent the concepts that are defined in the business terms.

An Atomic Warehouse Model (AWM) is a design-level data model that represents the enterprise-wide repository of atomic data used for information processing.

A Dimensional Warehouse Model (DWM) is the enterprise-wide repository for analytical data. It contains star-schema-style dimensional data structures that are organized around fact entities.

Step 1: Import logical data model into IBM Knowledge Catalog

This section describes how to export a logical data model from a data modeling tool (we use Erwin as an example) and import it into IBM Knowledge Catalog. The import prepares for automating the assignment of logical data assets to business terms imported from IBM Knowledge Accelerators. After you import the file, you can use MANTA Automated Data Lineage to gain visibility into your data environments, with a comprehensive map of data flows, sources, transformations, and dependencies, in preparation for assignment mapping.

Prerequisites:

· Install the MANTA Automated Data Lineage feature. https://www.ibm.com/docs/en/cloud-paks/cp-data/4.8.x?topic=lineage-installing

· Enable lineage import. https://www.ibm.com/docs/en/cloud-paks/cp-data/4.8.x?topic=administering-enabling-lineage-import

Import the Logical Data Model:

1. Prepare to export your data model, following the guidance for your data modeling tool. For the Erwin example, a useful hint is to remove all diagrams first. Then export the data model from the data modeling tool. Each data modeling tool requires a specific export format so that the data model can be properly added to a catalog; for this example, save the Erwin data model as Standard XML, then compress (zip) the Standard XML file. In IBM Knowledge Catalog, create a catalog and give it a name, such as ‘sample_catalog’.

2. Create a new empty project named ‘sample’ and upload the compressed Erwin XML file as a new asset for the project.

3. Create a new Metadata import asset with the import goal ‘Import data model’. This import connects to the Erwin zip asset to import the logical data model.

4. Name the import ‘sample model’, select ‘Catalog’ as the target, and select the ‘sample_catalog’ created earlier.

5. For ‘Select scope’, browse for the uploaded Erwin zip asset and choose ‘ERwin Data Modeler’ from the Data modelling tool list.

Choose Create to start the import. Depending on the size of the model, the import job can take a while. The status appears in the information bar, which states “Metadata import in progress. sample model is currently importing from {Erwin zip file.xml.zip}”. When the import is complete, the information bar states “Metadata import complete. {counter} assets were processed successfully”.

6. View the imported assets in the new sample_catalog. Use filters to view only the new catalog asset types: logical model, logical model attribute, logical model entity, or logical model relationship.

View of an imported data model.

The model is now ready for assignments to be created between the logical data model assets and the business terms.
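The compression part of the export step above can also be scripted. The following is a minimal sketch, not part of the product tooling: it uses only the Python standard library, and the function name `zip_model_export` and the file name in the usage note are hypothetical. It assumes that the Standard XML file has already been saved from the data modeling tool.

```python
import zipfile
from pathlib import Path


def zip_model_export(xml_path: str) -> Path:
    """Compress a Standard XML model export into a .zip archive beside it."""
    source = Path(xml_path)
    if not source.is_file():
        raise FileNotFoundError(f"Model export not found: {source}")
    # "model.xml" becomes "model.xml.zip", the name pattern that appears
    # in the import status message.
    archive = source.with_name(source.name + ".zip")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Store only the file name so the archive has no directory prefix.
        zf.write(source, arcname=source.name)
    return archive
```

For example, `zip_model_export("sample_model.xml")` would write `sample_model.xml.zip` next to the original file, ready to upload as a project asset.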

Step 2: Import the Knowledge Accelerators vocabularies

If you haven’t already imported a Knowledge Accelerators vocabulary, choose the one for your industry to access the complete Business Vocabulary, Business Performance Indicators, Industry Alignment Vocabularies, and Business Scopes. For details on installing an accelerator, consult this comprehensive guide, which includes a video of how to import: https://www.ibm.com/docs/en/cloud-paks/cp-data/4.8.x?topic=accelerators-getting-started

The following tables describe the logical data model (LDM) types by industry and by the key top-level root categories of the Knowledge Accelerators that are relevant for imported vocabularies. The numbers in these tables are for ‘out of the box’ IBM Industry Models data models and Knowledge Accelerators vocabularies. They help you plan for the approximate number of terms and the number of entity and attribute asset assignments that would be created.

Table 1: Total volume of imported term linkages

Table 1 shows term assignment counts for each industry, identifying the expected volume of terms for each Knowledge Accelerators area.

Table 2: Volume of entity / attribute linkages by model type

Table 2 shows expected entity and attribute assignment volumes by data model type, where ‘_bdm’ denotes business data models, ‘_awm’ denotes atomic warehouse models, and ‘_dwm’ denotes dimensional warehouse models.

Note: where a model has been customised, specify the starting or origin model type.
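The model type suffixes used in Table 2 can be expressed as a small lookup. This is an illustrative sketch only: the helper `model_type_from_name` and the example name `banking_bdm` are hypothetical, and it assumes the suffix appears at the end of whatever identifier you are inspecting.

```python
from typing import Optional

# Suffixes as described for Table 2.
MODEL_TYPE_SUFFIXES = {
    "_bdm": "Business Data Model",
    "_awm": "Atomic Warehouse Model",
    "_dwm": "Dimensional Warehouse Model",
}


def model_type_from_name(name: str) -> Optional[str]:
    """Return the model type implied by a name's suffix, or None if no suffix matches."""
    lowered = name.lower()
    for suffix, label in MODEL_TYPE_SUFFIXES.items():
        if lowered.endswith(suffix):
            return label
    return None
```

With this sketch, a hypothetical name such as `banking_bdm` maps to the Business Data Model type, and a name without a recognised suffix returns `None`.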

Step 3: Add logical data model assignments

The final step of the integration is to perform the asset assignments. This job creates the linkages between the catalog assets’ entities or attributes and the business terms.

Within the same group of endpoints used for importing Knowledge Accelerators vocabularies there are two additional endpoints for initiating the asset assignment job and monitoring status.

To assign the terms to the logical data models:

1. Authenticate and authorize the open API page with a bearer token. See the topic Authenticating and Authorizing in the Getting Started with Knowledge Accelerators documentation.

2. Find the “POST /v1/knowledge_accelerators/asset_assign_terms” endpoint and enter the required details in the request body.

Example of request body

3. The response includes a process_id that you can use to check the status of the term assignment job. For example, “GET /v1/knowledge_accelerators/asset_assign_terms/status/{process_id}” shows the status of the assignment job in the log file.
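The two calls in the steps above can also be made from a script rather than from the API page. The sketch below uses only the Python standard library and rests on several assumptions: the base URL of your Cloud Pak for Data instance, the bearer token, and the request body are all supplied by you (the body fields are not reproduced here, so follow the request body example in the API page), and responses are assumed to be JSON. The helper names are hypothetical.

```python
import json
import urllib.request

# Endpoint path as given in the steps above.
ASSIGN_PATH = "/v1/knowledge_accelerators/asset_assign_terms"


def build_assign_request(base_url: str, token: str, body: dict) -> urllib.request.Request:
    """Build the POST request that starts the term assignment job."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + ASSIGN_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_status_request(base_url: str, token: str, process_id: str) -> urllib.request.Request:
    """Build the GET request that checks the status of an assignment job."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}{ASSIGN_PATH}/status/{process_id}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the response, assuming a JSON body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A typical flow would be to call `send(build_assign_request(...))`, read the process_id from the result (assumed here to be a top-level `process_id` key), and then call `send(build_status_request(...))` periodically until the job reports completion.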

Logical data model assignments

By the end of the job, all possible assignments have been added. For a given asset type they appear as in the view below, where the Business terms column is now populated with the one or more business terms that each entity or attribute is assigned to.

Business terms column populated with assignments

When you view an individual entity or attribute, the linked business term is shown.

View from the entity or attribute

Conversely, each assigned business term has a list of related content that links back to the entities or attributes it is assigned to.

View from the business term

To see a more rolled-up view, you can add a dashboard that visualises the assignments by using the reporting database feature. Below is a sample view of one such dashboard. This is available to download here, see ‘ldm-asset-dashboard’.

Sample dashboard view

You can also visualise the linked LDM entities and attributes, business terms, and data assets by using Relationship Explorer, as the example below shows.

Sample Relationship Explorer view

Summary

Follow the guidelines and steps in this post to create assignments that link imported Knowledge Accelerators business terms with imported logical data models covering finance, healthcare, insurance, or energy and utilities, and so increase the governance value delivered to your organization.

For more information, check out IBM Knowledge Accelerators and IBM Knowledge Catalog.
