Reference Data Management in Watson Knowledge Catalog — Chapter 3

Published in IBM Data Science in Practice · 7 min read · Mar 8, 2021

Chapter 3: Custom Columns and API access to WKC Reference Data Management system

digital dashboard with different multicolored indicators and sliders — Photo by Nejc Soklič on Unsplash

In Chapter 1 we learnt how a reference data repository that everyone can access is a key tool for achieving standardization across an organization.

In Chapter 2 we learnt how Reference Data Sets and the values within them can be organized hierarchically. We also learnt about value mappings, which relate a value in one Reference Data Set to a value in another.

Using Watson Knowledge Catalog’s Reference Data management capability we explored how to store Code, Value, and Description of a reference data value. In this chapter, we will explore the capability of storing additional information related to a reference data value using the custom columns capability.

We will also conclude the series in this chapter by touching upon API access to the Reference Data capability in Watson Knowledge Catalog, which allows programmatic integration with other sub-systems.

Custom Columns

In Watson Knowledge Catalog one can add columns to a Reference Data Set beyond code, value, and description. These extra user-defined columns are called custom columns.

While code, value, and description are sufficient to represent a reference data value, there are scenarios where one needs to store additional information with the value for easy access. For example, consider the Indian States Reference Data Set, which holds the state ISO code as code, the name of the state as value, and a description of the state in description. It would be useful to fetch the Chief Minister of the state (head_of_state) along with the reference data set instead of having to consult a different source. Similarly, one could use custom columns to store alternative names of the state in the local language (if any), translations of values into different languages, and so on.

Let’s explore the Reference Data Set custom columns capability in Watson Knowledge Catalog.

The Indian States Reference Data Set that we used in Chapter 1 looks like the image below, with code and value in the left panel and the description, along with other information, displayed in the middle of the screen.

screenshot of a page for administering the reference data set Indian States — Indian States Reference Data Set with value imported from a CSV

Let us upload a CSV file with the custom column head_of_state.

CSV file format to import data from
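
For readers who cannot see the screenshot, here is a minimal sketch of what such a CSV might look like, generated with Python's csv module. The header names (code, value, description, head_of_state) mirror the columns discussed above, but the exact file layout is an assumption. The ISO 3166-2 codes are real, while the head_of_state cells are placeholders to be filled in.

```python
import csv
import io

# Header names are an assumption; the upload dialog lets you map each
# CSV column to code, value, description, or a custom column.
HEADER = ["code", "value", "description", "head_of_state"]

# ISO 3166-2:IN codes; head_of_state cells are placeholders, not real data.
ROWS = [
    ("IN-KA", "Karnataka", "State in southern India", "<chief_minister_name>"),
    ("IN-MH", "Maharashtra", "State in western India", "<chief_minister_name>"),
    ("IN-TN", "Tamil Nadu", "State in southern India", "<chief_minister_name>"),
]


def build_csv(rows):
    """Return CSV text with a header row followed by the given rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = build_csv(ROWS)
print(csv_text)
```

Writing csv_text to a file gives something that can be selected in the Upload file dialog described next.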

In the existing Indian States Reference Data Set, select the Upload file menu option to import the updated values from the CSV file. The image below shows the Upload file menu option that can be accessed by clicking on the hamburger menu.

screenshot of uploading a file to the reference data set — Accessing ***Upload File*** menu to update the reference data set with values from CSV file

In the dialog box that opens, select the code, value, and description columns. Move on to the next window to select custom columns. Associate the head_of_state CSV column with a custom column in the Reference Data Set. Given that this is a new column, key in the name and click the ‘+’ button to add the definition of the new column.

screenshot of how to add a custom column to a reference data set — Accessing create new custom column definition

Next, key in the description for the new custom column, select the type of value it holds, and click Save.

screenshot of selecting type and defining the data in the custom column — Creating New Custom Column definition ***head_of_state***

Extras: The Reference Data Set Custom Columns are defined at the WKC platform level, implying that one can associate the same custom column definition to multiple Reference Data Sets as applicable.

Click the Save button to update the Reference Data Set with information about head_of_state from the file.

The updated Reference Data Set looks like the image below, with the custom column of the selected value showing up in the middle panel.

screenshot of one entry the reference data set called “Indian States” with the custom column value “head of state” added — Reference Data Set Indian States with Custom Column **HEAD_OF_STATE**

Move the draft through the appropriate workflow states to publish it and make the Reference Data Set available for consumption on the platform.

In the next section, we will take an overview-level look at the Reference Data Management APIs that Watson Knowledge Catalog provides for the operations we covered in the section above and in the previous two chapters.

Reference Data APIs in Watson Knowledge Catalog

Watson Knowledge Catalog provides a wide range of REST APIs to interact with the Reference Data Management capability. You can browse the full list of APIs at the endpoint below on your IBM Cloud Pak for Data instance.

https://<your_cp4d_instance>/v3/glossary_terms/api#/Reference%20data%20sets

The APIs follow REST conventions, which makes it easier to understand the operation each one performs and how to invoke it from your favourite tool, such as Postman or cURL.

For this post, we will learn about the POST /v3/reference_data API, which can be used to create a new Reference Data Set draft.

To use the API, you first need to generate a Bearer token to authorize yourself for the endpoint. Call the endpoint below on your Cloud Pak for Data instance, providing your username and password, to fetch the token:

curl -k -u "<username>:<secret_password>" -X GET "https://<your_cp4d_instance>/v1/preauth/validateAuth" -H "accept: application/json"

The access token field in the response is your Bearer token. Pass it in the Authorization header of the POST API call that creates your Reference Data Set.
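
The same token exchange can be sketched in Python with only the standard library. This is an illustration under stated assumptions, not the official client: the helper names are hypothetical, and the response field name accessToken is an assumption that should be checked against the response from your own instance.

```python
import base64
import json
import urllib.request


def build_token_request(host, username, password):
    """Build (but do not send) the GET request issued by the cURL command."""
    credentials = f"{username}:{password}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        f"https://{host}/v1/preauth/validateAuth",
        headers={
            "Authorization": f"Basic {basic}",  # equivalent of curl -u
            "Accept": "application/json",
        },
        method="GET",
    )


def extract_token(response_body, field="accessToken"):
    """Read the token from the JSON response.

    The default field name is an assumption; adjust it if your
    instance returns the token under a different key.
    """
    return json.loads(response_body)[field]


def bearer_header(token):
    """Return the Authorization header expected by the reference data APIs."""
    return {"Authorization": f"Bearer {token}"}
```

To actually fetch a token, pass the request to urllib.request.urlopen and feed the response body to extract_token. Note that the -k flag in the cURL command skips TLS certificate verification; urlopen verifies certificates by default, which is the safer choice outside of test instances.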

The body of the POST /v3/reference_data request will be as below:

{
  "name": "Countries",
  "long_description": "World countries ISO code",
  "effective_start_date": "2021-03-03T02:45:19.231Z",
  "effective_end_date": "2099-03-03T23:59:59.000Z",
  "tags": [
    "my_country_tag"
  ],
  "steward_ids": [
    "user123"
  ],
  "type": "TEXT",
  "parent_category": {
    "id": "990e33f5-3108-4d45-a530-0307458362d4"
  },
  "parent": [
    {
      "id": "<id_of_parent_reference_data_set>",
      "description": "Description about the relationship"
    }
  ],
  "child": [
    {
      "id": "<id_of_child_reference_data_set>",
      "description": "Description about the artifact relationship"
    }
  ],
  "custom_columns": {
    "columns": [
      "<custom_column_id>"
    ]
  },
  "rds_values": [
    {
      "code": "IND",
      "value": "India",
      "description": "Country code for India",
      "child": [
        {
          "id": "<code_of_child>",
          "description": "Description about the relationship"
        }
      ],
      "parent": {
        "id": "<code_of_parent>",
        "description": "Description about the relationship"
      },
      "multi_value_mappings": [
        {
          "id": "<id_of_mapped_reference_data_set>",
          "codes": [
            "code_from_mapped_reference_data_set"
          ]
        }
      ],
      "single_value_mappings": [
        {
          "id": "<id_of_mapped_reference_data_set>",
          "codes": "<code_from_mapped_reference_data_set>"
        }
      ],
      "custom_columns": [
        {
          "id": "<custom_column_id>",
          "value": "Ramnath Kovind"
        }
      ]
    }
  ]
}

Note: all the JSON fields indicating a relationship in the request body, such as parent, child, single_value_mappings, and multi_value_mappings, are optional.

Below is a description of each field:

name: name of the reference data set being created
long_description: description of the reference data set being created
effective_start_date: timestamp from which the reference data set will be active. By default, it is the current time or the time of publishing
effective_end_date: timestamp from which the reference data set will become inactive
tags: a free-form list of user-defined strings that make the reference data set easier to find
steward_ids: a list of user ids for users who will be assigned as stewards of the reference data set
type: type of the value field of reference data set that can be one of the following: TEXT, NUMBER, DATE
parent_category: the category id under which the reference data set is to be created
parent: a list of parent reference data set ids
child: a list of child reference data set ids
custom_columns: the list of custom column definitions to be associated with this reference data set.
rds_values: a list of reference data values that will be housed within this reference data set.
- code: the unique code representing the value
- value: the value represented by code
- description: the description about the value
- child: list of child reference data value codes
- parent: parent reference data value code
- single_value_mappings: a list of 1:1 relationships between the selected value and values in other reference data sets
- multi_value_mappings: a list of m:n relationships between the selected value and values in other reference data sets
- custom_columns: the list of custom column values for the custom column definitions associated with this reference data set.
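
The field list above can be turned into a small payload builder. The sketch below is a hypothetical helper, not part of any IBM SDK. It leaves out the optional relationship fields, and it assumes the service also accepts a body without tags, steward_ids, and effective dates, which the article does not confirm. The checks on type and on code uniqueness are client-side conveniences derived from the descriptions above.

```python
import json

# Allowed values for the "type" field, as listed above.
ALLOWED_TYPES = {"TEXT", "NUMBER", "DATE"}


def build_reference_data_set(name, description, category_id, values,
                             value_type="TEXT", custom_column_ids=None):
    """Assemble a request body for POST /v3/reference_data.

    Relationship fields (parent, child, value mappings) are omitted
    because they are optional.
    """
    if value_type not in ALLOWED_TYPES:
        raise ValueError(f"type must be one of {sorted(ALLOWED_TYPES)}")

    # Each code must be unique within the reference data set.
    codes = [value["code"] for value in values]
    if len(codes) != len(set(codes)):
        raise ValueError("duplicate code in rds_values")

    payload = {
        "name": name,
        "long_description": description,
        "type": value_type,
        "parent_category": {"id": category_id},
        "rds_values": list(values),
    }
    if custom_column_ids:
        payload["custom_columns"] = {"columns": list(custom_column_ids)}
    return payload


payload = build_reference_data_set(
    name="Countries",
    description="World countries ISO code",
    category_id="990e33f5-3108-4d45-a530-0307458362d4",
    values=[{"code": "IND", "value": "India",
             "description": "Country code for India"}],
)
print(json.dumps(payload, indent=2))
```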

The code snippet below shows a sample cURL request that creates a reference data set named Countries with the request body described above.

curl -X POST \
 "https://<my_cpd_host>/v3/reference_data" \
 -H 'Authorization: Bearer eYJKlmn8ndsewew1odsdb...' \
 -H 'Content-Type: application/json' \
 -d '{
 "name": "Countries",
 "long_description": "World countries ISO code",
 "effective_start_date": "2021-03-03T02:45:19.231Z",
 "effective_end_date": "2099-03-03T23:59:59.000Z",
 "tags": [
     "my_country_tag"
 ],
 "steward_ids": [
     "id_of_user123"
 ],
 "type": "TEXT",
 "parent_category": {
     "id": "990e33f5-3108-4d45-a530-0307458362d4"
 },
 "parent": [
 {
     "id": "<id_of_parent_reference_data_set>",
     "description": "Description about the relationship"
 }
 ],
 "child": [
 {
     "id": "<id_of_child_reference_data_set>",
     "description": "Description about the artifact relationship"
 }
 ],
 "custom_columns": {
     "columns": [
       "<custom_column_id>"
     ]
 },
 "rds_values": [
 {
   "code": "IND",
   "value": "India",
   "description": "Country code for India",
   "child": [
   {
     "id": "<code_of_child>",
     "description": "Description about the relationship"
   }
   ],
   "parent": {
     "id": "<code_of_parent>",
     "description": "Description about the relationship"
   },
   "multi_value_mappings": [
   {
     "id": "<id_of_mapped_reference_data_set>",
     "codes": [
       "code_from_mapped_reference_data_set"
     ]
   }
   ],
   "single_value_mappings": [
   {
     "id": "<id_of_mapped_reference_data_set>",
     "codes": "<code_from_mapped_reference_data_set>"
   }
   ],
   "custom_columns": [
   {
     "id": "<custom_column_id>",
     "value": "Ramnath Kovind"
   }
   ]
 }
 ]
}'
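
For integration from application code rather than the command line, the same call might look like the following Python sketch. It uses only the standard library. The host cpd.example.com, the token value, and the helper name are placeholders for illustration. The request is built but not sent, and the Content-Type header is included on the assumption that the endpoint expects a JSON body.

```python
import json
import urllib.request


def build_create_request(host, token, payload):
    """Build (but do not send) the POST /v3/reference_data request."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"https://{host}/v3/reference_data",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder host and token; the body is trimmed to a few fields for brevity.
request = build_create_request(
    "cpd.example.com",
    "my-token",
    {
        "name": "Countries",
        "long_description": "World countries ISO code",
        "type": "TEXT",
        "parent_category": {"id": "990e33f5-3108-4d45-a530-0307458362d4"},
        "rds_values": [
            {"code": "IND", "value": "India",
             "description": "Country code for India"}
        ],
    },
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would submit it. The shape of the response is not covered in this post, so handling it is left to the reader.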

Conclusion

In this series, we learnt about the different aspects of Reference Data Management, explored the Watson Knowledge Catalog capability for authoring Reference Data Sets, and saw how to publish them for consumption by other users of the platform.

We also explored the hierarchical organization of the Reference Data Sets and reference data values along with Cross Walks. We learnt how to use the custom columns capability to store additional information associated with a value.

Finally, we explored the APIs of the reference data management capability in Watson Knowledge Catalog, which make it possible to access reference data programmatically and integrate it with other systems.

P.S: Give the Reference Data Management capability of Watson Knowledge Catalog a try to get hands-on experience, and leave your feedback or questions in the comments section of the post. We will be glad to answer your queries.

Written by Praveen Devarao