IBM Open Data for Industries and Cloud Pak for Data Services
General integration approach 1.0 (Part 2)

Quick Recap
Part 1 of this post covered the following steps of the Open Data for Industries and Cloud Pak for Data service integration:
STEP 1: Cloud Pak for Data Analytics Project and Notebook set up
STEP 2: Data refinery and cleansing
STEP 3: Data Ingestion to IBM Open Data for Industries instance
This is Part 2, which covers:
STEP 4: Searching and retrieving data from an Open Data for Industries instance
STEP 5: Data analysis and prediction using Cloud Pak for Data services

Search and retrieve data from an Open Data for Industries instance
IBM Open Data for Industries supports REST APIs to query and retrieve raw data and metadata. Within a Python notebook, the following steps are needed to retrieve data from an Open Data for Industries instance.
Getting a bearer token from Keycloak
Please refer to Part 1 for details of getting a token from Keycloak.
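As a quick refresher, the token request is a standard Keycloak password grant. A minimal sketch, assuming the realm, client, and credential values from your Open Data for Industries setup (see Part 1 for the exact configuration):
import requests
# Assumption: KEYCLOAK_URL, REALM, CLIENT_ID, CLIENT_SECRET, USERNAME, and
# PASSWORD match the configuration described in Part 1.
token_url = KEYCLOAK_URL + "/auth/realms/" + REALM + "/protocol/openid-connect/token"
resp = requests.post(token_url, data={
    "grant_type": "password",
    "client_id": CLIENT_ID,
    "client_secret": CLIENT_SECRET,
    "username": USERNAME,
    "password": PASSWORD,
})
BEARER_TOKEN = resp.json()["access_token"]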
Setting up API calls for data searching and retrieval
When retrieving raw data or metadata from an Open Data for Industries instance, the most commonly used services are the Open Data for Industries Search and Delivery services.
A Python helper for these calls can look like the following:
import requests

# ODI_INSTANCE, SERVICE_PATH, OSDU_DATA_PARTITION, and BEARER_TOKEN are set
# earlier in the notebook.
def call_odi_service(method, service_endpoint, body=None):
    headers = {
        'data-partition-id': OSDU_DATA_PARTITION,
        'authorization': "Bearer " + BEARER_TOKEN,
        "Accept": "application/json",
        "Content-type": "application/json",
    }
    url = ODI_INSTANCE + SERVICE_PATH + service_endpoint
    r = requests.request(method, url, json=body, headers=headers)
    return r.json()
1. Query the Search service to get record information.
The following cURL command shows a search example with the SERVICE_PATH, service endpoint, and request body filled in:
curl --request POST \
--url <cpd-route>/osdu-search/api/search/v2/query \
--header 'authorization: Bearer <<access_token>>' \
--header 'content-type: application/json' \
--header 'data-partition-id: opendes' \
--data '{
"kind": "opendes:*:*:*",
"returnedFields": [
"id", "kind","data"
],
"query": "id:\"opendes:doc:c5cdb9bb4bb84baa81ccf067c58c2750\""
}'
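Using the helper function sketched earlier, the same query can be issued from a notebook cell; the SERVICE_PATH value below is inferred from the cURL URL:
# Point the helper at the Search service.
SERVICE_PATH = "/osdu-search/api/search/v2"
search_body = {
    "kind": "opendes:*:*:*",
    "returnedFields": ["id", "kind", "data"],
    "query": 'id:"opendes:doc:c5cdb9bb4bb84baa81ccf067c58c2750"'
}
search_results = call_odi_service("POST", "/query", search_body)
print(search_results.get("results", []))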
2. Based on the search results, the end user can call the Delivery service “GetFileSignedUrl” endpoint to return the signed URL for a specific record, and then fetch its raw data by using the requests library:
curl --request POST \
--url <cpd-route>/osdu-delivery/api/delivery/v2/GetFileSignedUrl \
--header 'authorization: Bearer <<access_token>>' \
--header 'content-type: application/json' \
--header 'data-partition-id: opendes' \
--data '{"srns": ["srn:file/segy:mysegy1:"] }'
The following is an example of a signed URL:
# Open Data for Industries uses MinIO to store raw data.
# https://minio-osdu-minio.odi-ibmslb-demo-b7cd7bacf7d92146ece9843b7b89c840-0000.us-south.containers.appdomain.cloud/osdu-seismic-test-data/140435.segy?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20210210T022030Z&X-Amz-SignedHeaders=host&X-Amz-Expires=86399&X-Amz-Credential=minio%2F20210210%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=d747a52bf3e401684c1a1dfd15151677bde848b5b07ea2cf00c4fdd67d5792e7
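Before moving to the next step, the signed URL must be extracted from the delivery response. A minimal sketch, assuming the response maps each requested SRN to an entry carrying a signedUrl field; the exact field names can vary between OSDU releases, so verify them against your instance's actual response:
# Point the helper at the Delivery service.
SERVICE_PATH = "/osdu-delivery/api/delivery/v2"
srn = "srn:file/segy:mysegy1:"
delivery_response = call_odi_service("POST", "/GetFileSignedUrl", {"srns": [srn]})
# Assumed response shape: {"processed": {<srn>: {"signedUrl": ...}}, ...};
# check your instance before relying on these field names.
record_signed_url = delivery_response["processed"][srn]["signedUrl"]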
3. Once we get a signed URL, we can use the following code snippet to fetch the raw data and save it to the project as a data asset, in its original format or as CSV:
import os
import lasio
import requests
from urllib.parse import urlparse

# Record name, taken from the signed URL path
datapath = os.path.basename(urlparse(record_signed_url).path)
# Record content
record = requests.get(record_signed_url)
if record.ok:
    datacontent = record.text
    datapath = "/project_data/data_asset/" + datapath
    # Save the raw data as a data asset of the project
    with open(datapath, 'w') as outfile:
        outfile.write(datacontent)
    # Convert the LAS well-log file to CSV with lasio
    las_csv = datapath[:-3] + 'csv'
    print(las_csv)
    lasio.read(datapath).df().to_csv(las_csv, encoding='utf-8')
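Once saved, the CSV can be read back into the same notebook with pandas for the analysis steps that follow:
import pandas as pd
# Load the converted well-log CSV into a DataFrame; the first column holds
# the depth index written by lasio.
df = pd.read_csv(las_csv, index_col=0)
print(df.describe())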
Data analysis and business decisions

When metadata and raw data from Open Data for Industries are available within a notebook or saved as data assets of a Cloud Pak for Data analytics project, end users can use Cloud Pak for Data services to perform data analytics that help customers make business decisions.
Data analysis
Cloud Pak for Data provides Watson Studio (WSL), Watson Machine Learning (WML), and other supplemental services for analyzing data and building models.
Data scientists can analyze data, train ML models, and define decision optimization (DO) models. To train models, they can use SPSS Modeler or AutoAI. The screenshot below shows an AutoAI experiment in progress.

The trained models can be promoted to a deployment space and deployed (as shown below) to make them available for AI infusion into business processes, or to integrate with applications and services such as Palantir.


The deployed model can also be called via the model deployment REST API within notebooks.
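A minimal sketch of such a scoring call, assuming a Watson Machine Learning online deployment; the URL pieces, token, and feature names below are placeholders, so copy the actual scoring endpoint from the deployment's details page:
import requests
# Illustrative scoring request against a WML online deployment; CPD_URL,
# DEPLOYMENT_ID, CPD_TOKEN, and the feature fields are placeholders.
scoring_url = CPD_URL + "/ml/v4/deployments/" + DEPLOYMENT_ID + "/predictions?version=2021-02-10"
payload = {"input_data": [{"fields": ["feature_1", "feature_2"], "values": [[1.0, 2.0]]}]}
headers = {"Authorization": "Bearer " + CPD_TOKEN, "Content-Type": "application/json"}
scoring_response = requests.post(scoring_url, json=payload, headers=headers)
print(scoring_response.json())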
A specific example of the above is using an OpenVINO ML model to make predictions on seismic data within a notebook. That example does not use Cloud Pak for Data and its services directly, but similar code can also be used within notebooks in Cloud Pak for Data analytics projects, as the sketch below illustrates.
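A hedged sketch of such notebook code, using the OpenVINO Inference Engine Python API available at the time of writing; the model files and input array are placeholders, not the actual seismic model from that example:
import numpy as np
from openvino.inference_engine import IECore
# Placeholder model files and input data; substitute the actual seismic
# model (.xml/.bin) and preprocessed seismic traces.
ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")
exec_net = ie.load_network(network=net, device_name="CPU")
input_blob = next(iter(net.input_info))
input_shape = net.input_info[input_blob].input_data.shape
result = exec_net.infer(inputs={input_blob: np.zeros(input_shape, dtype=np.float32)})
print(result)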
Business Analytics
Cloud Pak for Data provides tools and services to analyze and visualize patterns and trends in existing data, helping customers make business decisions.
Cognos Dashboard is one of the base services that Cloud Pak for Data supports.
Within an analytics project, the end user can click “Add to project” -> “Dashboard” to create a new dashboard, and then select existing data assets to visualize, as the screenshot below shows:

Besides Cognos Dashboard, Cloud Pak for Data also supports Cognos Analytics and Planning Analytics services for business analytics.
Other services integration
Cloud Pak for Data provides a platform that integrates many IBM and partner services to support data analytics following the AI Ladder. This post briefly lists some major steps of using Open Data for Industries and several Cloud Pak for Data services to analyze Oil and Gas domain-specific data.
With the Cloud Pak for Data platform, end users can use many other services to do more powerful analyses. The following services are also often used:
- Watson OpenScale, to understand how your AI models make decisions, to detect and mitigate bias and drift, and to increase the quality and accuracy of your predictions;
- Watson Discovery, to extract answers from complex business documents.
Another integration approach and data flow between Open Data for Industries and Cloud Pak for Data services
So far, this post has described one flow of data processing with Open Data for Industries and other Cloud Pak for Data services, as the steps listed below (also mentioned at the beginning of the post):
- Source data refinery and cleansing
- Data ingestion to an IBM Open Data for Industries instance
- Searching and retrieving data from an Open Data for Industries instance
- Data analysis and prediction using Cloud Pak for Data services
There is also another possible data flow, as the diagram below shows:

- Using data that has already been ingested into an Open Data for Industries instance.
- Using a notebook to search and retrieve data from the Open Data for Industries instance and save it as a data asset of an analytics project.
- Refining and cleansing the data with Data Refinery.
- Analyzing the data and making predictions.
For some customer cases, data is already loaded into the Open Data for Industries instance, so this scenario fits better. Even though the data flow differs from the one described earlier, the basic techniques and details of these steps, from searching and retrieving data from an Open Data for Industries instance, to data refinery and cleansing, and then to data analysis and prediction, are the same as those described above. So this post can also help with this scenario.
Conclusion
This post highlighted the steps of integrating IBM Open Data for Industries with Cloud Pak for Data services through Python notebooks. It covers technical details that work across different scenarios, including the two described in this post.
Once data retrieved from Open Data for Industries through a notebook is saved as data assets of a Cloud Pak for Data analytics project, those data assets can be consumed by many services, following the general practices of Cloud Pak for Data.
IBM Open Data for Industries has a roadmap to further integrate with Cloud Pak for Data services as a data source directly. Please stay tuned for my “Integrate IBM Open Data for Industries with Cloud Pak for Data Services — general integration approach 2.0” post later in 2021.
Some useful links:
- Accessing Data from Cloud Pak for Data instance.
- Install custom libraries through notebook (WSL).
- Analyzing data and building ML models.
- Deploying and managing ML models.
- AI solutions and Watson services.
- Data and AI applications with Palantir for IBM Cloud Pak for Data.
- IBM Data and AI Accelerators powered by Cloud Pak for Data.