IBM cpdctl CLI on IBM Cloud

Jacek Midura
6 min read · Jun 2, 2023

Go with the flow: Automate Cloud Pak for Data as a Service processes with cpdctl

More than a year ago, IBM rolled out a CLI tool named IBM Cloud Pak for Data Command Line Interface (cpdctl) for automating your machine learning tasks on IBM Cloud Pak for Data (CP4D). You may know it from one of our previous blogs describing the use of this tool in Jenkins pipelines that automate the delivery of machine learning assets from creation to production. At the time, the utility required that your assets be hosted on-premises.

In this article, I will showcase how you can now manage and automate your machine learning assets hosted on IBM Cloud Pak for Data as a Service (CPDaaS) using cpdctl. That’s right — you can now apply an automation solution to assets based in a managed, subscription-based environment on IBM Cloud in the same way you can manage them locally on a CP4D platform running on a Red Hat® OpenShift® cluster.

Sample scenario

Let’s step through a simple scenario for configuring cpdctl to work with CPDaaS. To try it yourself, start by exporting a project to a ZIP file containing a notebook and a job definition. This article demonstrates how to:

  1. Configure cpdctl to connect to CPDaaS, with emphasis on using the ibmcloud CLI's configuration management capabilities to automate configuration and connect to a specific region.
  2. Import the ZIP file into a project on CPDaaS.
  3. Run the notebook job and review the results.

Configuring cpdctl for CPDaaS

cpdctl works by sending and receiving data using CP4D API calls. To connect to the APIs, you can configure cpdctl manually, or you can use the preferred automatic configuration, which relies on the IBM Cloud CLI's configuration management utility to securely configure the connection.

Automatic configuration

To automatically configure cpdctl to connect to the CPDaaS APIs, simply log in to your IBM Cloud account to get the benefit of automatic authentication for cpdctl sessions. This mode of operation is documented in the Zero Configuration section of cpdctl’s README.

Automatic configuration — cpdctl uses IBM Cloud CLI session metadata

The IBM Cloud CLI offers a variety of authentication methods, for example interactive login, login based on the IBMCLOUD_API_KEY environment variable, or Single Sign-On login. They all have the effect of creating a session with IBM Cloud that cpdctl can use. This is not only convenient but also more secure than storing access credentials in a cpdctl configuration file.
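As an illustration, any of the following establishes a session that cpdctl can then reuse; the API key value is a placeholder:

```shell
# Interactive login: prompts for account credentials
ibmcloud login

# Non-interactive login: the CLI reads the API key from the environment
export IBMCLOUD_API_KEY=<your-api-key>
ibmcloud login

# Single Sign-On login: opens a browser to complete authentication
ibmcloud login --sso

# With a session in place, cpdctl commands work without any prior configuration
cpdctl project list
```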

The IBM Cloud session is protected by an access token with a one-hour lifetime. When the token expires, cpdctl prompts you to run the ibmcloud login command again. For example:

Expired IBM Cloud access token

Note: cpdctl automatically uses IBM Cloud CLI session metadata only if it hasn’t previously been configured manually with cpdctl config ... commands. In other words, explicit configuration takes precedence over automatic detection of available connection information.

If cpdctl has been previously configured, for example to connect to an on-premises CP4D instance, it is still possible to use IBM Cloud CLI session metadata: the solution is to create an additional configuration profile bound to the IBM Cloud CLI configuration directory.

Manual configuration

To manually configure cpdctl to connect to the CP4D APIs, use cpdctl config commands to provide connection details (URL and authentication credentials), the same way you configure cpdctl for on-premises CP4D. For example, a manual configuration script looks like this:

Manual profile configuration for CPDaaS
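A sketch of such a script follows. The profile and user names are arbitrary, the API key is a placeholder, and the exact flag names are assumptions drawn from cpdctl's configuration commands; verify them with cpdctl config --help for your cpdctl version:

```shell
# Store the credentials under a named user entry (API key is a placeholder)
cpdctl config user set cpdaas-user --apikey <your-ibm-cloud-api-key>

# Create a profile pointing at the CPDaaS API endpoint and bind the user to it
cpdctl config profile set cpdaas --url https://api.dataplatform.cloud.ibm.com --user cpdaas-user
```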

Running the sample scenario on CPDaaS

The sample scenario replicates a typical process a Data Scientist might follow when working with IBM Cloud Pak for Data as a Service:

  • Import assets into a new analytics project
  • Run the imported job
  • Review logs from the job run

To prepare to run the scenario, we:

1. Use ibmcloud login to establish an IBM Cloud CLI session.

2. Provide details for the Cloud Object Storage (COS) service used for storing assets.

After the initial setup, we can create the project and get on with our data science tasks. Let me show you how to determine the GUID and CRN of the COS service instance, which are then passed to the project creation command.

We can use the ibmcloud resource service-instances command to list the names of all active Cloud Object Storage service instances. We then pass the name of the COS instance we want to use to the ibmcloud resource service-instance command.

Determining storage service instance identifiers
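A sketch of those two commands; the instance name my-cos-instance is a placeholder for the name reported by the first command:

```shell
# List active Cloud Object Storage service instances in the account
ibmcloud resource service-instances --service-name cloud-object-storage

# Show details of the chosen instance, including its GUID and CRN
ibmcloud resource service-instance my-cos-instance
```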

We put the GUID and CRN values in a storage.json file that has the following structure:

Example storage.json file
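The file follows the structure below; the type value is the identifier used for IBM Cloud Object Storage in cpdctl's samples, and the guid and resource_crn values are placeholders to be replaced with the output of the previous step:

```json
{
  "type": "bmcos_object_storage",
  "guid": "<COS-instance-GUID>",
  "resource_crn": "<COS-instance-CRN>"
}
```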

Now we’re ready to create the analytics project in CPDaaS. We reference the storage.json file we just created using the --storage flag:

Creating analytics project
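The command mirrors the one used later in the eu-de walkthrough:

```shell
# Create the project, passing the storage definition prepared above
cpdctl project create --name import-project --storage @storage.json
# The response contains the new project ID in the "location" field,
# e.g. location /v2/projects/<project-id>
```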

This is the first place we run cpdctl in our scenario, and the fun part is that we can jump directly to productive work instead of wasting time on configuration tasks.

The next step is to import assets into the newly created project. Let’s do that:

Importing assets
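A sketch of the import and the follow-up check; the project ID is a placeholder, and the asset search flags are assumptions to be verified against cpdctl asset search --help:

```shell
# Import the exported ZIP file into the project
cpdctl asset import start --project-id <project-id> --import-file exported_project.zip

# Confirm that the notebook and job definition arrived
cpdctl asset search --project-id <project-id> --type-name asset --query '*:*'
```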

Output from the command cpdctl asset search confirms that the notebook and job definition were indeed imported into the project. The notebook used here is very simple:

print("Running in IBM Cloud!")
import os
print(f"Parameter value: {os.environ['NOTEBOOK_PARAMETER']}")
Example notebook

The job definition just runs the notebook. Now we are ready to start the job run:

Running the notebook job
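The invocation matches the one shown later for the eu-de region, with the --async flag added; the project and job IDs are placeholders:

```shell
# Start the job run asynchronously; NOTEBOOK_PARAMETER is passed to the notebook
cpdctl job run create --project-id <project-id> --job-id <job-id> \
  --job-run '{"configuration": {"env_variables": ["NOTEBOOK_PARAMETER=some value"]}}' \
  --async
```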

The --async parameter tells cpdctl to start the job run and exit immediately, without waiting for the run to complete. This is very handy when the run is expected to last a long time. That’s not the case, however, in this demo scenario, so we can proceed to making sure that the job run completed, using the job run wait command. Finally, we download the job run logs, using the job run logs command. These commands need the identifier printed above (4175…) as the value of the --run-id parameter:

Printing job run logs
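A sketch of those two commands; the IDs are placeholders for the values printed by the earlier job run create call:

```shell
# Block until the run finishes
cpdctl job run wait --project-id <project-id> --run-id <run-id>

# Download and print the run's logs
cpdctl job run logs --project-id <project-id> --run-id <run-id>
```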

Changing IBM Cloud region

IBM Cloud has a global presence; the geographical locations of its datacenters are represented by regions. You can set the region targeted by IBM Cloud CLI commands during login:

Log in to a specific IBM Cloud region
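For example, to log in directly to the Frankfurt (eu-de) region:

```shell
# The -r flag selects the targeted region at login time
ibmcloud login -r eu-de
```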

You can also check or change the current region using the command ibmcloud target:

Switching region for IBM Cloud CLI session
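For example:

```shell
# Show the currently targeted region (among other session details)
ibmcloud target

# Switch the session to the eu-de region
ibmcloud target -r eu-de
```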

If you’re using the zero configuration mode (without manual configuration), the change of the current region is reflected automatically, since cpdctl always uses up-to-date IBM Cloud CLI metadata:

Listing projects in eu-de region
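With the session now targeting eu-de, the same command from the start of the scenario lists the projects in that region:

```shell
# In zero configuration mode, this reflects the current IBM Cloud CLI region
cpdctl project list
```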

On the other hand, if you have previously created a configuration profile manually to connect to Cloud Pak for Data as a Service, you can update it with the new region information:

Switching configuration profile to a different region
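A sketch of updating an existing profile; the profile name is arbitrary, and the eu-de endpoint URL follows IBM Cloud's regional dataplatform naming convention and should be verified for your region:

```shell
# Point the existing profile at the eu-de CPDaaS endpoint
cpdctl config profile set cpdaas --url https://api.eu-de.dataplatform.cloud.ibm.com
```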

Note that different regions have separate sets of resources. The listing above does not include the import-project project we created previously, because it was created in a different region.

We can now repeat the steps shown earlier (create project, import assets, run the notebook job, and review logs) in the eu-de region, which has become the current region for the IBM Cloud CLI and, as a consequence, for cpdctl.

$ cpdctl project create --name import-project-de --storage @storage.json
...

location /v2/projects/fc9f385d-2f35-4e84-a7fc-46f221fd948d

$ cpdctl asset import start --project-id fc9f385d-2f35-4e84-a7fc-46f221fd948d --import-file exported_project.zip
...

ID: 560d9718-3155-402a-ba78-e9aa500ce633
Created: 2023-05-11T13:52:27.261Z
State: completed

$ cpdctl job run create --project-id fc9f385d-2f35-4e84-a7fc-46f221fd948d --job-id 4a891e73-df71-4f96-b0e6-c7bc92f58266 --job-run '{"configuration": {"env_variables": ["NOTEBOOK_PARAMETER=value for eu-de region"]}}'
...

Cell 1:
Running in IBM Cloud!

Cell 2:
Parameter value: value for eu-de region

ID: 67d39069-587d-4ba3-8dc5-1f96a3267429
Name: Notebook Job
Created: 2023-05-11T13:53:55Z
State: Completed
Tags: []

This time the --async flag was not passed to the cpdctl job run create command. As a result, we can see the job run log directly in the command output.

Summary: manage your Cloud assets with confidence

This article demonstrated how to use cpdctl to manage machine learning assets on Cloud Pak for Data as a Service. Configure cpdctl manually, or take advantage of the ibmcloud CLI's automatic configuration management capabilities. Use cpdctl to automate common data science activities, whether your assets are in the cloud or on premises.

Written by: Jacek Midura, Rafał Bigaj.
