Connection Assets with Decision Optimization on Watson Machine Learning

AlainChabrier
4 min read · Jul 6, 2021


Using connection assets created in your deployment space, it is now much easier to connect your deployed optimization model to a wide range of data sources.

Note that this new behaviour covers, among others, connectivity to Cloud Object Storage and databases. In the mid term it might replace the old syntax, which required credentials to be included in the request payload and might therefore be deprecated and removed.

Inline and referenced data

Optimization models deployed in Cloud Pak for Data (CP4D) require input data to be solved. This data is provided when a new deployment job is created. It can be provided inline or by reference.

With inline data, the data is fully included in the payload. It can be tabular data or encoded binary data. Read more details about inline data.
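
As a rough illustration of the two inline forms, here is a minimal Python sketch. The entry layout (fields and values for tabular data, Base64 content for binary data) is my assumption of the inline syntax and should be checked against the inline data documentation. The helper names and the sample data are hypothetical.

```python
import base64
import json


def inline_table(data_id, fields, rows):
    """Build an inline tabular entry: column names plus a list of rows.

    Assumed layout: {"id": ..., "fields": [...], "values": [[...], ...]}.
    """
    return {"id": data_id, "fields": list(fields), "values": [list(r) for r in rows]}


def inline_binary(data_id, raw_bytes):
    """Build an inline entry carrying Base64-encoded binary content.

    Assumed layout: {"id": ..., "content": "<base64 text>"}.
    """
    return {"id": data_id, "content": base64.b64encode(raw_bytes).decode("ascii")}


# Hypothetical sample data, for illustration only.
food = inline_table(
    "diet_food.csv",
    ["name", "unit_cost"],
    [["Roasted Chicken", 0.84], ["Spaghetti", 0.78]],
)
model = inline_binary("model.lp", b"Minimize\n obj: x\nEnd\n")

input_data = [food, model]
print(json.dumps(input_data, indent=2))
```

Everything travels inside the payload here, which is convenient for small data sets but grows the request with the size of the data.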

With referenced data, the data is stored somewhere else, and the payload only provides the connection details needed to access it. One possible storage location is Cloud Object Storage (COS), accessed through the S3 APIs. Read more details on an example of referenced data.

The referenced data payload configuration for COS looks like the following:

{
  "id": "solution.json",
  "type": "s3",
  "connection": {
    "endpoint_url": "https://s3.eu-gb.cloud-object-storage.appdomain.cloud",
    "access_key_id": "xxxxxx_my_access_key_id_xxxxxxx",
    "secret_access_key": "xxx_my_secret_access_key_xxxxxx"
  },
  "location": {
    "bucket": "test-lp",
    "path": "solution.json"
  }
}

One issue with this syntax is that the payload includes credentials, which is not good practice. In addition, these credentials may not be available to the code that creates the jobs.

Connection assets

A new connection_asset mechanism is now available, using a connection created and stored in the deployment space. The connection can later be used along with a particular location that points to the input data to be used.

There are thus two steps:

  1. create the connection
  2. use this connection in addition to the location of some particular data to create a deployment job.

Create connection

All types of possible connections are listed in this connectivity matrix.

For our example, we want to use Cloud Object Storage.

This is where we will use the URL, access_key_id and secret_access_key that were previously passed in the payload with the S3 referenced data syntax.

The easiest way to create a new connection is to use the User Interface in Cloud Pak for Data as a Service. Connect to your deployment space, and choose “Add to space”. Then select “Connection”, and the type of connection you want to create, in this case choose Cloud Object Storage “infrastructure”.

Some of the available connection types

Then fill in the credentials including URL, and test your connection.

Test the connection

Give it a name and save it.

After you reopen the connection, you can get its connection_id from the URL. You will need it to create the deployment job. The URL looks like:

https://dataplatform.cloud.ibm.com/connections/zxxxx_connection_id_xxxxxx?space_id=xxxxxx_space_id_xxxxxxx&context=cpdaas
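
If you copy this URL from the browser, the two identifiers can be pulled out with a few lines of Python. This is only a sketch based on the URL shape shown above; the function name and the sample ids are hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def ids_from_connection_url(url):
    """Return (connection_id, space_id) from a connection page URL.

    Assumes the connection id is the last path segment and the space id
    is carried in the space_id query parameter, as in the URL above.
    """
    parsed = urlparse(url)
    connection_id = parsed.path.rstrip("/").split("/")[-1]
    space_id = parse_qs(parsed.query).get("space_id", [None])[0]
    return connection_id, space_id


# Hypothetical ids, for illustration only.
url = (
    "https://dataplatform.cloud.ibm.com/connections/my_connection_id"
    "?space_id=my_space_id&context=cpdaas"
)
connection_id, space_id = ids_from_connection_url(url)
print(connection_id, space_id)
```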

This process can be automated in a script, and you can use the API described here to create connections programmatically.

The call should look something like this:

POST https://api.dataplatform.cloud.ibm.com/v2/connections/?space_id=xxxxxx_space_id_xxxxxxx
Headers:
>Authorization=[Bearer XXXXX]
>Content-Type=[application/json]
Entity:
{
  "name": "MYNAME",
  "datasource_type": "4bf2dedd-3809-4443-96ec-b7bc5726c07b",
  "origin_country": "us",
  "properties": {
    "url": "https://s3.us-south.objectstorage.softlayer.net",
    "access_key": "xxxxxxxx",
    "secret_key": "yyyyyyyyyyy"
  }
}
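
The same call can be prepared from Python. The sketch below only uses the standard library and only builds the request, so it can be inspected before anything is sent. The endpoint, headers and body fields are taken from the call above; the function name, the environment variable names and the placeholder values are hypothetical, and obtaining the bearer token is out of scope here.

```python
import json
import os
import urllib.request

API_BASE = "https://api.dataplatform.cloud.ibm.com"
# Datasource type id copied from the call above (Cloud Object Storage example).
COS_DATASOURCE_TYPE = "4bf2dedd-3809-4443-96ec-b7bc5726c07b"


def build_create_connection_request(space_id, token, name, url, access_key, secret_key):
    """Prepare (but do not send) the POST request that creates a connection."""
    body = {
        "name": name,
        "datasource_type": COS_DATASOURCE_TYPE,
        "origin_country": "us",
        "properties": {"url": url, "access_key": access_key, "secret_key": secret_key},
    }
    return urllib.request.Request(
        f"{API_BASE}/v2/connections/?space_id={space_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical environment variables keep the secrets out of the script itself.
request = build_create_connection_request(
    space_id="my_space_id",
    token=os.environ.get("WML_BEARER_TOKEN", "XXXXX"),
    name="MYNAME",
    url="https://s3.us-south.objectstorage.softlayer.net",
    access_key=os.environ.get("COS_ACCESS_KEY", "xxxxxxxx"),
    secret_key=os.environ.get("COS_SECRET_KEY", "yyyyyyyyyyy"),
)
print(request.get_method(), request.full_url)

# To actually send it (requires a valid token):
# with urllib.request.urlopen(request) as response:
#     created = json.load(response)
```

The credentials are needed only once, in this step. Inspect the response of the real call to find the id of the created connection.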

Use connection in the deployment job creation

You are now ready to modify your deployment job payload to use the newly created connection.

You should modify your payload as shown below.

The connection_id of the connection is used in the connection section. Note that this is much safer than the previous mechanism, where credentials were inserted here. The information included here is useless without the corresponding hidden API keys.

The bucket and file_name properties are set (note that you need to use file_name instead of path) to point to the particular file you want to use in the Cloud Object Storage instance.

{
  "id": "solution.json",
  "type": "connection_asset",
  "connection": {
    "href": "/v2/connections/xxxxx_my_connection_id_xxxx?space_id=xxxx_my_space_id_xxxxxxx"
  },
  "location": {
    "bucket": "test-lp",
    "file_name": "solution.json"
  }
}
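
To show how one connection can be reused for several files, here is a minimal Python sketch that generates such references. The reference layout follows the payload above. The surrounding job payload shape (space_id, deployment, decision_optimization with input_data_references and output_data_references) is my assumption of the job creation syntax, and the function names, ids and the model.lp file are hypothetical.

```python
import json


def connection_asset_reference(data_id, connection_id, space_id, bucket, file_name):
    """Build a connection_asset data reference for one file in a bucket."""
    return {
        "id": data_id,
        "type": "connection_asset",
        "connection": {
            "href": f"/v2/connections/{connection_id}?space_id={space_id}"
        },
        "location": {"bucket": bucket, "file_name": file_name},
    }


def job_payload(space_id, deployment_id, inputs, outputs):
    """Wrap data references into a job creation payload (assumed shape)."""
    return {
        "space_id": space_id,
        "deployment": {"id": deployment_id},
        "decision_optimization": {
            "input_data_references": inputs,
            "output_data_references": outputs,
        },
    }


# Hypothetical ids, for illustration only.
SPACE_ID = "my_space_id"
CONNECTION_ID = "my_connection_id"

# The same connection serves both the input file and the output file.
model_ref = connection_asset_reference("model.lp", CONNECTION_ID, SPACE_ID, "test-lp", "model.lp")
solution_ref = connection_asset_reference(
    "solution.json", CONNECTION_ID, SPACE_ID, "test-lp", "solution.json"
)

payload = job_payload(SPACE_ID, "my_deployment_id", [model_ref], [solution_ref])
print(json.dumps(payload, indent=2))
```

No credential appears anywhere in this payload: only the connection id, the space id and the file locations.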

Conclusions

This new mechanism has several benefits:

  1. the configuration is clearly separated into two parts: a reusable connection to the storage and the location of the data used by this connection,
  2. the connection can be created from API or using the User Interface and is stored in the corresponding deployment space,
  3. the credentials are only used to create the connection and the connection can then be used through its connection_id without the need to know the credentials,
  4. the range of storage that can be used is now much wider. For example, you can use a Planning Analytics connection and then use the cube and view names as locations.

For more stories about Decision Optimization in Cloud Pak for Data, follow me on Medium, Twitter or LinkedIn.


AlainChabrier

Former Decision Optimization Senior Technical Staff Member at IBM. Opinions are my own and I do not work for any company anymore.