Decision Optimization, Now Available in Watson Machine Learning

Alain Chabrier

Jul 29, 2019 · 8 min read


Earlier this year, we announced the availability of Decision Optimization in Watson Studio, allowing you to develop, debug and tune optimization models within a powerful Data Science platform. We are now announcing the availability of Decision Optimization in Watson Machine Learning, allowing you to deploy and execute optimization models from production applications.

In this tutorial, we detail several aspects of this integration.

Decision Optimization vs Machine Learning

I have already published quite a lot of material illustrating the differences and similarities between Decision Optimization (DO) and Machine Learning (ML). See, for example, this post, which uses a simple example.

Decision Optimization is used in many different industries to solve a wide range of problems, from car production planning to airline crew scheduling, and from electricity unit commitment to predictive maintenance.

Decision Optimization models are not data-driven and hence cannot be created automatically from historical data. DO is a knowledge-driven set of techniques: just as with Business Rules systems, the constraints and objectives that apply to the input data and to the decision variables need to be formulated. Once the model has been created, debugged, tuned and validated, for example inside Watson Studio, it can be deployed and used with new sets of input data from production applications. The model is data independent and will only have to be changed if some business consideration changes (for example, a new constraint or a new objective).
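To make this concrete, here is a minimal docplex sketch (all names and data are hypothetical): the constraints and objective are written by hand, and only the input data changes between executions.

```python
# A minimal docplex sketch: the model formulation is knowledge-driven,
# while the input data (demands, capacity) varies between executions.
from docplex.mp.model import Model

demands = {"P1": 100, "P2": 150}   # hypothetical input data
capacity = 200                     # hypothetical shared capacity

mdl = Model(name="production")
qty = {p: mdl.continuous_var(name="qty_" + p) for p in demands}
mdl.add_constraint(mdl.sum(qty.values()) <= capacity)  # shared capacity constraint
for p, d in demands.items():
    mdl.add_constraint(qty[p] <= d)                    # cannot produce more than demand
mdl.maximize(mdl.sum(qty.values()))                    # objective: maximize production
solution = mdl.solve()
```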

Production applications can thus use a simple entry point to provide new data, which is attached to the model to create a problem instance that is then solved using mathematical engines. When the optimization completes, the application can access the solution.

All this is now simple to do with the addition of Decision Optimization to Watson Machine Learning.

Watson Machine Learning

Watson Machine Learning offers an infrastructure to deploy and execute different types of models in a scalable way, providing a simple REST API that can easily be called from all kinds of applications.

Among other benefits, WML removes the effort of setting up your own infrastructure to score your models; nothing additional is needed when your needs grow.

WML also provides a wide set of runtimes to execute different types of ML frameworks.

Go to these pages to learn more about the general functionality of WML.

Use DO with WML

Let’s look at the main topics of interest now that DO is available in WML.

Deployment flow

The overall flow of creating, deploying and using a decision optimization model is shown below.

Complete workflow from development to deployment and use

I will not detail here again how models can be developed and debugged in Watson Studio: models can be created using notebooks or using a dedicated model builder.

When the model is ready to be used in production, it can be uploaded and deployed in a WML instance.

Note that you could use any other tool you might prefer to develop your optimization model, or you can just use WML to deploy and run your legacy models.

Create a WML instance

The first thing you will need to deploy and run optimization models is a WML instance. You can deploy different models in the same instance or in different instances, allowing you to better control costs.

You can follow the step-by-step creation of an instance here.

Get the credentials from your WML instance

Security and authentication

Access to your WML instance is secured through an instance ID and an API key, which are used to obtain a token that must then be passed in all subsequent WML REST API calls. You can see a more detailed example of how to do this at the beginning of this documentation.

Get token and runtime.
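As an illustration, here is a minimal Python sketch of this authentication step using the public IBM Cloud IAM endpoint. The instance ID header name is an assumption based on the v4 beta documentation and should be checked against your API version.

```python
# Minimal sketch: exchange the API key for an IAM token, then build the
# headers used by all later v4 REST calls.
import requests

API_KEY = "..."        # apikey from your WML instance credentials
INSTANCE_ID = "..."    # instance_id from your WML instance credentials
WML_URL = "https://us-south.ml.cloud.ibm.com"  # region-specific URL from your credentials

resp = requests.post(
    "https://iam.cloud.ibm.com/identity/token",
    data={
        "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
        "apikey": API_KEY,
    },
)
resp.raise_for_status()
token = resp.json()["access_token"]

# The ML-Instance-ID header name is an assumption from the v4 beta docs.
headers = {
    "Authorization": "Bearer " + token,
    "ML-Instance-ID": INSTANCE_ID,
    "Content-Type": "application/json",
}
```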

Deploying DO models, runtimes and model types

A model formulation which has been validated in Watson Studio can then be deployed in this WML instance.

Three steps are required:

  1. create a model in WML
  2. upload the model formulation (and optionally some master data)
  3. deploy the model

Models are created, deployed and run on a given runtime, so a runtime has to be provided. Currently you can only select the 12.9 runtime, which provides the same mathematical engines as CPLEX Optimization Studio 12.9.

When you create the model, you also need to provide the model type. Currently the following types are supported:

  • opl (do-opl_12.9)
  • cplex (do-cplex_12.9)
  • cpo (do-cpo_12.9)
  • docplex (do-docplex_12.9) using Python 3.6

Create a model
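As a hedged sketch (continuing the authentication snippet above, with field names taken from the v4 beta documentation), step 1 could look like this:

```python
# Step 1: create the model resource, giving its name, type and runtime.
payload = {
    "name": "diet-model",                        # hypothetical model name
    "type": "do-docplex_12.9",                   # one of the model types listed above
    "runtime": {"href": "/v4/runtimes/do_12.9"}, # the 12.9 DO runtime
}
resp = requests.post(WML_URL + "/v4/models", json=payload, headers=headers)
resp.raise_for_status()
model_id = resp.json()["metadata"]["guid"]
```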

Models are uploaded as a binary zip file containing the model formulation. The archive can also contain some master data, i.e. data that is common to all instances and will be used for all executions, so that you do not need to attach this data to every new job. It is common in optimization that some structural data (for example, the supply chain nodes and arcs) is fixed over time and is the same for all executions, while other data (for example, product inventories and demands) is different for each execution.

Upload the model.
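Continuing the sketch, step 2 uploads the archive as the model content (the archive name is hypothetical):

```python
# Step 2: upload the zipped model formulation (and optional master data).
with open("diet_model.zip", "rb") as f:
    resp = requests.put(
        WML_URL + "/v4/models/" + model_id + "/content",
        data=f.read(),
        headers={**headers, "Content-Type": "application/zip"},
    )
resp.raise_for_status()
```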

Finally, you can deploy the model to get the deployment id to be used for executions. The meaning of compute nodes and names will be discussed later.

Deploy the model.
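Step 3 could then look as follows; the compute section (POD size name and number of nodes) follows the v4 beta documentation and is discussed in more detail below:

```python
# Step 3: deploy the model, selecting a POD size and a maximum node count.
payload = {
    "name": "diet-deployment",                   # hypothetical deployment name
    "asset": {"href": "/v4/models/" + model_id},
    "compute": {"name": "S", "nodes": 1},        # Small PODs, at most one node
}
resp = requests.post(WML_URL + "/v4/deployments", json=payload, headers=headers)
resp.raise_for_status()
deployment_id = resp.json()["metadata"]["guid"]
```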

After a model is deployed, problems can be solved with as many sets of data as needed. When a model is not useful anymore, the deployment can be deleted.

There is no charge for deploying a model; only solving a model with some data is charged, based on the running time. You can deploy several models on the same WML instance.

Asynchronous jobs

An important difference between scoring deployed ML models and solving deployed DO models is the running time.

Scoring one row of input data is very fast, so most scoring APIs are synchronous, returning the score in the response of the REST API call.

With DO, problems can take minutes, or even hours for large ones, to solve, so asynchronous APIs are generally required. This has been implemented with jobs in the WML v4 APIs.

So, to solve a problem, you just create a new job from a model deployment and associate it with some data.

Create a job
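A hedged sketch of the job creation call, continuing the snippets above (the decision_optimization payload structure follows the v4 beta documentation, and the input data formats are shown in the next section):

```python
# Create a solve job against the deployment; input_data is filled with one
# of the formats shown below, and output_data collects all CSV outputs.
payload = {
    "deployment": {"href": "/v4/deployments/" + deployment_id},
    "decision_optimization": {
        "input_data": [],                     # inline or referenced data, see below
        "output_data": [{"id": ".*\\.csv"}],
    },
}
resp = requests.post(WML_URL + "/v4/jobs", json=payload, headers=headers)
resp.raise_for_status()
job_id = resp.json()["metadata"]["guid"]
```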

Then you will be able to poll its status and when the problem is solved, access the solution.

You can create and submit several jobs at the same time for the same deployed model on the same WML instance. See below how the runtime handles these situations.

Multiple ways to connect to input (and output) data

Data can be provided inline, in the payload of the job creation.

Example of inline data.
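For instance, an inline table could be passed as follows (a hypothetical table from the classical diet example; the fields/values layout follows the documentation):

```python
# One inline table: an id, the column names, and the rows.
inline_input = [
    {
        "id": "diet_food.csv",   # hypothetical table name
        "fields": ["name", "unit_cost", "qmin", "qmax"],
        "values": [
            ["Roasted Chicken", 0.84, 0, 10],
            ["Spaghetti W/ Sauce", 0.78, 0, 10],
        ],
    }
]
```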

Data can also be passed by reference; the current implementation supports different sources, including COS/S3, as shown below. Using referenced data allows a cleaner architecture, where the persistence of data is handled by a separate piece of code.

Examples of referenced data usage.
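As a sketch, a table referenced from COS/S3 might be declared as below; the connection and location field names follow the documentation of the time, and all credential values are placeholders:

```python
# One referenced table: the data stays in COS/S3 and WML fetches it.
referenced_input = [
    {
        "id": "diet_food.csv",
        "connection": {
            "endpoint_url": "https://s3.us-south.cloud-object-storage.appdomain.cloud",
            "access_key_id": "...",       # placeholder HMAC credentials
            "secret_access_key": "...",
        },
        "location": {"bucket": "my-do-data", "path": "diet_food.csv"},
    }
]
```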

While the job is running, you can access the status using the following call:

Monitor a running job
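A simple polling loop, continuing the sketches above (the status path and state names follow the v4 beta documentation):

```python
# Poll the job until it reaches a terminal state, then read the solution.
import time

while True:
    resp = requests.get(WML_URL + "/v4/jobs/" + job_id, headers=headers)
    resp.raise_for_status()
    job = resp.json()
    state = job["entity"]["decision_optimization"]["status"]["state"]
    if state in ("completed", "failed"):   # terminal states per the docs
        break
    time.sleep(5)

if state == "completed":
    # Output tables are returned in the same fields/values format as inputs.
    outputs = job["entity"]["decision_optimization"]["output_data"]
```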

Executing jobs on PODs

When a job is created and submitted, the way it gets executed depends on the current configuration and jobs running for the WML instance, as shown in the diagram below.

Jobs are run inside PODs. A Kubernetes POD consists of one or more containers that are guaranteed to be co-located on the host machine and can share resources.

How new jobs are handled.

  1. The new job is sent to the queue.
  2. If a POD is up but idle (not running a job), it immediately starts processing the job.
  3. Otherwise, if the maximum number of nodes has not been reached, a new POD is started (this can take a few seconds) and the job is assigned to this new POD.
  4. Otherwise, the job waits in the queue until one of the running PODs finishes and can pick up the waiting job.

There are different sizes of PODs that can be used, Small, Medium or Extra Large, as configured when deploying the model.

The configuration of PODs of each size is as follows:

  • Small: 2 CPUs and 8 GB of memory
  • Medium: 4 CPUs and 16 GB of memory
  • Extra Large: 16 CPUs and 64 GB of memory

In addition to the solving time, the pricing depends on the selected size through a multiplier.

In the deployment configuration you can also set the maximum number of nodes to be used.

Idle PODs are automatically stopped after some timeout.

Running time based pricing (CUH)

As noted above, only the job solving time is charged; idle time for PODs is not charged.

Depending on the size of the POD used, a different multiplier is applied to compute the number of Compute Unit Hours (CUH) consumed.
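For illustration only, with a purely hypothetical multiplier of 2 for a Medium POD, a job that solves in 30 minutes would consume 0.5 h × 2 = 1 CUH; see the WML pricing documentation for the actual multipliers.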

REST API documentation

All REST APIs introduced above are documented with more details in our Decision Optimization for Watson Machine Learning documentation.

Python API with example

In addition to the REST APIs, WML provides a Python API that allows you to easily create, deploy and consume a decision optimization model from a Python notebook.

An example notebook describing and documenting all steps is included in the Watson Studio community.
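As a hedged sketch of what this looks like with the V4 Python client available at the time of writing (method and metadata names follow that example notebook and may have evolved since):

```python
# Store, deploy and solve a DO model with the WML Python client.
from watson_machine_learning_client import WatsonMachineLearningAPIClient

wml_credentials = {
    "apikey": "...",       # placeholders, from your WML instance credentials
    "instance_id": "...",
    "url": "https://us-south.ml.cloud.ibm.com",
}
client = WatsonMachineLearningAPIClient(wml_credentials)

# Store the zipped model formulation with its type and runtime.
model_details = client.repository.store_model(
    model="diet_model.zip",   # hypothetical archive
    meta_props={
        client.repository.ModelMetaNames.NAME: "diet-model",
        client.repository.ModelMetaNames.TYPE: "do-docplex_12.9",
        client.repository.ModelMetaNames.RUNTIME_UID: "do_12.9",
    },
)
model_uid = client.repository.get_model_uid(model_details)

# Deploy it as a batch deployment on Small PODs.
deployment_details = client.deployments.create(
    model_uid,
    meta_props={
        client.deployments.ConfigurationMetaNames.NAME: "diet-deployment",
        client.deployments.ConfigurationMetaNames.BATCH: {},
        client.deployments.ConfigurationMetaNames.COMPUTE: {"name": "S", "nodes": 1},
    },
)
deployment_uid = client.deployments.get_uid(deployment_details)

# Create a solve job; input/output data use the formats shown earlier.
job_details = client.deployments.create_job(
    deployment_uid,
    {
        client.deployments.DecisionOptimizationMetaNames.INPUT_DATA: [],
        client.deployments.DecisionOptimizationMetaNames.OUTPUT_DATA: [{"id": ".*\\.csv"}],
    },
)
```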

Conclusions

With the support of Decision Optimization model deployment in Watson Machine Learning, it is now possible to embed optimization models in production applications and get problems solved using a powerful and scalable infrastructure.

Alain.chabrier@ibm.com

@AlainChabrier

https://www.linkedin.com/in/alain-chabrier-5430656/
