Machine Learning in Digital Process Automation — Part I

Published in

Oracle Developers

11 min readJul 17, 2019

This is the first of a multi-part series of articles on how to use Machine Learning (ML) in Digital Process Automation and Integration applications in Oracle Integration Cloud. For this series, we’re using a mix of Oracle Cloud Services and implement a couple of use cases step by step. The first article (this one) focuses on setting up the Machine Learning environment and implementing the Machine Learning Model for our first use case.

Example Use Case

Approval Workflows are very popular examples of Process Automation applications since approvals are widely used in any organization, for example

Approve hiring of employee in Employee Onboarding process
Approve purchase of equipment in Procurement process
Approve travel expenses in Travel Management process
etc. etc.

The list can go on and on. For this series though, we take a use case from Sales and want to implement the following scenario

Orders from customers are coming in from Sales Reps of a company
Orders have, among other data, the following relevant information
- Amount of the order
- Quarter when the order was created (1, 2, 3, 4)
- Customer Status (green, yellow, red)
- Requested Discount by the customer (0–99%)
Depending on the order information, we want to predict whether the order should be approved by a Sales VP for the given quarter.

This is a typical situation for any company where Sales Reps bring in orders from customers and a Sales VP has to decide, if the order should be approved for this quarter or not. Especially towards a quarter-end, this can become quite stressful for the Sales VP, so some automation here would improve the Sales Process significantly and could help to handle more orders.

In this article, we will demonstrate how to set up the Machine Learning (ML) environment using Oracle Autonomous Data Warehouse and implement a ML Model for our use case.

Oracle Autonomous Data Warehouse

Oracle Autonomous Data Warehouse (ADW) is an Oracle Cloud Infrastructure native service. It is a fully-managed cloud service with machine learning and artificial intelligence capabilities that makes it very simple to provision a data warehouse, quickly and easily load data and query that data using built-in web-based tools such as notebooks. Apache Zeppelin based ML notebooks together with Oracle Advanced Analytics offers a comprehensive ML toolkit for the Data Scientist to load and visualize data, train, and test ML models and document the ML use case using simple markdown language.

Initial User Setup

For the setup and administration of ADW, see the Appendix of this article. It is advised to set up users in ADW. A user in ADW is different from a OCI user in the sense that a ADW user is really a Database user owning a schema in ADW. Once a user is created in ADW, the user might want to run few setup SQL scripts. Those are required only once in the system and can then be used in every Notebook or SQL script. See Create User Account in Oracle Machine Learning on how to create users in ADW. To create a SQL script in ADW, login with your user and password and then open Run SQL Scripts from the users home page

This will open a SQL Script Scratchpad where you can organize your SQL Scripts in paragraphs and then execute each paragraph individually or execute the entire Script Scratchpad. I’d recommend organizing the work in paragraphs so that you can come back later and only execute appropriate paragraphs as needed.

Create Credentials

Since we’re working with Oracle Cloud Storage later to load data from Cloud Storage into ADW, we need to setup user credentials in ADW to connect to Cloud Storage with procedures of ADW package DBMS_CLOUD. To do this, open a SQL Script in ADW and execute the following

Create Credential to connect to Cloud Storage from ML notebooks

See the DBMS_CLOUD.CREATE_CREDENTIAL procedure and Working with Auth tokens for details.

Create Procedure to write BLOB to DATA_PUMP_DIR

Next we’d like to create a convenience procedure that can be used to dump a serialized ML model as a binary BLOB to the DATA_PUMP_DIR of ADW. We give an explanation later on why this is needed. For this add the following to your SQL Scratchpad and execute it

Procedure to write a BLOB to a directory on ADW Database

Implementing the Sales Approval Machine Learning Model

We’re now ready to implement the use case using a ML notebook in ADW. For this, go to the home page and click on Notebooks, then click on Create

This will open a new empty ML notebook that looks like the following:

Empty ML notebook for Order Approvals Classification

ML Notebook usage

The ML notebook is organized in so-called paragraphs. Each paragraph is implemented by one of the following implementation types

%md
This is for mark-down language and can be used to document the ML notebook. Links in the mark-down can be used to link to external resources.
%script
This is for SQL scripts in PL/SQL language. In a later version of ADW support for Python will be added.
%sql
This is used for SQL Queries and also allows visualizations of the result set in various forms of charts etc.

The implementation type should be specified in the first line of each paragraph. Paragraphs can be executed individually. Also, the entire ML notebook might be executed. For this, there is a small Play button on the notebook and on each paragraph. For more information about ML notebooks and their use, please see Using Oracle Machine Learning. It might be also useful to read about Apache Zeppelin notebooks in general to better understand their usage and best practices.

The sections below are organized in a way, that you can copy&paste the sample code into new paragraphs of your ML notebook. By the end of this paragraph, you will have a running ML notebook that creates a ML model for classification of orders.

1. Documentation

It is good practice to document your use case, for this we start the ML notebook with a mark-down (%md) paragraph

Paragraph to document the use case

2. Cleanup

Next, we’re going to cleanup any existing data in the context of this use case. This will make the ML notebook repeatable.

Cleanup data for this use case

3. Create Database Objects

After the cleanup, we’re going to create the required objects in the ADW database.

Create tables required for this use case

4. Load Data from Cloud Storage

Next, we’re going to load data from Cloud Storage into table SALES_DATA. This table contains the training set on which we train the ML model later. ADW offers various options to load data into the ADW Database, see Loading Data with Autonomous Data Warehouse for details. In our use case, the data is available as a .CSV file on Cloud Storage and we can load it into the Database using DBMS_CLOUD.COPY_DATA procedure

Copy data from .CSV file on Cloud Object Storage to table SALES_DATA

Here’s the file sales_data_export.csv on Cloud Object Storage

5. Split Data

For any Machine Learning exercise it is good practice to split the data into two parts

Training Data
The Training Data is used to actually train a Machine Learning Model
Test Data
The Test Data will be used later to test the quality of the ML Model and check if the predictions give results that are identical or close to the test data

This can be conveniently achieved in SQL by creating two views on top of table SALES_DATA, one for the training data and the other for the test data

Split SALES_DATA into training and test data by using views

A note of warning here: While above SQL is quite convenient and expressive, this split methodology shouldn’t be used for large training sets because the SQL MINUS operation is not really performant. Instead, one should use the case id in combination with the ora_hash function to split the data set.

6. Prepare settings for GLM algorithm

For this use case, we’re going to use a Generalized Linear Model (GLM) ML algorithm, which is an algorithm for Classification models. In order to use GLM in ADW, we need to parametrize the settings of the algorithm. Auto data preparation is on by default and doesn’t need to be set explicitly. It is just shown below for completeness.

Settings for GLM algorithm

7. Show Data

We’re now ready to train a Machine Learning Model for the data stored in table SALES_DATA using a GLM algorithm for Classification. Before we do that, we query table SALES_DATA and show some graphics. For this we can create a new paragraph and select %sql as implementation and then add the following SQL into the paragraph

%sqlselect CustomerStatus, Quarter, OrderAmount from sales_data_train

This will display something similar as below after the user selects to use a bar chart visualization for the query result.

Example bar chart from data in view SALES_DATA_TRAIN

Feel free to play with the visualization and various chart implementations available in the notebook. Above bar chart shows the distribution of order amounts over the four quarters for customers with status green, red and yellow.

8. Train Machine Learning Model

Now its time to train the ML model based on data in view SALES_DATA_TRAIN

Creation of ML model SALES_CLASSIFICATION_GLM_V2

This needs probably a bit more explanation, for the full details please see DBMS_DATA_MINING.CREATE_MODEL documentation and the Oracle Advance Analytics Data Mining User’s Guide.

We create a model with name SALES_CLASSIFICATION_GLM_V2
The mining function for this model is CLASSIFICATION
The data for training is stored in view SALES_DATA_TRAIN
The target column (the attribute we’d like to predict later) is APPROVAL
The settings for the algorithm are stored in table GLM_SETTINGS

9. Show Prediction Results

Now that we have build the model, we can check the quality of the model. For this article, we just show the results of some predictions against view SALES_DATA_TEST though.

Create a new paragraph in your ML notebook with the following

%sqlselect prediction(SALES_CLASSIFICATION_GLM_V2 USING *) pr,
       Quarter,
       OrderAmount
  from sales_data_test
  order by pr ;

which might show something like this

Result of predicting APPROVALS using data from view SALES_DATA_TEST

As somehow expected, most of the predictions are ‘yes’ for the APPROVAL attribute and only a fraction of those are ‘no’. In a real world scenario, one would build many more Classification models using different algorithms and then compare the result of each algorithm and then pick the best algorithm for the use case. However this would go far beyond the purpose of this series and might actually be a series of its own.

ADW offers numerous ML techniques and algorithms for the following

Classification
Regression
Anomaly Detection
Clustering
Association
Time Series
Feature Selection and Regression

For a complete set of algorithms please see Algorithms.

10. Write ML model to directory

Now that we have build a ML model for the use case we can write its serialized version as a BLOB to a directory in ADW. This paragraph will make use of procedure write_serialized_model that we created initially at User Setup. This procedure can be used in other ML notebooks so we have not created it here but as part of setting up the Data Scientist ML notebook environment.

Writing serialized ML model to ADW DATA_PUMP_DIR

11. Transfer ML model to Cloud Object Storage

As a final step of this ML notebook, we’re going to transfer the serialized version of the ML model from directory DATA_PUMP_DIR of ADW to Cloud Object Storage. For this we’re using DBMS_CLOUD.PUT_OBJECT procedure.

Transfer serialized ML model from DATA_PUMP_DIR to Cloud Object Storage

We’re now done with the ML notebook. The result of running this ML notebook is a Machine Learning Model SalesClassificationV2.mod stored on Cloud Object Storage. In the next part of this series we’re going to show how to use this Machine Learning Model for scoring.

Appendix: Autonomous Data Warehouse administration

In this chapter we highlight few administration features of ADW that might be valuable to work with an ADW instance.

ADW DB Connections and Service Console

ADW offers certain administration features that can be accessed from the OCI Autonomous Data Warehouse console. For this, an Administrator of ADW might want to select the appropriate ADW instance from the OCI console as shown below

List of Autonomous Data Warehouse instances

Downloading Client Credentials

It is quite convenient to have the full set of features from SQL*Developer available to work with the ADW Database. In version 18.1 and above, SQL*Developer enables the creation of Database Connections from client credential files, also known as the Wallet. To download the Client Credentials for the ADW database, open the ADW instance details page in OCI console and then click on DB Connection

Details of Autonomous Data Warehouse Instance

This will open another dialog from where you can download a .zip archive

The .zip archive contains the client credentials and required keys to connect to the ADW Database via JDBC over TLS. This archive can be used from SQL*Developer to create a Database Connection.

Service Console

The Service Console shows some runtime statistics of the ADW instance with CPU utilization and average SQL statement response times and some more.

All administrative tasks of ADW can be reached via the Administration menu from the Service Console

In summary, the Administration Console allows an Administrator of ADW to perform the following operations

Download Client Credentials
This is essentially the same what you can do from the Instance Details page of the ADW instance and is documented above
Set Resource Management Rules
This allows you to set compute resource limits for the ADW Consumer Groups HIGH, MEDIUM and LOW
Set Administrator Password
Manage Oracle ML Users
Allows an Administrator of ADW to add users to Autonomous Data Warehouse. Those users are essentially Database Users (and not users managed by OCI Identity Management). In order to use the ML notebook capability of ADW, a user should be created
Download Oracle Instant Client
This will redirect to the Oracle Instant Client Download page from where you can download an Oracle Instant Client for you platform and operating system of choice. The instant client gives you access to your ADW Database via prominent Oracle Client tools like SQL*Plus, imp/exp, etc.
Send feedback to Oracle

Acknowledgements

This article series is the result of a more than one year collaboration with the Oracle Machine Learning (OML) Team. So I’d like to thank Boriana Milenova and her development team for the great support, timely delivery and valuable conversations about Machine Learning. Thanks to Product Manager Charlie Berger from @OracleAnalytics for promoting OML and Integration Cloud and the opportunity to present this work at various occasions. Last not least thanks to Suman Ganta from Process runtime team and the Integration Cloud UI team for enabling OML in Integration Cloud.