Photo originally taken by me on my visit to Las Vegas

Illustrating AI/ML Model Development in Cloudera Machine Learning

Afzal Muhammad
Published in The Startup
7 min read · Dec 2, 2020

Build, Deploy, and Access a Model in Cloudera Machine Learning (CML)

Overview

Cloudera Machine Learning (CML) is a purpose-built AI platform in Cloudera Data Platform (CDP) Private Cloud. CML provides an end-to-end machine learning platform for enterprises because it leverages components of CDP to deliver a seamless experience from data exploration to modeling and launching ML models into production. CML is fully integrated with CDP by design to provide a consistent experience with secure, shared business data across hybrid and multi-cloud environments.

Cloudera Machine Learning enables you to:

  • Easily onboard a new tenant and provision an ML workspace in a shared Red Hat OpenShift Container Platform environment.
  • Give data scientists access to shared data on CDP Private Cloud Base and Cloudera Data Warehouse.
  • Leverage Spark-on-K8s to spin Spark clusters up and down on demand.

In this tutorial, I will illustrate how easy it is to create a model and use it in CML on Cloudera Data Platform Private Cloud.

In this article, I will identify our own handwritten digits. However, to accurately predict which digit has been written, the model first has to learn the different characteristics of each digit as well as the subtle variations in how the same digit can be written. Thus, we need to train the model with a dataset of labelled handwritten digits. This is where the ready-to-use MNIST dataset comes in handy.

The MNIST dataset comprises 60,000 training images and 10,000 test images, each a small 28x28-pixel grayscale image of a single handwritten digit from 0 to 9. Instead of downloading it manually, we can download it using the TensorFlow Keras API.
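As a minimal sketch of that download step, the snippet below loads MNIST through `tf.keras.datasets.mnist.load_data()` and scales the pixel values into the 0 to 1 range. The helper names `load_mnist` and `scale_pixel` are illustrative, and the divide-by-255 scaling is an assumption based on the normalized values that appear in the sample input later in this article.

```python
MAX_PIXEL = 255.0  # MNIST stores each pixel as an integer in 0..255


def scale_pixel(value):
    """Map a raw grayscale value (or a NumPy array of them) into the 0..1 range."""
    return value / MAX_PIXEL


def load_mnist():
    """Download MNIST through Keras and return scaled train/test splits."""
    # Imported here so scale_pixel can be used without TensorFlow installed.
    import tensorflow as tf

    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    # x_train has shape (60000, 28, 28); x_test has shape (10000, 28, 28).
    return (scale_pixel(x_train), y_train), (scale_pixel(x_test), y_test)
```

Calling `load_mnist()` in a session downloads the dataset on first use, so it needs outbound network access from the workspace.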

Launch Cloudera Machine Learning

Launch CDP Private Cloud by clicking Private Cloud → Kubernetes Namespace in Cloudera Manager, as shown below.

Log in to CDP Private Cloud by providing your credentials.

Select Machine Learning from the Cloudera Data Platform (CDP) home page:

In the ML Workspaces section, select an already provisioned workspace (cml-01 in this case), as shown in the figure below, or click the “Provision Workspace” button to provision a new one. Provisioning a workspace is beyond the scope of this article.

Select an existing project or click “New Project” to create a new one, as shown below.

Create a Model

Now that we have a working environment, let’s create a session in our project.

Create a new session by clicking “New Session”.

We can create either a Workbench or a Jupyter Notebook session. Here we will create a Workbench session with Python 3 as the kernel. Specify a Resource Profile. We have already created engine profiles under Admin → Engines → Engines Profiles in CML. You can also proceed with the default Resource Profile if you don’t have any engine profiles set up.

Open a terminal window by selecting >_ Terminal Access and type:

# sh cdsw-build.sh

Contents of cdsw-build.sh are as follows:

This installs the libraries the project depends on, such as sklearn, pandas, numpy, and tensorflow. Once it completes, close the terminal window.

NOTE: You only need to install the dependent libraries once; this step can be skipped in future sessions.
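To confirm that the libraries are actually available before skipping the install step, a quick check like the following can be run in the session. This is only a sketch: `missing_packages` is a hypothetical helper, and the import names are assumed from the libraries listed above.

```python
import importlib.util

# Import names assumed from the libraries mentioned in the text.
REQUIRED = ["sklearn", "pandas", "numpy", "tensorflow"]


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing:", missing if missing else "none")
```

If the list is not empty, rerun the build script in a terminal before continuing.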

Below is the Python code that builds, compiles, trains, and saves the model based on the MNIST dataset. Note that the saved model file must be accessible from the model wrapper, which will be used to add this model to the project.
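A minimal sketch of such a script follows, assuming a small fully connected network on flattened 784-pixel inputs. The layer sizes, the epoch count, the function names, and the file name `mnist_model.h5` are illustrative assumptions rather than the exact choices used in this project.

```python
NUM_CLASSES = 10               # digits 0 through 9
NUM_PIXELS = 28 * 28           # each image flattened to 784 values
MODEL_PATH = "mnist_model.h5"  # assumed file name; the wrapper must be able to read it


def build_model():
    """Build and compile a small dense classifier (assumed architecture)."""
    import tensorflow as tf  # imported lazily; only needed when training

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(NUM_PIXELS,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


def train_and_save(epochs=5, path=MODEL_PATH):
    """Train on MNIST and write the model where the wrapper can load it."""
    import tensorflow as tf

    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    # Flatten 28x28 images to 784 values and scale pixels to 0..1.
    x_train = x_train.reshape(-1, NUM_PIXELS) / 255.0
    x_test = x_test.reshape(-1, NUM_PIXELS) / 255.0

    model = build_model()
    model.fit(x_train, y_train, epochs=epochs, validation_data=(x_test, y_test))
    model.save(path)
    return path
```

Calling `train_and_save()` in the Workbench session runs the whole pipeline and leaves the saved model file in the project directory.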

Run the entire program in the Workbench. Now that we’ve created our model, we no longer need this session; select “Stop” to terminate it, as shown below.

Now create the following Python file, named Train-digit-model-wrapper.py. We will use this code to deploy the model as a REST API that serves predictions.
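As a hedged sketch of what such a wrapper could contain: the function name `PredictFunc`, the `pixelarray` input key, and the `result` output key come from the model form described later in this article, while the model file name, the flat 784-value input shape, and the `parse_pixel_array` helper are assumptions.

```python
MODEL_PATH = "mnist_model.h5"  # assumed file name written by the training step
_model = None                  # cached so the model is loaded only once per replica


def parse_pixel_array(csv_text):
    """Turn the comma-separated "pixelarray" string into a list of 784 floats."""
    values = [float(token) for token in csv_text.split(",")]
    if len(values) != 28 * 28:
        raise ValueError(f"expected 784 pixel values, got {len(values)}")
    return values


def PredictFunc(args):
    """Entry point named in the model form; args is the parsed JSON request."""
    global _model
    import numpy as np       # imported lazily so parsing works without them
    import tensorflow as tf

    if _model is None:
        _model = tf.keras.models.load_model(MODEL_PATH)
    pixels = np.array([parse_pixel_array(args["pixelarray"])])  # shape (1, 784)
    scores = _model.predict(pixels)
    # The index of the highest score is the predicted digit.
    return {"result": str(int(scores.argmax()))}
```

Keeping the parsing in its own function makes it easy to reject malformed input before the model is ever loaded.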

In Kubernetes (K8s) or OpenShift terminology, this deployment appears as a Kubernetes Service, as shown in the figure below from Red Hat OpenShift Container Platform.

Add Model to Project

Now build the model in CML. Click Model, then click the “New Model” button.

In the create model screen, complete the form with the following:

Name: PredictDigit-001
Description: Predict Handwritten Digit
Disable Authentication
In the Build section:
File: Train-digit-model-wrapper.py
Function: PredictFunc
Example Input: {"pixelarray": "0,0,0,0,0,0,0.03137254901960784,0.5764705882352941,1,1,1,1,1,1,0.17254901960784313,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.011764705882352941,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0.984313725490196,0.07450980392156863,0,0,0,0,0,0,0,0,0,0,1,1,1,0.6196078431372549,0,0,0,0,0,0,0,0,0,0,0,0.8980392156862745,1,1,1,1,0,0,0,0,0,0,0,0,0.48627450980392156,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.8470588235294118,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0784313725490196,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.07058823529411765,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.7098039215686275,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.7568627450980392,1,0.7843137254901961,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.3176470588235294,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.4470588235294118,1,1,1,0,0,0,0,0,0,0,0,0,0.8980392156862745,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0.03137254901960784,0,0,0,0,0,0,0,0,0,0,0,0,0,0.06274509803921569,0.07058823529411765,0.07058823529411765,0.803921568627451,1,1,0.8941176470588236,0.07058823529411765,0.06666666666666667,0,0,0,0.0392156862745098,1,1,1,0.5098039215686274,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.9254901960784314,1,0.6901960784313725,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.5254901960784314,1,0.07450980392156863,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.9254901960784314,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.3803921568627451,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.24705882352941178,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.34509803921568627,1,0,0,0,0,0,0.9803921568627451,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,1,0.3686274509803922,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0.9882352941176471,1,1,1,1,0.6784313725490196,0.08235294117647059,0,0,0,0,0,0.4235294117647059,1,1,1,1,1,1,1,0.3333333333333333,0,0,0,0,0,0,0,0,0,0.10588235294117647,0.8117647058823529,1,1,1,1,1,1,1,1,1,1,1,1,0.21568627450980393,0.12941176470588237,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0"}
Example Output: {"result": "3"}
Kernel: Python3
Engine Profile: Default or custom engine profile pre-configured
Replicas: 1

Click “Deploy Model”

Once the model is deployed, click the model named PredictDigit-001, as shown below.

Copy the command from the Overview → Shell tab, as shown below, and run it.

The figure below shows the output of the curl command.

You can also test the model by clicking the “Test” button in the model Overview tab.
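The same call can also be made from Python. The sketch below assumes the request body follows the pattern of the generated curl command, with the access key under `accessKey` and the model input under `request`; `host_url` stands for the model endpoint shown in the Overview tab, and the function names are illustrative.

```python
import json
import urllib.request


def build_request(host_url, access_key, pixelarray):
    """Create a POST request carrying the access key and the model input."""
    body = {"accessKey": access_key, "request": {"pixelarray": pixelarray}}
    return urllib.request.Request(
        host_url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_model(host_url, access_key, pixelarray):
    """Send the request and return the decoded JSON reply."""
    request = build_request(host_url, access_key, pixelarray)
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

Separating `build_request` from `call_model` lets you inspect the payload before anything is sent over the network.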

Canvas for generating the image pixel array and predicting

Below is the HTML code that provides a canvas for drawing a digit. Copy and paste the code into Notepad or any plain-text editor and replace the following:

<accesskey> with your accesskey
<hostURL> with your hostURL

You can find the access key and host URL in the model Overview tab, as shown below.

Save this file with a .html extension on your computer and then open it locally in the browser.

Draw your digit, then click the “Convert” button to convert the image into an image pixel array, as shown in the figure below. You can click the “Copy Image Pixel Array in Clipboard” button to copy the pixel array to your clipboard, which can then be used for testing with the curl command or in the model Overview tab.
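As a rough Python illustration of what such a conversion involves, the sketch below assumes the drawing has already been reduced to a 28x28 grid of grayscale values from 0 to 255 and turns it into the comma-separated, 0 to 1 scaled string that the model expects. The function name `pixels_to_csv` and the divide-by-255 scaling are assumptions; the page itself does this work in JavaScript.

```python
def pixels_to_csv(grid):
    """Flatten a 28x28 grid of 0..255 grayscale values into a 0..1 CSV string."""
    if len(grid) != 28 or any(len(row) != 28 for row in grid):
        raise ValueError("expected a 28x28 grid")
    # Row-major order: first row left to right, then the next row, and so on.
    return ",".join(str(value / 255) for row in grid for value in row)
```

The resulting string has 784 entries and can be used directly as the `pixelarray` value in a request.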

Now click the “Predict” button. The JavaScript inside the HTML code submits a POST request to the API with the access key provided earlier and the generated image pixel array, then fetches the response, as shown below.

You can also watch this demo

Below is the HTML code.

If you have also copied the pixel array to your clipboard, replace the pixel array in the curl command, or in the test model input box as shown in the figure, with the new pixel array from the clipboard.

The output from the curl command is shown below. It shows {"success":true,"response":"4"}

The interactive microservice

The final part of this project is hosting the web application, and this is where we use the Applications feature of CML. An Application runs and serves a long-running web-based application with a permanent URL. This application is set up so that it can be accessed by any user who has network access to the CML instance. Create the following Python file, which uses the Flask framework.
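A minimal sketch of what such a Flask script might look like is given below. The `/predict` route and the `pixelarray` query parameter match the URL used later in this article; the `classify`, `create_app`, and `run` helpers, the `predict_fn` callback, the fallback port, and the use of the `CDSW_APP_PORT` environment variable are assumptions that may differ between CML versions.

```python
import os


def classify(pixelarray, predict_fn):
    """Parse the query-string value and wrap the prediction as a response dict."""
    pixels = [float(token) for token in pixelarray.split(",")]
    return {"result": str(predict_fn(pixels))}


def create_app(predict_fn):
    """Build the Flask app; predict_fn maps a list of floats to a digit."""
    from flask import Flask, jsonify, request  # imported lazily

    app = Flask(__name__)

    @app.route("/predict")
    def predict():
        raw = request.args.get("pixelarray")
        if not raw:
            return jsonify({"error": "pixelarray query parameter is required"}), 400
        return jsonify(classify(raw, predict_fn))

    return app


def run(predict_fn):
    """Serve on the port CML assigns to the application (assumed env var)."""
    port = int(os.environ.get("CDSW_APP_PORT", "8080"))
    create_app(predict_fn).run(host="127.0.0.1", port=port)
```

In a real script, `predict_fn` would load the saved model and return the predicted digit for the given pixels.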

Now create an Application in CML. Click Applications → New Application and fill in the form with the following:

Name: MNIST App
Subdomain: mnistapp
Script: mnist_ms.py
Engine Kernel: Python 3
Engine Profile: Default or your preconfigured

Click “Create Application”

This will create an application as shown below.

Launch the application by clicking its name. The application will be launched in the browser.

Now call the predict method as below:

http://mnistapp.ml-f81b3d54-323.apps.ocp4.sjc02.lab.cisco.com/predict?pixelarray=0,0,0,0,0,0,0.03137254901960784,0.5764705882352941,1,1,1,1,1,1,0.17254901960784313,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.011764705882352941,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0.984313725490196,0.07450980392156863,0,0,0,0,0,0,0,0,0,0,1,1,1,0.6196078431372549,0,0,0,0,0,0,0,0,0,0,0,0.8980392156862745,1,1,1,1,0,0,0,0,0,0,0,0,0.48627450980392156,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.8470588235294118,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0784313725490196,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.07058823529411765,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.7098039215686275,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.7568627450980392,1,0.7843137254901961,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.3176470588235294,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.4470588235294118,1,1,1,0,0,0,0,0,0,0,0,0,0.8980392156862745,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0.03137254901960784,0,0,0,0,0,0,0,0,0,0,0,0,0,0.06274509803921569,0.07058823529411765,0.07058823529411765,0.803921568627451,1,1,0.8941176470588236,0.07058823529411765,0.06666666666666667,0,0,0,0.0392156862745098,1,1,1,0.5098039215686274,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.9254901960784314,1,0.6901960784313725,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.5254901960784314,1,0.07450980392156863,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.9254901960784314,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.3803921568627451,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.24705882352941178,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.34509803921568627,1,0,0,0,0,0,0.9803921568627451,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,1,0.3686274509803922,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0.9882352941176471,1,1,1,1,0.6784313725490196,0.08235294117647059,0,0,0,0,0,0.4235294117647059,1,1,1,1,1,1,1,0.3333333333333333,0,0,0,0,0,0,0,0,0,0.10588235294117647,0.8117647058823529,1,1,1,1,1,1,1,1,1,1,1,1,0.21568627450980393,0.12941176470588237,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

Now we have a permanent URL which can be used by anyone to access the application.
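Typing such a long URL by hand is error-prone, so it can help to build it programmatically. The sketch below uses the standard library's `urlencode`; the function name `build_predict_url` is illustrative, and the `/predict` path and `pixelarray` parameter are taken from the URL shown above.

```python
from urllib.parse import urlencode


def build_predict_url(base_url, pixels):
    """Return the /predict URL with the pixels as one comma-separated value."""
    csv_text = ",".join(str(value) for value in pixels)
    # safe="," keeps the commas readable instead of encoding them as %2C.
    query = urlencode({"pixelarray": csv_text}, safe=",")
    return base_url.rstrip("/") + "/predict?" + query
```

Here `base_url` would be the application's permanent URL, and `pixels` the 784 values produced from the drawing.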

Conclusion

With Cloudera Machine Learning, we can accelerate machine learning projects from research to production and can manage the complete lifecycle. A typical machine learning project will include the following high-level steps that will transform a loose data hypothesis into a model that serves predictions.

  1. Explore and analyze data interactively in workbench.
  2. Deploy automated pipelines using Jobs in CML.
  3. Train and evaluate models with Experiments in CML.
  4. Deploy models as REST APIs to serve predictions.

Demo

Afzal Muhammad
The Startup

Innovative and transformative cross domain cloud solution architect @Microsoft (& xCisco). Helping companies to digitally transform!!!