State-of-the-Art Model Deployment

Deploying Machine Learning Models with Predictive Model Markup Language

Machine learning (ML) is now used in many aspects of our lives. Popular examples of ML applications include weather forecasts, ads on web pages, and voice assistants like Siri or Alexa. Companies are realizing that data science can bring great benefits to their business, and as a result, data scientists are in high demand. Machine learning is at the heart of data science.


The typical life cycle of a machine learning model includes several stages, as shown in Fig. 1.

Figure 1. The machine learning model lifecycle.

There are countless online courses and articles about preparing data and building models, but there is much less material about model deployment. Yet it is precisely at this stage that all the hard work of data preparation and model building starts to pay off: models are used to score (that is, get predictions for) new cases, and the benefits are realized.

My intent here is to fill this gap so that you will be fully prepared to deploy your model using time-tested resources. You’ll learn about an open standard and the state of the art in model deployment.

Solving the Deployment Problem

Model deployment frequently causes problems because model building and deployment are often handled by different teams. Data scientists or statisticians typically build the models, while IT staff, webmasters, or database administrators are tasked with deploying them into production. These teams often work in different environments, possibly on different continents, using different software products, programming languages, operating systems, or file systems. Additionally, a model cannot be deployed by itself: the data preparation steps applied to the training data before model building must also be applied to new data before scoring, and it is often not easy to preserve and re-create those steps. Without standards, developers may have to reimplement the model in a different language before it can be deployed, which is a slow and error-prone approach.

Fortunately, in the late 1990s Professor Robert Grossman, then at the University of Illinois at Chicago, organized the Data Mining Group (DMG), a group of companies working together on open standards for predictive model deployment. IBM was among the founding members of the group and remains a leading member. Another founding member was SPSS, a company started 50 years ago with a product called “Statistical Package for the Social Sciences”. IBM acquired SPSS in 2009. I have been fortunate to work at SPSS and later IBM for more than 18 years and to participate in DMG since 2001.

The first standard the group created was PMML, the Predictive Model Markup Language, a standard for representing models in XML. Version 0.7 was released in 1997, but not much is known about it today. Version 1.1 came out in 2000 and supported only six models. PMML has grown tremendously since those early days. The latest released version, 4.3, includes 16 different models plus the MiningModel element, which can combine several models into ensembles and compositions. There is also a rich set of data transformations that can be expressed in PMML, both for data preprocessing and for post-processing of model predictions.

Currently, more than 30 companies and open source projects support PMML.

Predictive Model Markup Language

PMML is primarily used to describe the input data, the transformations applied to it, and the model(s), using XML elements and attributes defined by the PMML standard. The standard specifies those elements and attributes in an XML Schema, and its HTML documentation describes the scoring procedure for each model, as well as additional restrictions on elements and attributes that the schema does not capture.

Thus, once a model is built in one PMML-compliant product or open source package, it can be exported as PMML and then imported into another PMML-compliant application or package, potentially from a different company or project. This approach makes model deployment efficient and reliable.

Later in this article we will go over how to export PMML from two popular open source machine learning packages. Before diving into these options, we first need to discuss decision trees.

Building a Decision Tree Model

Decision trees have been used in ML for a very long time. A tree is a simple model that provides easily explainable results. There are many algorithms for building a decision tree from data (e.g., C&RT, CHAID, QUEST, C5.0), but the resulting models are structurally the same or very close, and scoring does not depend on the way the tree was created. For many real-life datasets a single decision tree cannot provide a very accurate model, but a combination (ensemble) of trees can. Random forests and gradient-boosted trees such as XGBoost are two examples of accurate models built from multiple trees; such models have been winning Kaggle competitions for years.

PMML supports model ensembles and compositions, but here we will examine a single decision tree.

Using the Iris dataset built into the open source R programming language and the rpart package, we can build a decision tree model that predicts the class of an iris flower from the flower’s measurements. Then, using the pmml package created by my colleagues in DMG, we can export the model as PMML.

The R code to export PMML is shown below, together with some slightly shortened output; inputs are preceded by the > prompt.

> library(XML)
> library(pmml)
> data(iris)
> library(rpart)
> irisTree <- rpart( Species~., iris )
> irisTree
n= 150
node), split, n, loss, yval, (yprob)
* denotes terminal node
1) root 150 100 setosa (0.33333333 0.33333333 0.33333333)
  2) Petal.Length< 2.45 50 0 setosa (1.000000 0.0000000 0.0000000) *
  3) Petal.Length>=2.45 100 50 versicolor (0.000000 0.500000 0.50000)
    6) Petal.Width< 1.75 54 5 versicolor (0.0000 0.907407 0.09259) *
    7) Petal.Width>=1.75 46 1 virginica (0.00000 0.021739 0.97826) *
> saveXML( pmml( irisTree ), "IrisTree.xml" )
[1] "IrisTree.xml"

A similar operation can also be performed in Python. Many Python users prefer the scikit-learn package for machine learning, and the open source JPMML project provides PMML export for many scikit-learn models through its sklearn2pmml package. Example Python code for this can be found on Stack Overflow.

First grow the tree:

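from sklearn import datasets, tree
# Load the bundled Iris data and fit a decision tree classifier to it
iris = datasets.load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)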

Now export the PMML. The sklearn2pmml conversion takes two main arguments: an estimator (our clf) and a mapper for data preprocessing. Our mapper is very simple, since we apply no data transformations before model building.

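from sklearn_pandas import DataFrameMapper
# Pass every column through unchanged (no preprocessing was applied)
def_mapper = DataFrameMapper([(i, None) for i in iris.feature_names + ['Species']])
from sklearn2pmml import sklearn2pmml
# Older sklearn2pmml API (estimator, mapper, pmml); the output file name is illustrative
sklearn2pmml(estimator=clf, mapper=def_mapper, pmml="IrisTree.pmml")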

Several examples of PMML representations of various models can be found in the PMML documentation and on the Data Mining Group website.

Let’s look at some parts of the PMML produced by the R code above. The root element of the XML document is PMML, whose version attribute holds the PMML version. The latest released version is 4.3, and we are now working on 4.4. Inside the root are the Header, DataDictionary, and TreeModel elements. The Header contains the copyright and description attributes, as well as the optional Application and Timestamp elements. DataDictionary describes the fields in the data used to build the model. The iris data has five fields, one categorical and four continuous. There is a DataField element for each field, and the first one lists the valid categories of the categorical field Species:

<DataDictionary numberOfFields="5">
  <DataField name="Species" optype="categorical" dataType="string">
    <Value value="setosa"/>
    <Value value="versicolor"/>
    <Value value="virginica"/>
  </DataField>
  <DataField name="Sepal.Length" optype="continuous" dataType="double"/>
  <DataField name="Sepal.Width" optype="continuous" dataType="double"/>
  <DataField name="Petal.Length" optype="continuous" dataType="double"/>
  <DataField name="Petal.Width" optype="continuous" dataType="double"/>
</DataDictionary>
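Because PMML is plain XML, the DataDictionary can be read with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree on a trimmed document that contains only the fragment above. The namespace URI on the root element is an assumption about what the exporter writes, and the helper names local_name and read_data_dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical PMML document holding only the DataDictionary above.
# The namespace URI is an assumption; the parser below ignores namespaces.
PMML_TEXT = """
<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <DataDictionary numberOfFields="5">
    <DataField name="Species" optype="categorical" dataType="string">
      <Value value="setosa"/>
      <Value value="versicolor"/>
      <Value value="virginica"/>
    </DataField>
    <DataField name="Sepal.Length" optype="continuous" dataType="double"/>
    <DataField name="Sepal.Width" optype="continuous" dataType="double"/>
    <DataField name="Petal.Length" optype="continuous" dataType="double"/>
    <DataField name="Petal.Width" optype="continuous" dataType="double"/>
  </DataDictionary>
</PMML>
"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def read_data_dictionary(pmml_text):
    """Return (version, {field name: (optype, dataType, [valid values])})."""
    root = ET.fromstring(pmml_text.strip())
    fields = {}
    for elem in root.iter():
        if local_name(elem.tag) == "DataField":
            # Value children list the valid categories of a categorical field
            values = [v.get("value") for v in elem if local_name(v.tag) == "Value"]
            fields[elem.get("name")] = (elem.get("optype"), elem.get("dataType"), values)
    return root.get("version"), fields


version, fields = read_data_dictionary(PMML_TEXT)
print(version)               # 4.3
print(fields["Species"][2])  # ['setosa', 'versicolor', 'virginica']
```

Stripping the namespace prefix keeps the lookup independent of which PMML version the exporter declared.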

The TreeModel element, like any model element in PMML, must contain the functionName attribute and the MiningSchema element. Depending on the type of machine learning model, functionName takes one of the following values: associationRules, sequences, classification, regression, clustering, timeSeries, or mixed. MiningSchema identifies the model’s predictors (usageType="active") and targets (usageType="predicted"), and it can contain important information such as missing value replacement and predictor importance.

<TreeModel modelName="RPart_Model" functionName="classification" algorithmName="rpart"
           splitCharacteristic="binarySplit" missingValueStrategy="defaultChild"
           noTrueChildStrategy="returnLastPrediction">
  <MiningSchema>
    <MiningField name="Species" usageType="predicted"/>
    <MiningField name="Sepal.Length" usageType="active"/>
    <MiningField name="Sepal.Width" usageType="active"/>
    <MiningField name="Petal.Length" usageType="active"/>
    <MiningField name="Petal.Width" usageType="active"/>
  </MiningSchema>
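A scoring engine can use the MiningSchema to decide which inputs a new record must supply. The following is a minimal sketch, not the behavior of any particular engine; split_fields and missing_predictors are hypothetical helpers. It also treats usageType="target" as a target, on the assumption that the document may come from a newer PMML version that uses that spelling.

```python
import xml.etree.ElementTree as ET

# The MiningSchema from the excerpt above, without a namespace.
MINING_SCHEMA = """
<MiningSchema>
  <MiningField name="Species" usageType="predicted"/>
  <MiningField name="Sepal.Length" usageType="active"/>
  <MiningField name="Sepal.Width" usageType="active"/>
  <MiningField name="Petal.Length" usageType="active"/>
  <MiningField name="Petal.Width" usageType="active"/>
</MiningSchema>
"""


def split_fields(schema_text):
    """Return (active predictors, target fields) listed in a MiningSchema."""
    schema = ET.fromstring(schema_text.strip())
    active, targets = [], []
    for field in schema.findall("MiningField"):
        # usageType defaults to "active" when the attribute is absent
        usage = field.get("usageType", "active")
        if usage == "active":
            active.append(field.get("name"))
        elif usage in ("predicted", "target"):
            targets.append(field.get("name"))
    return active, targets


def missing_predictors(record, active):
    """List the active fields that a new record does not supply."""
    return [name for name in active if record.get(name) is None]


active, targets = split_fields(MINING_SCHEMA)
print(targets)  # ['Species']
print(missing_predictors({"Petal.Length": 5.0, "Petal.Width": 2.0}, active))
# ['Sepal.Length', 'Sepal.Width']
```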

There are several other common elements that can be found inside any model. The Output element assigns names to the new fields generated by scoring and describes the possible scoring results for the model:

<Output>
  <OutputField name="Predicted_Species" optype="categorical" dataType="string" feature="predictedValue"/>
  <OutputField name="Probability_setosa" optype="continuous" dataType="double" feature="probability" value="setosa"/>
  <OutputField name="Probability_versicolor" optype="continuous" dataType="double" feature="probability" value="versicolor"/>
  <OutputField name="Probability_virginica" optype="continuous" dataType="double" feature="probability" value="virginica"/>
</Output>
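To make the roles of these OutputField elements concrete, here is a sketch that turns a leaf node’s class counts into a record with the same field names. It assumes, as holds for this model, that each probability is the class’s share of the leaf’s recordCount. The Predicted_/Probability_ naming simply copies what the R pmml package generated here, and output_fields is a hypothetical helper. The counts are those of the virginica leaf in the rpart printout above (46 cases, 1 of them versicolor).

```python
def output_fields(score, distribution, target="Species"):
    """Build a result record named like the OutputField elements above.

    `distribution` maps each category to its recordCount in the leaf node;
    each probability is that count divided by the leaf's total count.
    """
    total = sum(distribution.values())
    result = {"Predicted_" + target: score}  # feature="predictedValue"
    for category, count in distribution.items():
        result["Probability_" + category] = count / total  # feature="probability"
    return result


row = output_fields("virginica", {"setosa": 0, "versicolor": 1, "virginica": 45})
print(row["Predicted_Species"])                 # virginica
print(round(row["Probability_virginica"], 3))   # 0.978
```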

Internal Elements of PMML

In addition, each model in PMML has its own internal elements that are necessary for a precise description of the model. For TreeModel, the main internal element is Node. Each Node must contain a predicate element and can also specify the id, score, and recordCount attributes. A Node can also contain child Node elements, mirroring the structure of the tree model shown in Fig. 2.

Figure 2. A decision tree diagram for the tree built with R.

In the PMML excerpt below, Node 1 is the root node: its predicate is <True/>, and it includes all 150 cases, 50 in each of the 3 classes, as described by the ScoreDistribution elements. Inside this Node element are two child nodes, with ids 2 and 3.

<Node id="1" score="setosa" recordCount="150" defaultChild="3">
  <True/>
  <ScoreDistribution value="setosa" recordCount="50" confidence="0.333333333333333"/>
  <ScoreDistribution value="versicolor" recordCount="50" confidence="0.333333333333333"/>
  <ScoreDistribution value="virginica" recordCount="50" confidence="0.333333333333333"/>
  <Node id="2" score="setosa" recordCount="50">
    <CompoundPredicate booleanOperator="surrogate">
      <SimplePredicate field="Petal.Length" operator="lessThan" value="2.45"/>
      <SimplePredicate field="Petal.Width" operator="lessThan" value="0.8"/>
      <SimplePredicate field="Sepal.Length" operator="lessThan" value="5.45"/>
      <SimplePredicate field="Sepal.Width" operator="greaterOrEqual" value="3.35"/>
    </CompoundPredicate>
    <ScoreDistribution value="setosa" recordCount="50" confidence="1"/>
    <ScoreDistribution value="versicolor" recordCount="0" confidence="0"/>
    <ScoreDistribution value="virginica" recordCount="0" confidence="0"/>
  </Node>

Node 2’s primary predicate, Petal.Length < 2.45, is the first SimplePredicate, and the node contains all the cases of the target class setosa, as shown by its ScoreDistribution elements. The additional predicates (surrogates) in the CompoundPredicate element are used for new cases where Petal.Length is missing: they are tried in order until one of them can be evaluated. Node 2 is a leaf node because all its cases already have the same target value, so if a new case falls into this node, the prediction is setosa with a confidence of 1.
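The surrogate logic can be sketched in a few lines. PMML predicates use three-valued logic: a SimplePredicate on a missing field evaluates to UNKNOWN (None in this sketch), and a surrogate CompoundPredicate returns the first result that is not UNKNOWN. The sketch implements only the two operators that Node 2 uses, and the function names are hypothetical.

```python
import operator

# The two comparison operators used by Node 2's SimplePredicate elements.
OPS = {"lessThan": operator.lt, "greaterOrEqual": operator.ge}


def simple(record, field, op, value):
    """Evaluate one comparison; None stands for UNKNOWN (field missing)."""
    x = record.get(field)
    if x is None:
        return None
    return OPS[op](x, value)


def surrogate(record, predicates):
    """Return the first predicate result that is not UNKNOWN."""
    for field, op, value in predicates:
        result = simple(record, field, op, value)
        if result is not None:
            return result
    return None  # every predicate was UNKNOWN


# Node 2's CompoundPredicate, in document order (primary predicate first).
NODE2 = [("Petal.Length", "lessThan", 2.45),
         ("Petal.Width", "lessThan", 0.8),
         ("Sepal.Length", "lessThan", 5.45),
         ("Sepal.Width", "greaterOrEqual", 3.35)]

# Petal.Length is missing, so the Petal.Width surrogate decides.
print(surrogate({"Petal.Width": 0.2, "Sepal.Length": 5.0}, NODE2))  # True
```

When Petal.Length is present, the primary predicate alone decides and the surrogates are never consulted.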

Node 3, on the other hand, contains all 100 cases belonging to the other two classes, so it has child nodes 6 and 7 that improve prediction accuracy. Their primary predicates compare Petal.Width to 1.75. That split is not 100% accurate, because the measurements of the two iris classes overlap somewhat, but the accuracy is still good: only 6 of these 100 training cases are misclassified (5 in node 6 and 1 in node 7).

<Node id="3" score="versicolor" recordCount="100" defaultChild="7">
  <CompoundPredicate booleanOperator="surrogate">
    <SimplePredicate field="Petal.Length" operator="greaterOrEqual" value="2.45"/>
    <SimplePredicate field="Petal.Width" operator="greaterOrEqual" value="0.8"/>
    <SimplePredicate field="Sepal.Length" operator="greaterOrEqual" value="5.45"/>
    <SimplePredicate field="Sepal.Width" operator="lessThan" value="3.35"/>
  </CompoundPredicate>
  <ScoreDistribution value="setosa" recordCount="0" confidence="0"/>
  <ScoreDistribution value="versicolor" recordCount="50" confidence="0.5"/>
  <ScoreDistribution value="virginica" recordCount="50" confidence="0.5"/>
  <Node id="6" score="versicolor" recordCount="54">
    <CompoundPredicate booleanOperator="surrogate">
      <SimplePredicate field="Petal.Width" operator="lessThan" value="1.75"/>
      <SimplePredicate field="Petal.Length" operator="lessThan" value="4.75"/>
      <SimplePredicate field="Sepal.Length" operator="lessThan" value="6.15"/>
      <SimplePredicate field="Sepal.Width" operator="lessThan" value="2.95"/>
    </CompoundPredicate>
    <ScoreDistribution value="setosa" recordCount="0" confidence="0"/>
    <ScoreDistribution value="versicolor" recordCount="49" confidence="0.907407407407407"/>
    <ScoreDistribution value="virginica" recordCount="5" confidence="0.0925925925925926"/>
  </Node>
  <Node id="7" score="virginica" recordCount="46">
    <CompoundPredicate booleanOperator="surrogate">
      <SimplePredicate field="Petal.Width" operator="greaterOrEqual" value="1.75"/>
      <SimplePredicate field="Petal.Length" operator="greaterOrEqual" value="4.75"/>
      <SimplePredicate field="Sepal.Length" operator="greaterOrEqual" value="6.15"/>
      <SimplePredicate field="Sepal.Width" operator="greaterOrEqual" value="2.95"/>
    </CompoundPredicate>
    <ScoreDistribution value="setosa" recordCount="0" confidence="0"/>
    <ScoreDistribution value="versicolor" recordCount="1" confidence="0.0217391304347826"/>
    <ScoreDistribution value="virginica" recordCount="45" confidence="0.978260869565217"/>
  </Node>
</Node>
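The confidence attributes in the ScoreDistribution elements are each class’s share of the node’s recordCount, and the same counts show how accurate the tree is on its training data. Here is a small arithmetic check in Python, with the leaf counts copied from the excerpts above; the variable names are only for illustration.

```python
# Class counts (setosa, versicolor, virginica) for each leaf, copied from
# the ScoreDistribution elements of Nodes 2, 6, and 7.
LEAVES = {
    2: (50, 0, 0),
    6: (0, 49, 5),
    7: (0, 1, 45),
}

# confidence = recordCount of the class / recordCount of the node
confidence = {
    node_id: [count / sum(counts) for count in counts]
    for node_id, counts in LEAVES.items()
}

# Each leaf predicts its majority class, so every other case in it is an error.
errors = sum(sum(counts) - max(counts) for counts in LEAVES.values())
total = sum(sum(counts) for counts in LEAVES.values())
accuracy = (total - errors) / total

print(round(confidence[6][1], 6))  # 0.907407
print(errors, total, accuracy)     # 6 150 0.96
```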

When a new case comes in for scoring, its Petal.Length is checked first; if it is greater than or equal to 2.45, Petal.Width is then compared to 1.75, and the final prediction and probabilities come from the corresponding leaf node. Having the tree represented in PMML allows any PMML-compliant scoring engine to score this model correctly, no matter which product or environment created it.
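To illustrate what a scoring engine does with this TreeModel, here is a minimal sketch of a tree scorer built on Python’s xml.etree.ElementTree. It is not a complete PMML engine: it assumes numeric fields, supports only the True, False, SimplePredicate, and surrogate CompoundPredicate predicates, and implements just the defaultChild missing-value strategy and the returnLastPrediction no-true-child strategy that this model declares. It runs on a trimmed copy of the tree that keeps only each node’s primary predicate (no namespace, surrogates, or ScoreDistribution elements); evaluate and score are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the tree above: primary predicates only, no namespace.
TREE_PMML = """
<TreeModel functionName="classification" missingValueStrategy="defaultChild"
           noTrueChildStrategy="returnLastPrediction">
  <Node id="1" score="setosa" defaultChild="3">
    <True/>
    <Node id="2" score="setosa">
      <SimplePredicate field="Petal.Length" operator="lessThan" value="2.45"/>
    </Node>
    <Node id="3" score="versicolor" defaultChild="7">
      <SimplePredicate field="Petal.Length" operator="greaterOrEqual" value="2.45"/>
      <Node id="6" score="versicolor">
        <SimplePredicate field="Petal.Width" operator="lessThan" value="1.75"/>
      </Node>
      <Node id="7" score="virginica">
        <SimplePredicate field="Petal.Width" operator="greaterOrEqual" value="1.75"/>
      </Node>
    </Node>
  </Node>
</TreeModel>
"""

# Numeric comparison operators defined for PMML SimplePredicate.
OPS = {
    "equal": lambda x, v: x == v,
    "notEqual": lambda x, v: x != v,
    "lessThan": lambda x, v: x < v,
    "lessOrEqual": lambda x, v: x <= v,
    "greaterThan": lambda x, v: x > v,
    "greaterOrEqual": lambda x, v: x >= v,
}
PREDICATE_TAGS = {"True", "False", "SimplePredicate", "CompoundPredicate"}


def evaluate(pred, record):
    """Evaluate a predicate element; None stands for PMML's UNKNOWN."""
    if pred.tag == "True":
        return True
    if pred.tag == "False":
        return False
    if pred.tag == "SimplePredicate":
        x = record.get(pred.get("field"))
        if x is None:  # a missing input makes the predicate UNKNOWN
            return None
        return OPS[pred.get("operator")](x, float(pred.get("value")))
    if pred.tag == "CompoundPredicate" and pred.get("booleanOperator") == "surrogate":
        for child in pred:  # the first non-UNKNOWN result wins
            result = evaluate(child, record)
            if result is not None:
                return result
        return None
    raise NotImplementedError(pred.tag)


def score(node, record):
    """Walk down from `node` and return the score of the node reached."""
    children = node.findall("Node")
    for child in children:
        pred = next(e for e in child if e.tag in PREDICATE_TAGS)
        result = evaluate(pred, record)
        if result is None:
            # missingValueStrategy="defaultChild": continue with the default child
            for candidate in children:
                if candidate.get("id") == node.get("defaultChild"):
                    return score(candidate, record)
            return node.get("score")  # no usable default child: stop here
        if result:
            return score(child, record)
    # A leaf, or no child matched: noTrueChildStrategy="returnLastPrediction"
    return node.get("score")


root_node = ET.fromstring(TREE_PMML.strip()).find("Node")
print(score(root_node, {"Petal.Length": 5.1, "Petal.Width": 1.8}))  # virginica
```

With both petal fields missing, this scorer follows defaultChild from Node 1 to Node 3 and then from Node 3 to Node 7, so it returns virginica.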

Scoring Engines

Once PMML is obtained, it can be used for scoring by any PMML-compliant scoring engine; there are both open source and commercial products for that. My personal favorite is the one in IBM SPSS Statistics, the product that started 50 years ago. Its modern incarnation is quite popular, especially in academia and government agencies. In addition to traditional statistical analysis, it includes many other machine learning algorithms that I have had the honor to help develop. The Scoring Wizard, found under the Utilities menu, has a convenient feature for mapping PMML field names to data field names, as shown in Fig. 3. This comes in handy when the names don’t match. For some reason, the Iris datasets in R, in Python, and in the UCI data repository all use slightly different field names, which makes a nice example of this feature in use.
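Outside of a tool like the Scoring Wizard, the same field-name mapping amounts to renaming input columns before scoring. Here is a minimal Python sketch for the scikit-learn copy of the Iris data, whose feature names differ from the R names used in the PMML above; rename_fields is a hypothetical helper.

```python
# scikit-learn's Iris feature names mapped to the field names in the PMML above.
SKLEARN_TO_PMML = {
    "sepal length (cm)": "Sepal.Length",
    "sepal width (cm)": "Sepal.Width",
    "petal length (cm)": "Petal.Length",
    "petal width (cm)": "Petal.Width",
}


def rename_fields(record, mapping):
    """Return a copy of `record` keyed by the model's field names.

    Keys that have no entry in `mapping` are kept as they are.
    """
    return {mapping.get(name, name): value for name, value in record.items()}


row = {"sepal length (cm)": 6.3, "sepal width (cm)": 3.3,
       "petal length (cm)": 6.0, "petal width (cm)": 2.5}
renamed = rename_fields(row, SKLEARN_TO_PMML)
print(renamed["Petal.Width"])  # 2.5
```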

Figure 3. Mapping field names in IBM SPSS Statistics Scoring Wizard.

Another product where you may want to deploy a PMML model is IBM Watson Studio. You can get a free IBM Cloud account and sign up for Watson Studio; loading a PMML model is then easy, as shown in Fig. 4.

Figure 4. Loading PMML into Watson Studio.

Once a model is loaded, we can create a model deployment with one click, as shown in Fig. 5. The bottom of the “Implementation” tab has code examples for incorporating the deployed model into a web page or service in a number of popular languages.

Figure 5. Creating model deployment in IBM Watson.

On the “Test” tab, we can test scoring by manually entering values for the predictors, as shown in Fig. 6.

Figure 6. Scoring a test case in a deployed model in Watson Studio.

On the left we enter the values for all predictors; after pressing the “Predict” button we get the JSON shown on the right. It contains the prediction, virginica, and the probabilities of the three target categories: 0, 0.022, and 0.978. When the model is integrated into an application through this Watson service, results are returned in this form, and they can then be programmatically converted into the desired shape.
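As an example of such a conversion, here is a sketch that turns a column-oriented JSON response into one dictionary per scored row. The response layout (parallel "fields" and "values" arrays) and its field names are assumptions for illustration, not copied from the Watson documentation; the values are the rounded ones from the test case above, and to_records is a hypothetical helper.

```python
import json

# Hypothetical response body; the layout and field names are assumptions.
RESPONSE_BODY = """
{"fields": ["Predicted_Species", "Probability_setosa",
            "Probability_versicolor", "Probability_virginica"],
 "values": [["virginica", 0.0, 0.022, 0.978]]}
"""


def to_records(body):
    """Convert parallel 'fields'/'values' arrays into one dict per scored row."""
    payload = json.loads(body)
    return [dict(zip(payload["fields"], row)) for row in payload["values"]]


records = to_records(RESPONSE_BODY)
print(records[0]["Predicted_Species"], records[0]["Probability_virginica"])
# virginica 0.978
```

Keying each row by field name keeps application code independent of the column order in the response.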

Eliminating the Black Box

The PMML format allows easy deployment of predictive models into new environments. Additionally, the human-readable format of PMML eliminates the “black box” feeling one may have regarding a model given to them in some binary format. It provides transparency and fosters best practices in model building and deployment.

In my future blogs I plan to talk about other model deployment formats, as well as examine algorithms and PMML examples for several other models, so stay tuned!