Basics of Machine Learning Models

John Roberts
Published in K Means What?
May 11, 2017

I’ve been spending the week at ServiceNow’s Knowledge17 user conference with 15,000 enterprise technology and process experts. There’s been plenty of buzz about IT and enterprise service management solutions based on artificial intelligence and machine learning. I’m here to help you understand the basic concepts without worrying about all the complexities behind these scary or overly generalized terms.

The last post covered a relatively basic machine learning solution for predicting an incident’s category to reduce the time and cost of manual triage. Let’s take a look at the machine learning concepts needed to make these predictions. To recap, we need to take some initial incident information and predict a category so the incident can be routed and assigned to the best resource for resolution. We’ll do this using a machine learning model trained on historical incident data.

So What’s a Model?

You can think of the simplest models as algorithms: given some known inputs, classify the incident into one of a defined list of category options. There are many types of models, ranging from trivial to enormously complex, and classification models are typically used for this kind of challenge. The type of model dictates how it is trained and tested, but the following concepts generally apply to most situations.
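
To make that concrete, here is a minimal sketch of a “model” as a plain algorithm: take a few known inputs and return one of a fixed list of categories. The field names and keyword rules are made up for illustration; a trained classification model effectively learns rules like these from historical data instead of having them hand-written.

```python
# A hand-written "model": given known inputs, pick one of a fixed set of categories.
# Field names and keyword rules are hypothetical, purely for illustration.
CATEGORIES = ["Network", "Hardware", "Software", "Access"]

def categorize(incident: dict) -> str:
    """Classify an incident into one of CATEGORIES using simple keyword rules."""
    text = incident.get("short_description", "").lower()
    if "vpn" in text or "wifi" in text:
        return "Network"
    if "password" in text or "login" in text:
        return "Access"
    if "laptop" in text or "monitor" in text:
        return "Hardware"
    return "Software"  # default bucket when no rule matches

print(categorize({"short_description": "Cannot connect to VPN from home"}))
# -> "Network"
```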

Analysis and Data Grooming

Before we start training a model, we should review the historical incidents we will use as training data, for two primary reasons. First, training should be based on clean, reliable data. Second, we should understand the field values and what they represent in the incident process and lifecycle. Both of these are easier with a solid understanding of the data and of the process or system generating it.

How reliable is the historical data? Are all of the category values found in the historical incidents still valid choices? If some categories are no longer used, we would probably want to exclude those records, since they’d be of little value. The same goes for generic catch-all categories like “Other”. There’s much more to this process, but in general our goal is to ensure we have reliable data.
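
As a rough sketch of what this grooming might look like in practice, here is a small pandas example that drops records whose category is no longer a valid choice or is a catch-all. The column names and category values are hypothetical, not ServiceNow’s actual schema.

```python
import pandas as pd

# Hypothetical export of historical incidents; field names are assumptions.
incidents = pd.DataFrame({
    "short_description": ["VPN keeps dropping", "Printer out of toner", "Misc request"],
    "category": ["Network", "Printing (retired)", "Other"],
})

# Categories that are still valid routing choices today.
valid_categories = {"Network", "Hardware", "Software", "Access"}

# Keep only records resolved under a still-valid category; this also drops
# the generic "Other" bucket, which carries little training value.
clean = incidents[incidents["category"].isin(valid_categories)]
print(clean)
```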

Each incident includes its field values (the inputs) and the category field value at the time it was resolved (the target output). Because we have true mappings between input and output values, each record is a supervised training example. I’ll save unsupervised concepts for another day.
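
In code, one of these supervised training examples is simply an input record paired with its known label. The field names below are illustrative assumptions.

```python
# Inputs (X): selected field values captured when each incident was opened.
# Targets (y): the category each incident was ultimately resolved under.
X = [
    {"short_description": "VPN drops every hour", "location": "Remote", "impact": "2 - Medium"},
    {"short_description": "Laptop screen flickers", "location": "HQ", "impact": "3 - Low"},
]
y = ["Network", "Hardware"]  # known target outputs for supervised training
```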

Model Training

There’s one more step we need to take on the data. We need a way to test variations of models so we can identify which one performs best. In order to test, we need to run predictions on some incidents whose target category we already know. Without these known target values it would be like taking an exam and not having the answers to check your score. Instead of using all of the historical data for training, we’ll withhold some, say 20%. This will be used to test the performance of each model. Since the model test results are also used in an iterative model training process, we might also withhold another 10% that will only be used to validate the models the training process reports as the best performers. We’ll see why this is important in a bit. When splitting the source data into training, test, and validation sets, it’s important that the splits are representative of the overall set. Stay tuned for future posts on ways to address this.
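
Here is a minimal sketch of that 70/20/10 split using scikit-learn’s train_test_split; the stratify argument is one simple way to keep each split representative of the overall category mix. The proportions and function shape are assumptions for illustration.

```python
from sklearn.model_selection import train_test_split

def split_data(records, labels, seed=42):
    """Split labeled incidents into ~70% train, ~20% test, ~10% validation."""
    # Carve off 30% to hold out, keeping 70% for training; stratifying on the
    # labels keeps the category mix roughly the same in every split.
    X_train, X_hold, y_train, y_hold = train_test_split(
        records, labels, test_size=0.30, random_state=seed, stratify=labels)
    # Split the 30% hold-out into test (20% overall) and validation (10% overall).
    X_test, X_val, y_test, y_val = train_test_split(
        X_hold, y_hold, test_size=1/3, random_state=seed, stratify=y_hold)
    return X_train, X_test, X_val, y_train, y_test, y_val
```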

Feature Engineering

Enterprise IT incidents typically contain 50–100 fields, and the majority of these most likely have no association with the incident’s category. Feature engineering is an iterative process paired with model training and testing. The objective is to identify the incident fields that are most useful for determining the category; the inputs chosen for the model are referred to as features. Using the withheld test data we can estimate the performance of the model for a given selection of features, and metrics such as information gain tell us which features contribute most to improving the model’s performance.
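
As a rough illustration of information gain, scikit-learn’s mutual_info_classif estimates how much knowing each field reduces uncertainty about the category. The encoded fields and labels below are toy assumptions, not real incident data.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Hypothetical incident fields, already encoded as integers:
# [location_code, impact_level, description_mentions_vpn]
X = np.array([
    [0, 2, 1],
    [1, 3, 0],
    [0, 2, 1],
    [2, 1, 0],
])
y = np.array([0, 1, 0, 2])  # encoded resolved categories

# Higher scores suggest a field carries more information about the category.
scores = mutual_info_classif(X, y, discrete_features=True, random_state=0)
for name, score in zip(["location", "impact", "mentions_vpn"], scores):
    print(f"{name}: {score:.3f}")
```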

Feature selection can be a brute-force process if you don’t know anything about the data’s context. Knowing the process that generates the data makes feature engineering much more streamlined.

For example, if you’re involved in incident management you already have knowledge of the most obvious fields such as descriptions (with keywords), caller’s location/department/role, and impact. Having a solid understanding of how incidents are classified manually and what the target category values mean also helps.

Model Testing and Validating

How does the modeling process test each model to determine its performance and identify the optimal features? There are a large number of machine learning frameworks that can be used for model training, and most supervised modeling engines follow the same general steps (a rough code sketch follows the list).

  1. Train a model using the training data with known features and target values
  2. Using the trained model, predict target values using the withheld test data set as input and compare results to the actual values
  3. Calculate the model’s performance
  4. Iterate the training cycle with different features and weightings, and compare performance with previous models
  5. Identify the best model and run predictions against the withheld validation set and calculate the overall performance
  6. Publish the model
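
Assuming the features are simply each incident’s short description text and the candidates are a couple of simple scikit-learn classifiers, steps 1 through 5 might look roughly like this. It’s a sketch under those assumptions, not the engine ServiceNow actually uses.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

def train_and_select(X_train, y_train, X_test, y_test, X_val, y_val):
    """Train candidate models, pick the best on test data, report its validation score."""
    candidates = {
        "logreg_C1": make_pipeline(TfidfVectorizer(), LogisticRegression(C=1.0, max_iter=1000)),
        "logreg_C10": make_pipeline(TfidfVectorizer(), LogisticRegression(C=10.0, max_iter=1000)),
    }
    best_name, best_model, best_score = None, None, -1.0
    for name, model in candidates.items():
        model.fit(X_train, y_train)                            # step 1: train on training data
        score = accuracy_score(y_test, model.predict(X_test))  # steps 2-3: predict on test data and score
        if score > best_score:                                 # step 4: iterate and compare candidates
            best_name, best_model, best_score = name, model, score
    # Step 5: estimate real-world performance on the untouched validation set.
    val_score = accuracy_score(y_val, best_model.predict(X_val))
    return best_name, best_model, val_score
```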

What’s the difference between “testing” with test data and validation data? Even though the model is trained on the training data, the modeling process relies on the test data to steer the iterations and decide which candidate models are better or worse. This means the chosen model may be heavily tuned toward the kinds of data in the training and test sets, and may not be ideal for future predictions. Since we withheld the validation data until after model training, it gives a better estimate of the model’s overall performance.

Once a model is deployed, we need a feedback mechanism to keep capturing these performance metrics. In the case of ServiceNow incident categorization prediction, we can identify correct and incorrect predictions by monitoring any changes to the category field or by comparing the predicted category to the value when the incident was resolved. The changed and resolved values are then used to retrain and publish an updated model.
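
A hedged sketch of that feedback idea: compare each stored prediction with the category the incident actually closed under, and turn every closed incident into a fresh labeled example for the next training run. The field names are hypothetical.

```python
def collect_feedback(closed_incidents):
    """Score past predictions and gather labeled examples for retraining."""
    correct = 0
    retraining_examples = []
    for inc in closed_incidents:
        # The resolved category is treated as the trusted target value.
        if inc["predicted_category"] == inc["resolved_category"]:
            correct += 1
        retraining_examples.append(
            (inc["short_description"], inc["resolved_category"]))
    accuracy = correct / len(closed_incidents) if closed_incidents else 0.0
    return accuracy, retraining_examples
```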

Don’t fear the black box. Understand the basics and you’ll be able to get more value from your machine learning solutions.

Stay tuned for…

  - K-fold validation
  - Overfitting and regularization
  - Recall, precision, and confidence thresholds
  - Identifying and tracking failed or incorrect predictions
  - Using feedback for ongoing model improvements
  - True positives, false positives, true negatives, and false negatives
