IT Support Ticket Classification using Machine Learning and ServiceNow

Karthik K Kandakumar
10 min read · Mar 16, 2019


Project Description and Initial Assumptions:

This project addresses a real-life business challenge in IT Service Management. It is one of the well-known challenges in the IT industry, where a lot of time is wasted on misrouted IT support tickets. The problem the IT industry is facing right now can be described as follows:

1. In the helpdesk, almost 30–40% of incident tickets are not routed to the right team at first; a ticket keeps bouncing from queue to queue, and by the time it reaches the right team, the issue may have spread widely and reached top management, inviting a lot of trouble. This problem could be solved easily with an automatic mechanism that routes each incident ticket to the right category and right team: ML to the rescue!

2. Another, more ambitious goal is the automatic creation and classification of tickets registered at the help desk via telephone calls. The flood of calls received at the help desk on the verge of a critical incident can be addressed by classifying all such calls under the same incident instead of creating multiple tickets that share the same root cause.

3. An extension of the second use case is a virtual bot that can respond to and handle user tickets both on chat and on call.

While all three use cases are very interesting, this project focuses on the first owing to time and resource limitations.

Tools and Technologies Used:

  • ServiceNow instance (an IT Service Management platform)
  • AWS Lambda function
  • EC2 Instance
  • Python
  • Neural Network models
  • 50000+ sample tickets from open sources

Process Overview:

The overall workflow of the process can be divided into several sub-parts. A very high-level overview is shown in the figure below.

Fig. Workflow overview

Dataset:

The dataset is pulled directly from ServiceNow. The initial data pre-processing includes cleaning the data (removing duplicates, removing empty rows, removing stop words, etc.).

The original data we got was unlabelled and contained only the ticket descriptions. Since the approach selected for this project is classification, we needed some mechanism to convert this unlabelled data into labelled data. One popular approach to this is Topic Modelling (described later in the article). Topic Modelling enabled us to select the top 5 categories for our data, which helped us create the labels (incident categories).

Integration between ServiceNow and AWS:

Web services make it possible for applications to connect to other software applications over a network allowing an exchange of information between the provider (server) and client (consumer).

A web service consumer (client) requests information from a web service provider (server). A web service provider processes the request and returns a status code and a response body. When the response body is returned, the web service consumer extracts information from the response body and acts on the extracted data.

ServiceNow can consume web services from third party providers or from another ServiceNow instance.

In our case, we used an endpoint URL exposed through API Gateway as a REST web service and accessed it from JavaScript that runs on creation of a ticket.

On creation of a ticket, the JavaScript is triggered; it sends the incident description to our model hosted on AWS, which performs the machine learning operations and returns the predicted categories and probabilities.

Fig. Workflow execution
Fig. Web service integration
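To make the exchange concrete, here is roughly what the request and response look like from the client's side. This is an illustrative sketch in Python; in production the caller is the ServiceNow JavaScript described above, and the endpoint URL, field names, and response shape are placeholders for our actual API Gateway configuration.

```python
import requests

# Hypothetical API Gateway endpoint; the real URL comes from the deployed stage.
ENDPOINT = "https://example.execute-api.us-east-1.amazonaws.com/prod/classify"

def classify_incident(description):
    """Send an incident description to the model and return its prediction."""
    resp = requests.post(ENDPOINT, json={"description": description}, timeout=10)
    resp.raise_for_status()
    # Assumed response shape: {"category": "...", "probability": 0.87}
    return resp.json()

if __name__ == "__main__":
    print(classify_incident("User cannot connect to the VPN from home office"))
```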


Topic Modeling:

Before diving into the process, here are some of the assumptions we considered:

  • We pick the number of topics ahead of time even if we’re not sure what the topics are.
  • Each document is represented as a distribution over topics.
  • Each topic is represented as a distribution over words.

We used NLTK's WordNet to find the meanings of words, synonyms, antonyms, and more. In addition, we used WordNetLemmatizer to get the root word.

We then read our dataset line by line, prepare each line for LDA, and store the results in a list.
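A minimal sketch of that preparation step, assuming the descriptions sit in a plain-text file with one ticket per line (the file name and the token-length threshold are illustrative):

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from gensim.utils import simple_preprocess

# One-time downloads of the NLTK resources used below.
nltk.download("stopwords")
nltk.download("wordnet")

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def prepare_line(line):
    """Tokenize, drop stop words and short tokens, and lemmatize to root words."""
    tokens = simple_preprocess(line, deacc=True)  # lowercase, strip punctuation
    return [lemmatizer.lemmatize(t) for t in tokens
            if t not in stop_words and len(t) > 3]

# Read the dataset line by line and store the prepared documents in a list.
with open("ticket_descriptions.txt") as f:
    processed_docs = [prepare_line(line) for line in f]
```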

LDA with Gensim

First, we create a dictionary from the data, then convert it to a bag-of-words corpus, and save both the dictionary and the corpus for future use.

Fig. Implementing Gensim

The next step was to find 5 topics using LDA. Below are various code snippets that very briefly give an overview of the process.

Fig. Python snippet for LDA
Fig. Getting topics from the descriptions
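In spirit, those snippets do something like the following hedged sketch (variable and file names are ours; processed_docs is the list of tokenized documents from the preparation step above):

```python
import pickle
import gensim
from gensim import corpora

# Build the dictionary and bag-of-words corpus from the tokenized documents.
dictionary = corpora.Dictionary(processed_docs)
bow_corpus = [dictionary.doc2bow(doc) for doc in processed_docs]

# Save both for future use.
dictionary.save("dictionary.gensim")
pickle.dump(bow_corpus, open("corpus.pkl", "wb"))

# Fit LDA with 5 topics and inspect the top words of each.
lda_model = gensim.models.LdaModel(bow_corpus, num_topics=5,
                                   id2word=dictionary, passes=15)
for idx, topic in lda_model.print_topics(num_words=5):
    print("Topic {}: {}".format(idx, topic))
```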

To get a better visualization of the topics we used pyLDAvis.

pyLDAvis is designed to help users interpret the topics in a topic model that has been fit to a corpus of text data. The package extracts information from a fitted LDA topic model to inform an interactive web-based visualization.

Visualization of the Topics:

Fig. Python snippet for pyLDAvis
Fig. Visual Representation of one of the five topics
Fig. pyLDAvis representation
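A sketch of the pyLDAvis call, hedged on the package version: older releases (current at the time of writing) expose the Gensim adapter as pyLDAvis.gensim, newer ones as pyLDAvis.gensim_models.

```python
import pyLDAvis
import pyLDAvis.gensim  # pyLDAvis.gensim_models in newer releases

# Extract topic-term and document-topic information from the fitted model
# and render the interactive visualization to a standalone HTML page.
vis_data = pyLDAvis.gensim.prepare(lda_model, bow_corpus, dictionary)
pyLDAvis.save_html(vis_data, "lda_topics.html")
```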

From Topic Modelling we came to the conclusion that the whole dataset can be divided into 5 categories:

  • Network
  • User Maintenance
  • Database
  • Application Workbench
  • Security

We then labelled our dataset accordingly and prepared a dataset to perform supervised learning on it.

Model Selection & Training:

RNN for Classification:

An end-to-end text classification pipeline is composed of the following components:

1. Training text: the input text from which our supervised learning model learns to predict the required class.

2. Feature vector: a vector that contains information describing the characteristics of the input data.

3. Labels: the predefined categories/classes that our model will predict.

4. ML algorithm: the algorithm through which our model handles text classification (in our case: CNN, RNN, or HAN).

5. Predictive model: a model trained on the historical dataset that can perform label predictions.

Fig. Supervised Learning Model for Text

Incident Classification Using a Recurrent Neural Network (RNN):

A recurrent neural network (RNN) is a class of artificial neural network where connections between nodes form a directed graph along a sequence. This allows it to exhibit dynamic temporal behavior for a time sequence.

Using knowledge from an external embedding can enhance the precision of your RNN because it integrates new lexical and semantic information about the words, information that has been trained and distilled on a very large corpus of data. The pre-trained embedding we used is GloVe.
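A sketch of how a pre-trained GloVe file can be wired into a Keras Embedding layer. The file name, dimensions, and vocabulary limit are illustrative, and in practice word_index comes from Keras' Tokenizer (shown in the preprocessing sketch later in the article):

```python
import numpy as np
from keras.layers import Embedding

EMBEDDING_DIM = 100
MAX_NUM_WORDS = 20000
MAX_SEQUENCE_LENGTH = 100

# Parse the GloVe file into a word -> vector lookup.
embeddings_index = {}
with open("glove.6B.100d.txt", encoding="utf8") as f:
    for line in f:
        values = line.split()
        embeddings_index[values[0]] = np.asarray(values[1:], dtype="float32")

# Dummy mapping for illustration; in practice: word_index = tokenizer.word_index
word_index = {"vpn": 1, "database": 2}

# Build the embedding matrix; words missing from GloVe keep an all-zero row.
num_words = min(MAX_NUM_WORDS, len(word_index) + 1)
embedding_matrix = np.zeros((num_words, EMBEDDING_DIM))
for word, i in word_index.items():
    if i < num_words and word in embeddings_index:
        embedding_matrix[i] = embeddings_index[word]

# Freeze the layer so the pre-trained vectors are not updated during training.
embedding_layer = Embedding(num_words, EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=False)
```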

An RNN is a sequence of neural network blocks that are linked to each other like a chain, each one passing a message to its successor.

Fig. RNN Architecture

LSTM Networks

Long Short-Term Memory networks, usually just called "LSTMs", are a special kind of RNN capable of learning long-term dependencies. They were introduced by Hochreiter & Schmidhuber (1997) and were refined and popularized by many people in subsequent work. They work tremendously well on a large variety of problems and are now widely used.

LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is practically their default behavior, not something they struggle to learn!

All recurrent neural networks have the form of a chain of repeating modules of neural network. In standard RNNs, this repeating module will have a very simple structure, such as a single tanh layer.

Fig. Single tanh layer

The repeating module in a standard RNN contains a single layer.

LSTMs also have this chain like structure, but the repeating module has a different structure. Instead of having a single neural network layer, there are four, interacting in a very special way.

The repeating module in an LSTM contains four interacting layers.

Our General Model Architecture:

Fig. Details of the model used
Fig. LSTM model

To use Keras on text data, we first have to preprocess it. For this, we can use Keras' Tokenizer class. This object takes as an argument num_words, which is the maximum number of words kept after tokenization, based on word frequency.

Once the tokenizer is fitted on the data, we can use it to convert text strings to sequences of numbers. These numbers represent the position of each word in the dictionary (think of it as a mapping).
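A minimal sketch of that preprocessing step (the size limits and sample texts are illustrative):

```python
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

MAX_NUM_WORDS = 20000       # keep only the most frequent words
MAX_SEQUENCE_LENGTH = 100   # pad/truncate every ticket to this length

texts = ["Cannot connect to the VPN", "Database backup job failed"]  # sample tickets

# Fit the tokenizer, then map each text to a sequence of word indices.
tokenizer = Tokenizer(num_words=MAX_NUM_WORDS)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
word_index = tokenizer.word_index  # word -> position in the dictionary

# Pad the sequences so every input has the same fixed length.
data = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)
```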

  • In this project, we tried to tackle the problem using a recurrent neural network and an attention-based LSTM encoder.
  • By using an LSTM encoder, we intend to encode all the information of the text in the last output of the recurrent neural network before running a feed-forward network for classification.
  • This is very similar to neural machine translation and sequence-to-sequence learning.
  • We used the LSTM layer in Keras to address the issue of long-term dependencies; a minimal sketch of such a model follows this list.
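The sketch below shows the general shape of such a model in Keras, without the attention mechanism and with illustrative hyperparameters; the actual model we trained is the one shown in the figures above.

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

MAX_NUM_WORDS = 20000
MAX_SEQUENCE_LENGTH = 100
EMBEDDING_DIM = 100
NUM_CLASSES = 5  # Network, User Maintenance, Database, Application Workbench, Security

# A simple LSTM encoder: the last hidden state summarizes the whole ticket,
# and a softmax layer maps it onto the five incident categories. In our setup
# the Embedding layer is the GloVe-initialized one sketched earlier.
model = Sequential()
model.add(Embedding(MAX_NUM_WORDS, EMBEDDING_DIM, input_length=MAX_SEQUENCE_LENGTH))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(NUM_CLASSES, activation="softmax"))

model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])

# Training (labels one-hot encoded, e.g. with keras.utils.to_categorical):
# model.fit(data, labels, validation_split=0.2, epochs=10, batch_size=64)
```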

Model scoring and selection:

Our model scoring and selection is based on the standard evaluation metrics: accuracy, precision, and F1 score.
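For reference, these metrics can be computed from the held-out predictions with scikit-learn (the label arrays below are made up for illustration):

```python
from sklearn.metrics import accuracy_score, precision_score, f1_score, confusion_matrix

# y_true / y_pred are integer class labels for the test set (illustrative).
y_true = [0, 1, 2, 2, 4, 3, 1]
y_pred = [0, 1, 2, 3, 4, 3, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, average="weighted"))
print("F1 score :", f1_score(y_true, y_pred, average="weighted"))
print(confusion_matrix(y_true, y_pred))
```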

Output from the Model:

Fig. Validation accuracy on test and training dataset using RNN without LSTM
Fig. Confusion Matrix for RNN without LSTM
Fig. Validation accuracy on test and training dataset using RNN with LSTM
Fig. Confusion Matrix for RNN with LSTM

Observations:

The general RNN model without LSTM gave us an accuracy of 66%, whereas we were able to increase it to 69% using an LSTM layer in the RNN.

The low volume of data caused the accuracy to plateau around 70%; increasing the number of epochs made no difference, as was evident in the plot.

Still, 69% was a fair classification result, as we intend to train the model online and continuously improve its accuracy as the volume of data grows.

Model Deployment:

For the scope of this project, we planned to deploy the model on AWS and integrate it with ServiceNow so that the model makes online, or real-time, predictions. To do this, we first export the model by dumping it into a pickle file. We also write a function that connects to the S3 bucket, fetches and reads the pickle file from there, and recreates the model.

Fig. Python function to connect to S3 bucket
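A hedged sketch of such a helper using boto3; the bucket and key names are placeholders:

```python
import pickle
import boto3

def load_model_from_s3(bucket="my-model-bucket", key="model.pkl"):
    """Fetch the pickled model from S3 and recreate it in memory."""
    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket=bucket, Key=key)
    return pickle.loads(obj["Body"].read())
```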

So the workflow looks like this:

  1. An incident is created in ServiceNow.
  2. The incident is received in AWS, where our EC2 instance or service is running.
  3. The function.py file is fetched from the S3 bucket; it reads the model from the pickle file and recreates the model.
  4. The feature, i.e. the description of the incident, is extracted from the service request.
  5. The code is executed in AWS Lambda, which returns the category to which the incident belongs.

Details on the workflow:

Create an EC2 instance on AWS:

  • First, create an AWS account, which gives you free usage of some limited services for 1 year, but is enough for this project
  • Create an EC2 instance and select the free-tier machine, or, if you have credits in your account and need a more powerful machine, select from the other options

Configure a virtual runtime environment for Python on AWS. Once you are done, zip all the configuration files into a single archive, and include the function.py file in it, as we will upload this archive to an AWS S3 bucket.

Create an S3 bucket and upload your pickle file, which contains the model's details (model name, hyperparameters, and weights), and also upload the zip file containing the function.py file along with the configuration settings for the Python virtual environment.

Fig. AWS bucket snapshot
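The upload can be done from the AWS console or, as sketched below, with boto3 (file, bucket, and key names are placeholders):

```python
import boto3

s3 = boto3.client("s3")
# Upload the pickled model and the zipped environment + function.py.
s3.upload_file("model.pkl", "my-model-bucket", "model.pkl")
s3.upload_file("lambda_env.zip", "my-model-bucket", "lambda_env.zip")
```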

Lastly, set up AWS Lambda; this is where we will run the Python script and make the predictions.

AWS Lambda is a compute service that lets you run code without provisioning or managing servers; it takes care of that part itself. The best part about AWS Lambda is that you pay only for the compute time you consume; there is no charge when your code is not running. With AWS Lambda, you can run code for virtually any type of application or backend service, all with zero administration. AWS Lambda runs your code on a high-availability compute infrastructure and performs all of the administration of the compute resources, including server and operating system maintenance, capacity provisioning and automatic scaling, code monitoring, and logging. All you need to do is supply your code in one of the languages that AWS Lambda supports.

That said, we were not able to configure AWS Lambda: none of us was familiar with it, we had to work through the documentation, and we ran short of time.
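For completeness, here is roughly the shape of the handler we were aiming for. This is a sketch of the intended design, not code we got running; the event field names and the preprocess helper are assumptions:

```python
import json

CATEGORIES = ["Network", "User Maintenance", "Database",
              "Application Workbench", "Security"]

# Loaded once per container (outside the handler) so that warm invocations
# reuse the model instead of re-reading it from S3 every time.
model = load_model_from_s3()  # the S3 helper sketched earlier

def lambda_handler(event, context):
    """Classify the incident description sent by ServiceNow via API Gateway."""
    body = json.loads(event["body"])
    features = preprocess(body["description"])  # hypothetical: tokenize + pad
    probs = model.predict(features)[0]
    best = int(probs.argmax())
    return {
        "statusCode": 200,
        "body": json.dumps({"category": CATEGORIES[best],
                            "probability": float(probs[best])}),
    }
```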

This project is not yet complete, as we were short of time. We plan to extend it in another course so that we can deploy our code on AWS and integrate it with ServiceNow. We will update the blog as soon as we make progress. :)

Resources:

https://medium.com/jatana/report-on-text-classification-using-cnn-rnn-han-f0e887214d5f

https://medium.com/datadriveninvestor/automation-all-the-way-machine-learning-for-it-service-management-9de99882a33

https://github.com/karolzak/support-tickets-classification

By: Pankaj Kishore, Anuja Srivastava, Jitender Phogat and Karthik Kandakumar
