ASSIGNMENT GROUP PREDICTION FOR TICKETING TOOL — SERVICENOW

Anurodh Mohapatra · Published in Analytics Vidhya · Jan 3, 2021 · 7 min read


1. Problem statement

Almost every large and mid-sized organization uses a support ticket system, and ServiceNow is one of the most widely used ticketing systems across organizations.

Users raise support requests in ServiceNow for hardware or software-related issues, such as procurement of a new desktop or creation of an Office 365 account. These tickets are then assigned to the appropriate technical support team by an L1 engineer.

Our aim is to automate this part of the L1 engineer's work by predicting the appropriate technical support team for each ticket.

2. Data collection

Now for the second step: data collection. We are lucky here, as ServiceNow has built-in functionality to export data as CSV files.

  • First, remove all filters: right-click on the hamburger icon → Filters → None.
  • Then right-click on the hamburger icon above any column → Export → CSV.
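
If you would rather script the download than click through the UI, ServiceNow also lets you export a list view directly by URL. The instance name below is a placeholder, and sysparm_query is an optional filter:

https://<your-instance>.service-now.com/incident_list.do?CSV&sysparm_query=active%3Dtrue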

That’s it. We have our data.

3. Data Cleaning

# To read and manipulate dataframe
import pandas as pd
# For EDA
import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud, STOPWORDS
stopwords = set(STOPWORDS)
%matplotlib inline

To start, I have imported a few modules; we can import more later as required.

As the data is in tabular form, I am using pandas to read and manipulate it.

For EDA I have used matplotlib.pyplot, seaborn, and wordcloud.

I have also created a variable called stopwords, which stores all English stopwords such as "the", "is", etc.

The %matplotlib inline magic command displays the charts inside the notebook itself.

Now we need to read our data.

incident_df = pd.read_csv('incident.csv', encoding='ISO-8859-1')

I have used pd.read_csv to read the CSV file, passing the filename and encoding type as parameters. My file is in the working directory; if yours is somewhere else, provide the full path.

Once we have our data, we need to clean it. I have kept only the three columns we need, Short description, Description, and Assignment group, and dropped the rest, which are not required.

df1 = incident_df[['Short description','Description','Assignment group']]

First, we need to check for missing values. For that, I have created a function that takes a dataframe as input and returns the count and percentage of missing values per column as a dataframe.

# Function to check and create a dataframe of missing values
def check_miss(df):
    '''This function checks the number and % of missing values in each column;
    columns with more than 0 missing values are returned as a dataframe.'''
    # Columns which have missing values
    miss_col = [col for col in df.columns if df[col].isnull().sum() > 0]
    # DataFrame that contains the count and % of missing values
    miss_df = pd.DataFrame([df[miss_col].isnull().sum(),
                            df[miss_col].isnull().mean()*100],
                           index=['Missing Value', 'Missing Value %'])
    return miss_df

check_miss(df1)

As the percentage of missing values is very small, we can simply drop those rows. For that, I have created a copy of the dataframe and used dropna, which drops rows with missing values.

df2 = df1.copy()
df2.dropna(inplace=True)

There are two ways to create a ticket. Either the user calls the L1 team, who raise a ticket on their behalf, or the user raises the ticket themselves. When the L1 team raises a ticket, they follow a template whose description contains a lot of unnecessary boilerplate text, which I have removed by truncating the description at the 'Issue Status:' marker.

df3 = df2.copy()
# Drop everything from 'Issue Status:' onward; leave the text unchanged if the marker is absent
df3['Description'] = df2['Description'].apply(
    lambda x: x[:x.find('Issue Status:')] if 'Issue Status:' in x else x)

Now we can start our EDA process.

4. EDA

We will be doing EDA on three columns: the target column, Short description, and Description. I have used a countplot to see the number of tickets per assignment group and word clouds to see the most frequently occurring words.

4.1 Target variable

plt.figure(figsize=(10,20))
sns.countplot(y=df3['Assignment group'], order=df3['Assignment group'].value_counts().index)
plt.show()

There are many assignment groups that do not have sufficient data. I have removed the groups whose value count is less than 200.

I have done this in two steps:

  1. Create a list of assignment groups whose value count is at least 200.
  2. Keep only the rows whose assignment group is in that list.
assignment_group = []
# .items() replaces the now-removed Series.iteritems()
for key, value in df3['Assignment group'].value_counts().items():
    if value >= 200:
        assignment_group.append(key)
df4 = df3[df3['Assignment group'].isin(assignment_group)]
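
The same filtering can also be written more compactly, if you prefer, by masking the value counts directly; this is an equivalent sketch, not a change in behavior:

counts = df3['Assignment group'].value_counts()
keep = counts[counts >= 200].index
df4 = df3[df3['Assignment group'].isin(keep)]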

4.2 Short Description

I have split this into two parts:

  1. The average length of the short description for each assignment group.
  2. The most frequently used words in the short description for each assignment group.
df5 = df4.copy()
df5['short_count'] = df4['Short description'].apply(lambda x: len(x))
plt.figure(figsize=(10,8))
sns.barplot(x=df5['short_count'], y=df5['Assignment group'])
plt.show()

Wordcloud

text = df5.groupby('Assignment group')['Short description'].apply(lambda x: " ".join(x))
index = 0
plt.figure(figsize=(20,30))
for key, value in text.items():
    # Create and generate a word cloud image for each assignment group
    wordcloud = WordCloud(stopwords=stopwords).generate(str(value))
    index += 1
    plt.subplot(12, 3, index)
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis("off")
    plt.title(key)
plt.tight_layout()

I have done the same analysis on the Description column as well. Please check GitHub for the full code.

I observed that for each assignment group there are some common words used in its short description and description.

For example, in the EC&C — ProjectWise group, "ProjectWise" is the most common word.

5. Model Creation

I have created a separate file for model creation and prediction.

Again, we need to import pandas and read our data.

This time I have kept only two columns: Description and Assignment group.

import pandas as pd

# Read data
df = pd.read_csv('incident.csv', encoding='ISO-8859-1')

# Dropping rows with NaN in the input or target columns
df.dropna(subset=['Assignment group', 'Description'], inplace=True)

I have assigned the Description column, our input variable, to X, and the Assignment group column, our target variable, to y.

# NLP Modeling
X = df['Description']
y = df['Assignment group']

In the next step, I have imported LabelEncoder and converted the target column from text to numbers. For example, if we have 100 unique target classes, each class gets its own unique number.

# Encoding target variable
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder().fit(y)
y = pd.Series(le.transform(y))
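
As a quick illustration with made-up group names (not the project data), this is how LabelEncoder maps classes to integers and back:

from sklearn.preprocessing import LabelEncoder

le_demo = LabelEncoder().fit(['Network', 'Hardware', 'Network', 'Database'])
print(list(le_demo.classes_))                      # ['Database', 'Hardware', 'Network']
print(le_demo.transform(['Network', 'Database']))  # [2 0]
print(le_demo.inverse_transform([1]))              # ['Hardware']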

Now I am importing train_test_split, which I have used to split our data into training and testing sets.

# Splitting Data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
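
One aside not in the original code: train_test_split holds out 25% of the rows by default, and with imbalanced classes like these a stratified split often gives a fairer evaluation:

X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y)  # keeps class proportions similar in both splits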

I am using CountVectorizer, which tokenizes the text of each description, removes English stop words, drops words that appear in fewer than five descriptions (min_df=5), and keeps the frequency of each remaining word.

# Vectorisation of description
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(stop_words='english', min_df=5).fit(X_train)
vector = cv.transform(X_train)
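
To make the vectorization concrete, here is a toy sketch on made-up descriptions (min_df is omitted since the corpus is tiny; get_feature_names_out needs scikit-learn 1.0 or later):

demo_cv = CountVectorizer(stop_words='english').fit(
    ["reset my password", "password reset request", "new laptop request"])
print(demo_cv.get_feature_names_out())
# ['laptop' 'new' 'password' 'request' 'reset']
print(demo_cv.transform(["password reset"]).toarray())
# [[0 0 1 0 1]]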

Two algorithms that typically perform well on this kind of text classification are Logistic Regression and Naïve Bayes. I tested both, and Logistic Regression gave higher accuracy on this project, so that is the algorithm I used.

# Logistic regression
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression().fit(vector, y_train)
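
The post does not show the evaluation step, but since we already held out a test split, a quick sanity check could look like this; note the test descriptions are transformed with the already-fitted vectorizer, never refitted:

test_vector = cv.transform(X_test)
print('Test accuracy:', clf.score(test_vector, y_test))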

Once the model is trained, we need to save each preprocessing step and the model itself to files that can be used later. For that purpose, I am using pickle.

# Pickling model
import pickle
with open('cv.pkl', 'wb') as file:
    pickle.dump(cv, file)
with open('clf.pkl', 'wb') as file:
    pickle.dump(clf, file)
with open('le.pkl', 'wb') as file:
    pickle.dump(le, file)
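
For completeness, loading the artifacts back in a prediction script or app is the mirror image:

with open('cv.pkl', 'rb') as file:
    cv = pickle.load(file)
with open('clf.pkl', 'rb') as file:
    clf = pickle.load(file)
with open('le.pkl', 'rb') as file:
    le = pickle.load(file)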

Lastly, I have created a function that takes text as input, applies all the preprocessing, and predicts the assignment group.

# Prediction
def predict(description):
    prediction = clf.predict(cv.transform([description]))
    return le.inverse_transform(prediction)[0]
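
A call would look like this; the description is made up, and the predicted group depends entirely on your trained model:

print(predict("Unable to open ProjectWise, getting a login error"))
# e.g. 'EC&C — ProjectWise'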

6. Deployment

Once our model is ready, we need to deploy it.

I am deploying my model on Heroku; you can use any other platform.

I have used the streamlit library for my frontend UI. I tried Flask as well, but for this project Streamlit is the better choice because it needs far less code for a simple UI.

Once we run this app, it shows a UI in the web browser where we can enter a ticket description and click the Predict button.
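
The post does not include the app code itself, so here is a minimal Streamlit sketch of what such a UI could look like, assuming the pickle filenames from above and an app file named app.py:

# app.py: minimal Streamlit UI sketch (filenames assumed from the pickling step above)
import pickle
import streamlit as st

with open('cv.pkl', 'rb') as f:
    cv = pickle.load(f)
with open('clf.pkl', 'rb') as f:
    clf = pickle.load(f)
with open('le.pkl', 'rb') as f:
    le = pickle.load(f)

st.title('Assignment Group Prediction')
description = st.text_area('Enter ticket description')
if st.button('Predict'):
    prediction = clf.predict(cv.transform([description]))
    st.write('Predicted assignment group:', le.inverse_transform(prediction)[0])

You can test it locally with streamlit run app.py before pushing to Heroku.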


A data scientist, an AI enthusiast, and a guy slightly obsessed with code quality. LinkedIn: https://www.linkedin.com/in/anurodhmohapatra/