
Predicting IT Ticket Durations Using Support Chat Messages

Our partner company: Electric AI

Founded in 2016, Electric AI provides IT support as a shared service to small and medium-sized businesses, primarily via chat support over Slack and Teams. Electric actively supports over 50k users and fields more than 44k IT tickets across their client companies monthly.

With that volume, it should be no surprise that providing an unparalleled customer experience is at the forefront of Electric’s mission. This is where our Columbia Analytics in Action team, composed of MBA and Engineering Master’s students, came in. We were tasked with helping Electric predict how long it would take to resolve users’ IT support tickets, a capability they could then potentially incorporate into their product. By providing an estimate of when their users could return to business as usual, Electric can better manage expectations and achieve greater overall user satisfaction.

Mockup of a resolution prediction included in an active support chat

Building our Understanding

Before framing the problem, we wanted to better understand the company, its users, and its processes. We figured the best way to build that understanding was to dive into the data. Electric has a process to scrub all personally identifiable information, including company and employee names, locations, email addresses, and phone numbers, from the usage data it generates before the data is ever analyzed (even internally). Electric provided us with these de-identified datasets to start with, consisting of the following tables:

Issue Data: List of IT tickets with open/close datetime stamps, hashed customer/employee IDs, request type and category.

Conversation Data: Redacted conversation logs exchanged between end users and Electric support agents (scrubbed of PII).

Customer Data: A hashed list of customer IDs (no names) along with their activation date, chat platform, and time zone.

We focused our first few meetings with Electric on understanding what each field meant, where it came from, and how it was generated. We also made a point to put ourselves in the shoes of the end user, and stepped through some representative ticket chat logs to recreate their journeys.

Example User Journey — looks straightforward enough, but this ticket took 65 minutes to close!

By taking the time to perform this exercise, we gained an appreciation for Electric’s users, and could certainly see how a time-to-close estimate might provide reassurance that a ticket is being actively worked on and that the user will be able to get back to work quickly.

Next, we performed some descriptive analysis to see if we could draw any insights or patterns from the data. From our initial review of the data dictionary, we expected ticket category to be one of the most useful features, if not the most useful. Categories were predefined by Electric and could be selected from a dropdown list by the user when opening a ticket. However, upon further analysis, we found only small differences in ticket durations between categories.

Tickets across all categories have an average duration of about 90 minutes.
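For readers who want to reproduce this kind of check, a minimal pandas sketch is shown below. The file and column names are placeholders we made up for illustration, not Electric’s actual schema.

```python
import pandas as pd

# Hypothetical de-identified issue data; file and column names are assumptions.
tickets = pd.read_csv("issue_data.csv", parse_dates=["opened_at", "closed_at"])

# Ticket duration in minutes, from open to close.
tickets["duration_min"] = (
    tickets["closed_at"] - tickets["opened_at"]
).dt.total_seconds() / 60

# Compare categories: if means and medians all cluster around ~90 minutes,
# category alone carries little signal about duration.
by_category = (
    tickets.groupby("category")["duration_min"]
    .agg(["count", "mean", "median"])
    .sort_values("mean")
)
print(by_category)
```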

As we reviewed the data further, particularly the conversation data, we saw that there were IDs for the agents responding to tickets, but no other data on them. Our hypothesis was that individual agent performance might play a significant role in ticket resolution times, or at the very least, that there might be a difference between onshore and offshore agents. Upon request, Electric provided us with a reference table of agent location and level, along with another dataset of the close categories that agents assign at the end of a support conversation.

To understand the impact of individual performance, we looked at agents’ average ticket durations. The agent handling a ticket appeared to be a strong indicator of resolution time, so we included it in our final model. However, we recognize that some agents may focus on or specialize in certain types of issues, so there is a risk in incorporating this feature. Given more time, we would have looked for additional opportunities to incorporate better features or create feature combinations more suited to our model.

Notable variation of ticket durations among agents

With the agent data, we performed another analysis to understand how the number of agents active at a given time affected ticket duration. Ticket duration and the number of active agents were correlated, but the distinct clusters in the data were largely due to reduced weekend staffing. We therefore decided to exclude weekend tickets from our model.

High variance among cluster 1’s weekend staffing, which we excluded
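A rough sketch of that staffing analysis follows, again with assumed file and column names. The staffing proxy here (distinct agents handling tickets opened in the same hour) is our simplification for illustration, not Electric’s internal definition.

```python
import pandas as pd

# Hypothetical ticket data joined with agent IDs; names are assumptions.
tickets = pd.read_csv("issue_data_with_agents.csv",
                      parse_dates=["opened_at", "closed_at"])
tickets["duration_min"] = (
    tickets["closed_at"] - tickets["opened_at"]
).dt.total_seconds() / 60

# Rough staffing proxy: distinct agents handling tickets opened in the same hour.
tickets["open_hour"] = tickets["opened_at"].dt.floor("60min")
staffing = tickets.groupby("open_hour")["agent_id"].nunique().rename("active_agents")
tickets = tickets.merge(staffing, left_on="open_hour", right_index=True)

# Correlation between staffing levels and ticket duration.
print(tickets[["active_agents", "duration_min"]].corr())

# Exclude weekend tickets (Saturday = 5, Sunday = 6), where staffing drops sharply.
weekday_tickets = tickets[tickets["opened_at"].dt.dayofweek < 5]
print(len(weekday_tickets), "weekday tickets retained")
```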

Tackling the Problem

Having reinforced our understanding of Electric’s context and problem statement, we turned our attention towards building a predictive model. We knew our project was unique in that our primary inputs were free-text fields, so after creating our initial baseline linear regression model, we investigated more advanced text analytics techniques.
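To give a sense of what such a baseline can look like, here is a minimal scikit-learn sketch that regresses log duration on structured features. The specific features and column names are assumptions for this illustration, not our exact feature set.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical feature frame; column names are assumptions.
tickets = pd.read_csv("issue_data.csv", parse_dates=["opened_at", "closed_at"])
tickets["duration_min"] = (
    tickets["closed_at"] - tickets["opened_at"]
).dt.total_seconds() / 60
tickets["open_hour"] = tickets["opened_at"].dt.hour

X = tickets[["category", "open_hour"]]
y = np.log1p(tickets["duration_min"])  # log target tames the long right tail

model = Pipeline([
    ("encode", ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), ["category"])],
        remainder="passthrough",
    )),
    ("reg", LinearRegression()),
])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model.fit(X_train, y_train)
print("baseline R^2:", model.score(X_test, y_test))
```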

We began by applying Latent Dirichlet Allocation (LDA) to the issue descriptions, but saw only marginal improvement over our baseline. This was likely because LDA is not as effective on short-form text; the issue descriptions were often one to two sentences at most. After consulting with the teaching team (and a fair amount of Google searching), we came across the Gibbs Sampling Dirichlet Multinomial Mixture (GSDMM) model, which is better suited to the short texts Electric deals with. With this method, we were able to develop new, differentiated topics to use in our models.

Initial topics generated using GSDMM
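For the curious, the sketch below shows how GSDMM can be applied using the open-source gsdmm package (rwalk/gsdmm on GitHub). The package choice, the toy tokenized descriptions, and the parameter values are all illustrative assumptions rather than our exact configuration.

```python
from gsdmm import MovieGroupProcess  # https://github.com/rwalk/gsdmm

# Toy tokenized issue descriptions (lowercased, stopwords removed);
# this preprocessing is illustrative, not Electric's pipeline.
docs = [
    "cannot reset password okta".split(),
    "new macbook provisioning request".split(),
    "vpn keeps disconnecting".split(),
]
vocab = {w for doc in docs for w in doc}

# K is an upper bound on the number of topics; GSDMM leaves unused clusters empty.
mgp = MovieGroupProcess(K=20, alpha=0.1, beta=0.1, n_iters=30)
labels = mgp.fit(docs, len(vocab))

# The cluster labels can then be used as a categorical feature in the regression.
print(labels)
```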

Another key question was what cutoff point to set within a conversation for making the prediction, since all of the chat messages up to that point would serve as inputs to our model. Looking at the distribution of ticket durations, we found clear outliers and ultimately excluded tickets resolved in under 15 minutes or over 24 hours. Given that floor, it made sense to use the first 15 minutes of each conversation for our analysis, applying Term Frequency-Inverse Document Frequency (TF-IDF) to the messages in that window.
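The filtering and truncation step can be sketched in pandas roughly as follows; the table names, column names, and ticket_id key are stand-ins for Electric’s actual schema.

```python
import pandas as pd

# Hypothetical tables; file and column names are assumptions.
msgs = pd.read_csv("conversation_data.csv", parse_dates=["sent_at"])
tickets = pd.read_csv("issue_data.csv", parse_dates=["opened_at", "closed_at"])
tickets["duration_min"] = (
    tickets["closed_at"] - tickets["opened_at"]
).dt.total_seconds() / 60

# Drop outliers: tickets closed in under 15 minutes or over 24 hours.
kept = tickets[(tickets["duration_min"] >= 15) & (tickets["duration_min"] <= 24 * 60)]

# Keep only messages from the first 15 minutes of each remaining conversation.
msgs = msgs.merge(kept[["ticket_id", "opened_at"]], on="ticket_id")
early = msgs[msgs["sent_at"] <= msgs["opened_at"] + pd.Timedelta(minutes=15)]

# One text blob per ticket, ready for the TF-IDF step.
early_text = early.groupby("ticket_id")["message"].apply(" ".join)
```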

We identified the most important words based on how often they appeared in a given conversation relative to the collection of conversations as a whole. We found that some words carry real signal about ticket duration — for instance, words related to “password” show up in shorter-duration tickets (negative coefficient), while words related to “new MacBook” requests, which typically require several approvals, show up in multi-hour or multi-day tickets (positive coefficient).

Log Model R2 with First 15 Min. of Conversation Data (TF-IDF)
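A minimal, self-contained version of that TF-IDF regression is sketched below on placeholder conversations. We use Ridge here purely for illustration; the exact regressor matters less than the coefficient inspection at the end.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge

# Placeholder inputs: in practice these would be the first-15-minute text blobs
# and durations produced by the filtering sketch above.
early_text = pd.Series({
    101: "hi i forgot my password can you reset it",
    102: "requesting a new macbook for a new hire starting monday",
    103: "vpn keeps dropping every few minutes",
})
duration_min = pd.Series({101: 20.0, 102: 900.0, 103: 75.0})
y = np.log1p(duration_min)  # log target, as in the model above

vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
X = vectorizer.fit_transform(early_text)

# Ridge keeps the sparse text features in check; the regressor choice is illustrative.
reg = Ridge(alpha=1.0).fit(X, y)

# Negative coefficients (e.g. password terms) pull predicted duration down;
# positive ones (e.g. hardware-request terms) push it up.
coefs = pd.Series(reg.coef_, index=vectorizer.get_feature_names_out())
print(coefs.sort_values())
```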

Finally, we wanted to make use of the categories that agents assign when closing support conversations without using them directly as a prediction feature, since they are only assigned after the fact. With this in mind, we applied a bag-of-words technique to the initial ticket descriptions, segmented by final closing category. From this we produced the word clouds below and a model to predict each ticket’s closing category, which we could then feed into our duration model.

Word clouds generated from bag of words approach
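A simple version of that close-category predictor can be sketched with scikit-learn’s CountVectorizer. The multinomial naive Bayes classifier and the column names here are assumptions for illustration, not necessarily what we shipped.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical columns: the user's initial description and the close category
# the agent assigns after resolution.
tickets = pd.read_csv("issue_data_with_close_categories.csv")
descriptions = tickets["description"]
close_category = tickets["close_category"]

# Bag-of-words counts feeding a simple classifier; the true close category is
# only known after the fact, so we predict it and feed the prediction downstream.
clf = make_pipeline(
    CountVectorizer(stop_words="english", min_df=5),
    MultinomialNB(),
)
X_train, X_test, y_train, y_test = train_test_split(
    descriptions, close_category, stratify=close_category, random_state=0
)
clf.fit(X_train, y_train)
print("close-category accuracy:", clf.score(X_test, y_test))
```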

Pivoting to Time Window Estimate with Classification

All of the above models generated valuable insights but never surpassed an R² of 0.34, meaning our features could not explain enough of the variance to predict the exact time to ticket resolution. We considered whether an exact time would actually be useful to the end user, determined that a window would be more realistic, and pivoted to a classification model to predict the time window within which a ticket would close.

We had attempted this earlier, but the resulting model achieved an accuracy of only 0.61, meaning it correctly predicted the time window for just 61% of tickets in our test sample. The poor result stemmed from our data being imbalanced, with two of the three time windows having low recall, i.e., a high number of false negatives. This time around, we applied SMOTE to balance the data and used gradient boosting instead of a random forest. Our final classification model achieved an accuracy of 0.71, which met the milestone we had set for ourselves at the outset of the project. We were thrilled with the result!

Our final classification model
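A sketch of that final approach, using imbalanced-learn’s SMOTE and scikit-learn’s GradientBoostingClassifier, is shown below. The feature matrix and the window labels are placeholders; our actual features and window boundaries are not reproduced here.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Placeholder feature matrix and window labels; the real inputs would be the
# TF-IDF topics, predicted close category, agent features, etc.
X = np.random.rand(500, 20)
y = np.random.choice(["<1h", "1-4h", ">4h"], size=500, p=[0.6, 0.3, 0.1])

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Oversample the minority windows on the training split only,
# so the test set keeps its natural class balance.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_train, y_train)

clf = GradientBoostingClassifier(random_state=0).fit(X_bal, y_bal)
print(classification_report(y_test, clf.predict(X_test)))
```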

Where Electric Can Go from Here

We have delivered two models to Electric as a basis for drawing insights and improving their service and overall customer experience. From a practical standpoint, the classification model is the most directly applicable, predicting a ticket’s time window with 71% accuracy. With tighter windows and additional fine-tuning, we anticipate it could be suitable for general availability, though Electric would have to consider how to present especially long estimates.

That is not to say the regression model provides no valuable insight; its analysis can be leveraged to improve operations and resolution times. Based on it, we proposed new IT ticket categories that may group and represent issues more accurately going forward, which Electric could take into consideration if and when it decides to revisit its categories. Improved categorization may lead to more efficient handling of tickets, and therefore faster resolution times and more satisfied users.

Additionally, our regression model provided insight into which factors of a ticket most affect resolution time; in our case, Agent ID had the most significant influence. This is valuable for Electric: they now have analytical evidence of what has the greatest impact on a ticket’s resolution time and can take a more targeted approach to any future optimization.

Takeaways from our Semester Together

We took away many valuable lessons from our semester, and want to share two of them here. Firstly, trial and error is OK! The problems we were trying to solve were difficult, and oftentimes there was no clear path forward. There is no shame in not knowing what to do next, and when in doubt, throw a bunch of models against the wall and see what actually sticks! Sometimes the best path forward is simply making progress.

Secondly, don’t just shoot for the moon. As ambitious Columbia students, we were especially eager to deliver polished outputs that would be immediately impactful. But in reality, that is not always how it works. With an ambiguous problem like ours, it took a lot of small steps, and even then our deliverables would still require many more iterations before being production-ready. We were reassured that our detours at least generated valuable insights along the way.

We hope our work will be of some value to Electric. We genuinely appreciate the opportunity to work with such an inspiring company.
