Harnessing the Potential of NLP: Effortless Experience & Efficient Customer Service

Published in Atlantbh Engineering, April 9, 2024

Natural Language Processing (NLP), a branch of artificial intelligence (AI), encompasses a powerful set of techniques with a wide range of applications, especially in customer support. By enabling machines to understand, interpret, and generate human language, NLP breaks down barriers between technology and human interaction, paving the way for more intuitive user experiences and personalized customer service.

Whether you run an established organization or are a budding entrepreneur, it is worth understanding how NLP can improve your business. Join us as we explore NLP by applying it to one common task. Along the way, learn how two Atlantbh teams, the development team and a small group from the data team, collaborated to build a custom text classification model and deploy it smoothly to production.

Using NLP in Customer Service

NLP tasks play an important role in automating and optimizing many aspects of customer service, improving both the user experience and operational efficiency. More and more businesses are developing conversational agents (chatbots) that interact with customers in natural language, answer frequently asked questions, and resolve basic inquiries. Text summarization can automatically generate concise summaries of long customer inquiries. Speech recognition transcribes phone calls and voicemails into text. Real-time translation services enable multilingual support and smooth communication with customers who speak different languages.

There are numerous examples of NLP tasks in customer support, but text classification is certainly one of the most common. Spam filtering, sentiment analysis, ticket triaging, and ticket classification can all be considered variations of the same task: text classification.

Figure 1. Examples of Text Classification in Customer Service (Anja Plakalovic)

Problem Definition: Customer Support Ticket Classification

For any business, customer complaints matter because they often point to shortcomings in its products or services. Complaints that are not resolved quickly lead to customer dissatisfaction, and a recurring pattern of dissatisfaction can reduce revenue.

Automatically classifying customer support tickets therefore has significant value for both businesses and customers. For the business, it streamlines the customer support workflow and improves operational efficiency. For customers, it ensures that their inquiries are routed immediately to the appropriate team, resulting in faster responses and a smoother experience.

Atlantbh had the opportunity to solve the problem of classifying customer support tickets for an international company. We can formally define this problem as follows:

Given a set of customer support tickets (i.e., formal records of customer inquiries, issues, or complaints) and a predefined set of categories (e.g., shipping issues, refund issues, account settings issues), the task is to construct a model that can accurately classify each ticket into the appropriate category, depending on the ticket content.
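The definition and the accuracy metric quoted later in this post can be written in standard supervised-learning notation. The notation is ours, not the post's. Here $\mathbf{1}[\cdot]$ is 1 when its condition holds and 0 otherwise.

```latex
\begin{aligned}
&\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}, \quad x_i \in \mathcal{X}\ (\text{ticket text}), \quad y_i \in \mathcal{C} = \{c_1, \dots, c_K\}\ (\text{categories}) \\
&\text{learn } f : \mathcal{X} \to \mathcal{C} \text{ from } \mathcal{D}, \text{ then evaluate it on held-out tickets } \{(x_j, y_j)\}_{j=1}^{m}: \\
&\operatorname{Accuracy}(f) = \frac{1}{m} \sum_{j=1}^{m} \mathbf{1}\!\left[ f(x_j) = y_j \right]
\end{aligned}
```

For instance, 93 correct predictions on 100 held-out tickets give an accuracy of 93/100 = 0.93.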

Business Goal

The existing process for handling customer support tickets relied entirely on manual categorization: customers had to select a specific category before submitting a problem. This approach proved error-prone, and tickets were frequently misclassified. As a result, customer inquiries were often routed to the wrong support teams, delaying resolution and frustrating customers. Support teams also spent significant time recategorizing misclassified tickets and routing them to the appropriate teams. This manual reassignment increased response times and caused redundant effort and inefficiencies across support teams.

Figure 2. Customer Support Ticket Handling Process: Before vs. After (Anja Plakalovic)

The proposed ticket handling process automates ticket categorization with a custom text classification model. It uses NLP and machine learning (ML) techniques to classify tickets automatically based on their content, which removes the need for customers to select a category manually. It also simplifies the problem-reporting form, so users can report problems more quickly without deciding which category they belong to (Figure 3).

With this approach, the goal is to considerably improve the accuracy and efficiency of support ticket classification. It should surpass the estimated accuracy of the existing manual categorization, approximately 80%. In turn, we aim to reduce the time teams spend recategorizing misclassified tickets and to improve customer satisfaction by resolving inquiries faster.

Figure 3. Report a Problem Form: Before vs. After (Anja Plakalovic)

Approach

The client first approached the development team with a request for a solution that would automatically classify support tickets. After defining the problem, the development team extracted data from the ticketing system. This produced a labeled dataset containing the text content of existing tickets (i.e., customer inquiries) and each ticket’s corresponding category or label.

The development team first tried to solve the problem with a rule-based approach, which is often the initial and simplest strategy for text classification. A rule-based approach classifies support tickets using predefined rules based on specific criteria or linguistic patterns, such as the presence of certain keywords. However, despite the team’s efforts, it quickly became apparent that rules alone could not adequately handle the complexity and variability of ticket content. The team therefore recognized the need for more advanced methods. This led to a collaboration with the data team and a machine learning approach to the problem.
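To make the rule-based idea concrete, here is a minimal Python sketch of a keyword-rule classifier. The post does not publish the team's rules, so the patterns in `RULES`, the `classify_by_rules` helper, and the `unclassified` fallback label are all hypothetical. The category names are the examples from the problem definition. The second example shows the core weakness of this approach: any phrasing the rules did not anticipate falls through.

```python
import re

# Hypothetical keyword rules for illustration only; the team's actual rules are not published.
RULES = {
    "shipping issues": [r"\bship(ping|ped|ment)?\b", r"\bdeliver(y|ed)?\b", r"\btracking\b"],
    "refund issues": [r"\brefund(s|ed)?\b", r"\bmoney back\b", r"\bcharged twice\b"],
    "account settings issues": [r"\bpassword\b", r"\blog ?in\b", r"\bemail address\b"],
}
FALLBACK = "unclassified"  # hypothetical label for tickets that no rule matches


def classify_by_rules(ticket_text: str) -> str:
    """Return the category with the most matching patterns, or FALLBACK if none match.

    Ties go to the category listed first in RULES.
    """
    text = ticket_text.lower()
    best_category, best_hits = FALLBACK, 0
    for category, patterns in RULES.items():
        # Count how many of this category's patterns occur in the ticket.
        hits = sum(1 for pattern in patterns if re.search(pattern, text))
        if hits > best_hits:
            best_category, best_hits = category, hits
    return best_category


print(classify_by_rules("My package was never delivered and tracking shows nothing"))  # shipping issues
print(classify_by_rules("The item I received is a different colour"))  # unclassified
```

Every new phrasing needs a new rule, and overlapping keywords force ad hoc tie-breaking. This is the kind of variability that makes a learned model attractive.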

The figure below gives an overview of building a text classifier with a supervised ML approach, showing the teams involved and their respective responsibilities. Since the first two steps have already been described above, the rest of this section covers the remaining steps.

Figure 4. Overview of Text Classification Flow Using ML Approach (Anja Plakalovic)

After the development team introduced the problem, the data team’s first step was to become familiar with the data and get a sense of what could be derived from it. This step is called Exploratory Data Analysis (EDA). It helped us identify potential challenges and constraints early in the project lifecycle, which guided the subsequent approach and risk mitigation. For our customer support ticket classification problem, EDA provided valuable insights into customer inquiries. These included the distribution of ticket categories, the frequency of specific keywords or phrases within each category, and the distribution of inquiry lengths.
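The three EDA statistics named above can be computed with a few lines of standard-library Python. This is a minimal sketch: the `eda_summary` and `tokenize` helpers and the sample tickets are hypothetical, not the team's code or the client's data.

```python
import re
from collections import Counter, defaultdict
from statistics import mean, median


def tokenize(text):
    """Lowercase and keep alphabetic tokens (apostrophes allowed)."""
    return re.findall(r"[a-z']+", text.lower())


def eda_summary(tickets, top_k=5):
    """Basic EDA over (text, category) pairs: class balance, frequent words, inquiry lengths."""
    category_counts = Counter(category for _, category in tickets)
    word_counts = defaultdict(Counter)  # category -> word frequencies
    lengths = []  # inquiry length in tokens
    for text, category in tickets:
        tokens = tokenize(text)
        word_counts[category].update(tokens)
        lengths.append(len(tokens))
    return {
        "category_distribution": dict(category_counts),
        "top_words": {c: [w for w, _ in wc.most_common(top_k)] for c, wc in word_counts.items()},
        "length_mean": mean(lengths),
        "length_median": median(lengths),
        "length_max": max(lengths),
    }


tickets = [  # hypothetical examples, not the client's data
    ("Where is my package", "shipping issues"),
    ("My package arrived late again", "shipping issues"),
    ("Please refund my order", "refund issues"),
]
summary = eda_summary(tickets)
print(summary["category_distribution"])  # {'shipping issues': 2, 'refund issues': 1}
print(summary["length_median"], summary["length_max"])  # 4 5
```

Even in this toy sample, the most frequent shipping word is a tie between "my" and "package". This hints at why stop word removal belongs in the preprocessing that follows.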

A thorough EDA laid a good foundation for the feature engineering phase, which converts the raw text into a format suitable for the ML classification algorithm. The feature engineering steps included data cleaning; standard text preprocessing such as tokenization, stop word removal, and lemmatization; and, finally, text vectorization. We chose Word2Vec for text vectorization because the word embeddings it learns capture semantic relationships and context within the text. Rather than relying on pre-trained models, we trained Word2Vec on our own dataset to obtain word embeddings tailored to our use case. Before vectorizing the text, we split the dataset into training and test sets and trained Word2Vec only on the training data, so that no information from the test set leaked into the embeddings. We then saved the trained Word2Vec model to preserve the learned embeddings and used it to vectorize both the training and test data.
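Below is a minimal standard-library sketch of these steps, under several assumptions:

- The cleaning rules and the tiny `STOP_WORDS` list are hypothetical.
- Lemmatization is omitted because it needs a lexicon or language model, such as NLTK's WordNet lemmatizer or spaCy.
- `train_test_split` is a small hypothetical helper here, not scikit-learn's function of the same name.
- A toy dictionary stands in for the trained Word2Vec vectors. In the real pipeline those vectors would come from a Word2Vec model fitted on the training split only, for example gensim's `Word2Vec(sentences=..., vector_size=...)`.

Word2Vec yields one vector per word. Averaging a ticket's word vectors into a single ticket vector, as done in `ticket_vector`, is one common choice, but the post does not say which aggregation the team used.

```python
import random
import re

# Hypothetical, deliberately tiny stop word list; real projects usually take one from a library.
STOP_WORDS = {"a", "an", "the", "is", "i", "my", "it", "to", "and", "was", "of", "for"}


def clean(text):
    """Lowercase, then replace URLs and non-letter characters with spaces."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"[^a-z\s]", " ", text)


def preprocess(text):
    """Tokenize on whitespace and drop stop words (lemmatization omitted in this sketch)."""
    return [token for token in clean(text).split() if token not in STOP_WORDS]


def train_test_split(items, test_fraction=0.2, seed=42):
    """Shuffle a copy with a fixed seed and hold out test_fraction of it as the test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def ticket_vector(tokens, embeddings, dim):
    """Average the vectors of tokens the embedding model knows; zero vector if none are known.

    Test-set words never seen in training are simply skipped, because the
    embeddings were learned from the training split only.
    """
    known = [embeddings[token] for token in tokens if token in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


# Toy 2-dimensional "embeddings" standing in for a trained Word2Vec model's word vectors.
toy_embeddings = {"parcel": [1.0, 0.0], "delivered": [0.0, 1.0]}
tokens = preprocess("My parcel was NOT delivered! See https://example.com/track")
print(tokens)  # ['parcel', 'not', 'delivered', 'see']
print(ticket_vector(tokens, toy_embeddings, dim=2))  # [0.5, 0.5]
```

The key design choice is to split before any fitting step. Anything learned from data, here the embeddings, must see only training tickets.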

With the raw customer inquiries transformed into a suitable input format, the next step was to select a classification algorithm. Research into a range of classification algorithms guided this decision, together with the domain knowledge gained during EDA and feature engineering. It led us to choose the Support Vector Machine (SVM). SVMs are well known for handling the high-dimensional feature spaces common in text classification. Their ability to learn complex decision boundaries, to model non-linear relationships through kernel functions, and to generalize well further supported this choice for our use case. We then tuned the SVM hyperparameters to find the configuration with the best classification performance, trained the model with these parameters on the training dataset, and saved it. Finally, we applied the saved model to the test dataset to evaluate its performance.
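The sketch below shows the mechanics of this step in plain Python: training an SVM, extending it to several categories, tuning a hyperparameter on held-out data, and measuring accuracy. It is a stand-in, not the team's implementation. It differs from the post's setup in three ways:

- It trains a linear SVM with the Pegasos algorithm (stochastic sub-gradient descent on the L2-regularized hinge loss).
- It handles multiple categories with one-vs-rest.
- It tunes only the regularization strength `lam`, using a single validation split.

The post does not name its SVM library or kernel. Its mention of non-linear relationships suggests a kernel SVM such as scikit-learn's `SVC`, which is commonly tuned with `GridSearchCV`; there the penalty `C` plays the inverse role of `lam`. The helper names and toy data are hypothetical.

```python
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_binary_svm(X, y, lam, epochs=50, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    X holds feature vectors, y holds labels in {-1, +1}. A constant 1.0 feature is
    appended so the last weight acts as a (regularized) bias term.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)  # visit examples in a new random order each epoch
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            inside_margin = label * dot(w, x) < 1.0  # hinge loss is active for this example
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: gradient of the L2 term
            if inside_margin:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]  # hinge sub-gradient step
    return w


def train_one_vs_rest(X, labels, lam):
    """One binary SVM per category: that category (+1) versus all others (-1)."""
    return {
        c: train_binary_svm(X, [1 if label == c else -1 for label in labels], lam)
        for c in sorted(set(labels))
    }


def predict(models, x):
    """Pick the category whose SVM gives the highest score."""
    x = list(x) + [1.0]
    return max(models, key=lambda c: dot(models[c], x))


def accuracy(models, X, labels):
    return sum(predict(models, x) == label for x, label in zip(X, labels)) / len(labels)


def tune_lam(X_train, y_train, X_val, y_val, grid=(1.0, 0.1, 0.01, 0.001)):
    """Hold-out grid search; ties keep the earlier (stronger) regularization value."""
    scores = {lam: accuracy(train_one_vs_rest(X_train, y_train, lam), X_val, y_val) for lam in grid}
    return max(scores, key=scores.get)


# Toy 2-D "ticket vectors" (hypothetical); real inputs would be the Word2Vec-based features.
X_train = [(2, 2), (3, 2), (2, 3), (-2, -2), (-3, -2), (-2, -3)]
y_train = ["refund issues"] * 3 + ["shipping issues"] * 3
X_val, y_val = [(2.5, 2.5), (-2.5, -2.5)], ["refund issues", "shipping issues"]

best_lam = tune_lam(X_train, y_train, X_val, y_val)
model = train_one_vs_rest(X_train, y_train, best_lam)
print(best_lam, accuracy(model, X_train, y_train))  # 1.0 1.0
```

Whatever the SVM variant, the workflow matches the one described above. Choose hyperparameters on data the final evaluation never sees, train with the chosen values, and report accuracy on the held-out test set.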

It is also interesting to look at how the Atlantbh data team’s time was distributed across these phases. Note that the following figures are rough estimates and depend heavily on the nature of the problem and the specific use case. In our scenario, EDA took approximately 10% of the total time, mainly spent on thoroughly familiarizing ourselves with the dataset at the start of the project. Feature engineering took the largest share, about 60%, reflecting the significant effort invested in data preprocessing before model training. Model construction, which covered the development and refinement of the ML models, accounted for the remaining 30%.

Figure 5. Time Allocation Overview: EDA, Feature Engineering, and Model Construction (Anja Plakalovic)

Results

Our primary goal with the proposed approach was to significantly improve the accuracy and operational efficiency of customer support ticket classification. We exceeded the estimated accuracy of the existing solution, around 80%: our model achieved 93% accuracy on the test dataset. After we presented the results to the client, diligent work and seamless cooperation among several Atlantbh teams enabled a smooth deployment of the model to production. A few months after deployment, we evaluated the model’s accuracy in production and found it to be an impressive 98%, well above even the 93% measured on the test dataset.

This improvement not only confirms the effectiveness of our approach but also highlights the commitment and expertise of everyone involved. By surpassing the previous performance, we reduced the time spent recategorizing misclassified tickets, streamlined operational flows, and increased team productivity. More accurate classification also leads to faster and more accurate responses to customer inquiries, which increases customer satisfaction. This success underlines the benefits businesses can achieve with NLP and ML techniques, as well as Atlantbh’s commitment to delivering exceptional results and tailored solutions.

If you found this blog post engaging, we encourage you to read part two, “Comprehensive Guide: Creating an ML-Based Text Classification Model”. As the name suggests, it describes each step of the proposed approach in depth.

Originally published at https://www.atlantbh.com on April 9, 2024.

Blog by Anja Plakalović, Data Analyst at Atlantbh.