AIXPERIMENTATIONLAB — How To Classify Customer E-mails With Artificial Intelligence

A Markus
Organizational Development @ WZL
6 min read · Jan 31, 2023

How can we identify the underlying problem in a customer enquiry, using techniques from natural language processing (NLP) and machine learning?

Presentation of the use case and company

For more than 25 years, GRÜN aixtema GmbH has been providing after-sales support and service for manufacturers in the IT, communications and electronics industries. The daily business is characterized by a large volume of incoming customer enquiries. Fast response times are key to an efficient day-to-day operation, and the number of processed orders is a key determinant of customer satisfaction. Due to a growing customer base and product portfolio, the content of customer enquiries becomes more variable over time, which makes it hard to keep response and processing times steady. Employees are therefore faced with increasingly complex tasks. This is a major challenge, especially for new team members who are still in the onboarding phase.

To address this problem, the company's well-structured internal knowledge management system is leveraged with text classification algorithms. In this way, search and problem-solving times are significantly reduced, and customer support enquiries can be answered more quickly and in a more targeted manner, resulting in greater efficiency. Moreover, the matching system serves as guidance that accelerates the onboarding of new employees.

The data basis consists of previously answered customer mails and the associated problem. How can this database in combination with natural language processing be used to improve the employees’ decision making?

The central task of the decision-support system is to propose the most fitting problem-solution classes within the database for an incoming mail. We achieve this by building an end-to-end pipeline with the text of the incoming mail as input and an interpretable output for the users.

Figure 1: Illustration of the end-to-end pipeline

First, the incoming e-mail text is cleansed, and the format is unified. Then, the text is split up into separate parts (tokens) and reformatted into a mathematical vector, to obtain a numerical input that can be fit into a classification algorithm. This machine-learning algorithm is trained via supervised learning and can later be used to match the e-mail to the different classes. Afterwards, we transform the numerical output of the algorithm into a result that is easy to understand for the users.

Text Cleansing

As a first step, the text data in the database is cleaned. We follow the standard steps of text cleansing:

1. Removal of empty e-mails

2. Removal of special characters, UTF encodings and punctuation, to prevent compatibility issues in the following steps

3. Lowercasing to standardize the text

4. Reducing words to their grammatical base form (lemmatisation) to simplify the text and allow for easier feature extraction

5. Extraction of the last e-mail from the conversation and removal of greetings and farewells

Figure 2: Text cleansing improves the text quality. We remove unwanted characters and capital letters and reduce words to their grammatical base form
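A minimal sketch of these cleansing steps in Python (the regex and the tiny lemma table are illustrative stand-ins; a real pipeline would use a full lemmatiser such as spaCy and additional, language-specific patterns for stripping greetings and quoted conversations):

```python
import re

# Hypothetical lemma lookup table for illustration only;
# a production pipeline would use a proper lemmatiser.
LEMMAS = {"is": "be", "has": "have", "broken": "break"}

def cleanse(text: str) -> str:
    """Minimal text cleansing: drop special characters and
    punctuation, lowercase, and lemmatise via a lookup table."""
    text = re.sub(r"[^\w\s]", " ", text)  # remove punctuation/special chars
    text = text.lower()                   # standardise casing
    words = [LEMMAS.get(w, w) for w in text.split()]
    return " ".join(words)

print(cleanse("The printer IS broken!!"))  # -> "the printer be break"
```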

Vectorization

Figure 3: Vectorization represents the text as a vector of words

Next, we represent the text as a vector of words: we split the text on every space to obtain the individual words. These word vectors vary in length from just a few entries to more than a thousand. Since the model requires input of a fixed length, the word vectors of shorter mails need to be padded: we append empty entries (pads) at the end to artificially lengthen the word vector, so the model gets a unified input. E-mails longer than the chosen length are cut off at this threshold.
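The padding and truncation described above can be sketched as follows (the maximum length of 8 is an arbitrary choice for illustration; the real threshold depends on the corpus):

```python
PAD = ""      # empty pad entry
MAX_LEN = 8   # chosen unified length (illustrative value)

def to_word_vector(text: str, max_len: int = MAX_LEN) -> list[str]:
    """Split on every space, cut long mails at the threshold,
    and pad short mails with empty entries at the end."""
    words = text.split(" ")
    words = words[:max_len]                   # truncate long mails
    words += [PAD] * (max_len - len(words))   # pad short mails
    return words

v = to_word_vector("printer shows error code")
# v has length 8: the four words followed by four pads
```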

Tokenization

Figure 4: Tokenization replaces words with numerical tokens

To provide numerical data for the classification model, we use Byte-Pair Encoding. Here, frequently occurring letter combinations are merged into single tokens until the chosen vocabulary size is reached (in our case 50,000 tokens). These tokens are assigned numerical values, starting with the single letters and ending with more complex letter combinations up to simple words. Then we replace the words with their respective tokens.
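The core of Byte-Pair Encoding is a loop that repeatedly merges the most frequent adjacent symbol pair. A toy sketch of one such merge step (function names are ours; a production system would use an established tokenizer library):

```python
from collections import Counter

def most_frequent_pair(words: list[list[str]]) -> tuple[str, str]:
    """Find the most frequent adjacent symbol pair across all words."""
    pairs = Counter()
    for symbols in words:
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

def merge_pair(words: list[list[str]], pair: tuple[str, str]) -> list[list[str]]:
    """Replace every occurrence of the pair with a single merged token."""
    merged = []
    for symbols in words:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged.append(out)
    return merged

# Start from single letters and repeat the merge until the vocabulary
# reaches the chosen size (50,000 in our setup).
words = [list("drucker"), list("druckerei"), list("laser")]
pair = most_frequent_pair(words)     # ('e', 'r') occurs three times
words = merge_pair(words, pair)      # "laser" becomes ['l', 'a', 's', 'er']
```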

Classification Model

After the pre-processing of the text data is finished, we are ready to train the classification model. To get the essential knowledge from our dataset into the model, we use the concept of supervised learning. We feed a large portion of the mails (90%) and their respective labels into the model and let it adapt until it can classify the mails in this training dataset. Afterwards, we use the remaining 10% of the dataset, the validation dataset, to evaluate the model on “unseen” e-mails. This step is necessary, since modern, complex models like neural networks have many internal parameters and can “memorize” large portions of the dataset. This problem is called overfitting. Since we want the model to classify new mails, we need it to generalize well, which we ensure by validating it on unseen data.
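The 90/10 split can be sketched like this (a fixed random seed keeps the split reproducible; in practice one would often use a library utility such as scikit-learn's train_test_split):

```python
import random

def split_dataset(mails, labels, train_fraction=0.9, seed=42):
    """Shuffle and split into a training set and a validation set."""
    indices = list(range(len(mails)))
    random.Random(seed).shuffle(indices)
    cut = int(len(indices) * train_fraction)
    train_idx, val_idx = indices[:cut], indices[cut:]
    train = ([mails[i] for i in train_idx], [labels[i] for i in train_idx])
    val = ([mails[i] for i in val_idx], [labels[i] for i in val_idx])
    return train, val

mails = [f"mail {i}" for i in range(100)]
labels = [i % 5 for i in range(100)]
(train_x, train_y), (val_x, val_y) = split_dataset(mails, labels)
# 90 training mails, 10 "unseen" validation mails
```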

After the training is done, the model can be used inside the application. To process a new incoming mail, the previous steps of text cleansing, vectorization and tokenization need to be performed in the same way as for the mails from the database. Afterwards, the model is used to classify the mail. As output, it generates a probability distribution over all the categories. Since we assume each mail to address only one problem, the distribution is normalised such that the sum over all categories equals one. To give a clear output to the users, we need to interpret this probability distribution.
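Normalising the model's raw scores so that the probabilities sum to one is typically done with a softmax, roughly:

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Turn raw model scores into probabilities that sum to one."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.5, -1.0])
# probs sums to 1.0; the highest score keeps the highest probability
```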

Evaluation

The simplest way to interpret the probability distribution is to choose the class with the highest probability as the result. This is the way to go for ideal datasets with distinct classes. However, a customer may describe more than one problem in the mail, or the result can be ambiguous. By only choosing the class with the highest probability, we drop a lot of information that could be used otherwise.

Figure 5: We evaluate the numerical machine output to represent the result in an easier manner

Since our system is designed to support the decision process of the employees rather than fully taking over the decision, we can give the employees multiple options to choose from. This can be done in many ways. One simple way is to present the three classes with the highest probabilities as results. To indicate how certain the result is, we combine this output with an indication of the probability. Raw numbers are often misleading and hard to interpret, especially under time pressure. Therefore, we chose symbols like coloured arrows to illustrate the quality of the given options. Learn more about this topic in our next blogpost, which will focus on the participatory design process.
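A sketch of this output step (the class names and arrow thresholds are invented for illustration; the actual symbols and cut-offs came out of the design process):

```python
def top3_with_arrows(probs: list[float], class_names: list[str]):
    """Pick the three most probable classes and attach a simple
    confidence symbol instead of a raw number."""
    ranked = sorted(zip(class_names, probs), key=lambda x: x[1], reverse=True)[:3]

    def arrow(p: float) -> str:
        if p >= 0.7:
            return "↑"  # e.g. green arrow: confident suggestion
        if p >= 0.3:
            return "→"  # e.g. yellow arrow: plausible suggestion
        return "↓"      # e.g. red arrow: uncertain suggestion

    return [(name, arrow(p)) for name, p in ranked]

suggestions = top3_with_arrows(
    [0.75, 0.15, 0.06, 0.04],
    ["printer defect", "driver issue", "warranty", "shipping"],
)
# [("printer defect", "↑"), ("driver issue", "↓"), ("warranty", "↓")]
```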

Stay tuned for upcoming project updates!

[Webpage AIXPERIMENTATIONLAB]
