Responsio: Automating CS Repetitive Tasks

Rezki Trianto
Published in Bukalapak Data
Jul 29, 2020

How we use machine learning to help CS Agents work more efficiently

BukaBantuan — Providing the Best Experience

At Bukalapak, serving our customers with the best possible experience is one of our main goals. One crucial aspect of user experience is customer service. To that end, we, through our Customer Satisfaction Management (CSM) team, constantly strive to deliver an A+ customer service experience by implementing omnichannel customer support that comprises BukaBantuan, our hotline centre, and social media.

As one of the biggest e-commerce sites in Indonesia, we have a vast number of customers. On customer support alone, around 300k customer service tickets are submitted through our omnichannel in a typical month. They range from questions about package delivery progress to requests for explanations of a certain feature of our platform. Each of these numerous tickets should be resolved by our team appropriately and, more importantly, in a timely manner.

The latter desideratum requires us to handle customer support at scale. This is all the more true given our limited number of agents and limited agent time, so we have to be smart. To this end, we leverage machine learning to automate some of our agents' repetitive tasks. By repetitive tasks we mean frequent queries that happen to have standard answers to look up. This automation allows us to be more efficient: our CS agents can dedicate their time to more complex queries and let the automation take care of the rest.

To better comprehend the matter, let us read a short story about Homer:

The CS Expert Homer

Homer is one of our CS experts; he has been dealing with various customer queries for more than three years. At the beginning of his second shift (recall that our CS is open 24/7), the first ticket of the day arrives, and he needs to resolve it ASAP. The ticket contains the following question: “How to be an investor at BukaReksa?”

Homer grins for a moment, noticing the query is an easy one. He knows precisely how to answer the question and starts jotting the answer down to resolve the ticket.

Many other tickets were already waiting for resolution while Homer was resolving the first one. Meanwhile, that first query, despite being standard, still consumed some of Homer's precious time. In light of this, it would be so much better if there were a machine that could handle such standard queries. Homer could then focus his time on the more complex problems faced by our customers.

So we built one!

What is BukaReksa?

How does it affect our team?

In the case of Homer's first ticket, we can now reply to the customer immediately, with a response time of less than one second. Many standard operating procedures (SOPs) boil down to repetitive tasks for agents, and it is quite an adventurous journey to dive into the existing SOPs and uncover the automation potential within them. We are developing a framework to reduce the substantial costs that would otherwise be incurred every month.

For some of the contact reasons we handled early on, we managed to divert up to 2.5% of tickets that would otherwise have been handled by an agent to this automation, and that is within a single contact-reason category. This will accumulate considerably as we cover more categories. Not only is ticket resolution time reduced, but agent capacity is also better utilized.

Let’s learn more about how we develop Responsio in the following sections!

Automation is the Bottom Line.

The ultimate goal was a platform that spares our CS agents from repetitive tasks: tickets that either could be resolved by the customers themselves or have a standard answer. It is a service that automatically predicts the customer's intention from their complaint and automatically replies. The benefit of such a platform is improved speed (a better service level agreement, or SLA) while maintaining accuracy, resulting in an improved overall customer experience.

Architecture

The architecture of Responsio is directly connected to the complaint channel in Bukalapak through BukaBantuan. When a customer has a problem and submits it directly from BukaBantuan, the complaint is directed to Core Responsio, which runs several stages of a simple text classification pipeline to determine the specific intention behind the customer's problem. The process under the hood is quite straightforward: text cleaning, feature extraction, and classification with trained machine learning models.
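As an illustration, the cleaning stage might look like the minimal Python sketch below. The exact normalization rules Responsio applies are not public, so treat these steps (case folding, punctuation stripping, whitespace tokenization) as assumptions:

```python
import re

def clean_text(text: str) -> list[str]:
    """Normalize a raw complaint into a list of tokens (illustrative rules)."""
    text = text.lower()                       # case folding
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # strip punctuation and symbols
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    return text.split()                       # whitespace tokenization

print(clean_text("Bagaimana cara menjadi investor di BukaReksa?"))
# ['bagaimana', 'cara', 'menjadi', 'investor', 'di', 'bukareksa']
```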

The model makes a prediction on the text submitted by the customer. If the predicted intention and/or the status of the transaction conforms to the existing standard operating procedures, the system responds directly to the requestor via a simple decision tree. Once deployed, we always run A/B testing experiments on any improvement made to the model, to verify that the gains are significant in terms of both feature performance and customer satisfaction.
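The post-prediction rule check can be pictured as a tiny gate. `SOP_RULES` and the other names below are hypothetical illustrations, not Responsio's actual interface:

```python
# Hypothetical mapping from predicted intention to the transaction states
# that may be auto-answered, plus a canned reply template.
SOP_RULES = {
    "refund_status": {
        "states": {"refunded", "refund_processing"},
        "reply": "Your refund is being processed. Please allow 2x24 hours.",
    },
}

def maybe_auto_reply(intent: str, transaction_status: str):
    """Return a canned reply if SOP rules allow auto-answering, else None."""
    rule = SOP_RULES.get(intent)
    if rule and transaction_status in rule["states"]:
        return rule["reply"]  # answer the ticket automatically
    return None               # escalate to a human agent

print(maybe_auto_reply("refund_status", "refunded"))
```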

Responsio Architecture

How we represent text as model features

As you might have guessed, Responsio needs a correct prediction for each sentence to infer the user's intention and give the appropriate response. Before a prediction can be made, a word needs to be represented as a vector the machine can understand, a representation commonly called a word embedding. An embedding can represent not only a word, but also a sentence, a document, or the relationship between words.

We can represent each word in a sentence using one-hot encoding. It is a simple method for converting categorical variables (words) into binary vectors, commonly used to quantify categorical data. In short, one-hot encoding binarizes each word and turns it into a feature for training a model. One parameter associated with this method is the vocabulary size. For example, suppose we consider the following set of words as our vocabulary:

Vocabulary = {we, will, solve, your, problem}

We can then encode each of its members as follows. Note that the cardinality (set size) of Vocabulary is five, so each of the words below is represented by a vector of dimension five. Moreover, exactly one position is set to 1, and the rest are 0s.

we = [1,0,0,0,0]

will = [0,1,0,0,0]

solve = [0,0,1,0,0]

your = [0,0,0,1,0]

problem = [0,0,0,0,1]
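The same mapping takes only a few lines of Python; a minimal sketch:

```python
import numpy as np

vocabulary = ["we", "will", "solve", "your", "problem"]
index = {word: i for i, word in enumerate(vocabulary)}

def one_hot(word: str) -> np.ndarray:
    """Return a binary vector with a single 1 at the word's position."""
    vec = np.zeros(len(vocabulary), dtype=int)
    vec[index[word]] = 1
    return vec

print(one_hot("solve"))  # [0 0 1 0 0]
```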

In this way, each word is successfully converted into a feature vector. However, one-hot encoding still cannot capture similarity in meaning between words. To handle this, we use word2vec on top of the one-hot-encoded words to capture the context in which these words occur. For example, in the figure below, word2vec can capture the relationship between king and queen (king − man + woman ≈ queen).

Word Analogies

With respect to training word2vec, there are two approaches we can use: Skip-gram and Continuous Bag of Words (CBOW). In Skip-gram, the target word is used as input, while its surrounding context words are used as outputs. All inputs and outputs have the same dimensionality and can be generated as one-hot encoded vectors. There is only one hidden layer in this architecture, with a dimension as large as the embedding size. The output layer applies a softmax activation function to turn logits into probabilities. The network then learns via backpropagation.

In CBOW, the architecture is flipped: the context words of each target are used as input, and the target word itself as output. As in Skip-gram, words are represented as one-hot encoded vectors. The main difference is that the hidden layer takes a weighted sum (average) of the input vectors; the output layer again applies a softmax activation function.

Training Algorithm of CBOW and Skip-gram [1]

From the picture above, we can see that Skip-gram inverts the direction of the prediction. It tends to learn finer-grained vectors when fed more data and copes better with infrequent words; for this reason, we use the Skip-gram algorithm to train our response models.
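In practice such a model can be trained with a library like Gensim (introduced below); a minimal Skip-gram sketch, assuming the Gensim 4.x API (older versions use `size=` instead of `vector_size=`) and a toy tokenized corpus:

```python
from gensim.models import Word2Vec

# Toy tokenized corpus; in practice these would be cleaned CS tickets.
corpus = [
    ["kami", "akan", "menyelesaikan", "masalah", "anda"],
    ["bagaimana", "cara", "refund", "transaksi"],
]

# sg=1 selects Skip-gram; sg=0 would select CBOW instead.
model = Word2Vec(corpus, vector_size=100, window=5, sg=1, min_count=1)
print(model.wv["refund"].shape)  # (100,)
```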

At this point, word2vec has successfully converted words to vectors and can represent the similarity of one word to another. However, what if a word appears in the test set that is not in the training set? This becomes a real problem, especially when training data is very limited. For this reason, we make use of FastText, a method created by Facebook AI Research (FAIR). The idea is to break a word into subwords based on a certain number of n-grams.

FastText Architecture [3]

For example, for the word `Bantuan`, the trigrams are ban, ant, ntu, tua, and uan. Instead of using only the whole word as input, all subwords become input tokens and are passed to the architecture to generate feature vectors. As a result, we obtain an embedding for each n-gram/subword in our training dataset, and can easily match test words whose subwords overlap. This method, commonly called FastText, is able to represent rare words [2][3].
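Extracting these character n-grams is straightforward; a minimal sketch (note that the reference FastText implementation also wraps each word in `<` and `>` boundary markers before extracting n-grams):

```python
def char_ngrams(word: str, n: int = 3) -> list[str]:
    """Return all character n-grams of a word."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]

print(char_ngrams("bantuan"))
# ['ban', 'ant', 'ntu', 'tua', 'uan']
```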

In Responsio, we use Gensim to train FastText. Gensim is a popular, robust, and efficient library for modelling the semantics of text with unsupervised learning.
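A minimal training sketch, again assuming the Gensim 4.x API; the hyperparameter values here are illustrative, not Responsio's actual settings:

```python
from gensim.models import FastText

corpus = [
    ["kami", "akan", "menyelesaikan", "masalah", "anda"],
    ["bagaimana", "cara", "refund", "transaksi"],
]

# min_n/max_n control the subword n-gram lengths; sg=1 keeps Skip-gram.
model = FastText(corpus, vector_size=100, window=5, min_count=1,
                 min_n=3, max_n=6, sg=1)

# Out-of-vocabulary words still get a vector via their subword n-grams.
print(model.wv["bantuannya"].shape)  # (100,)
```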

Predicting the intention

Flow to answer an inquiry

After getting the feature vector for the text (using one-hot encoding, word2vec, and FastText), prediction is done by feeding new text (a CS inquiry) into the trained model. In its initial stage, the Responsio architecture uses the XGBoost algorithm for text classification; the classes in this case are the intentions behind each user complaint, such as a refund request or a question about payment status. XGBoost is an ensemble, tree-based machine learning algorithm: an optimized gradient boosting implementation featuring parallelization, tree pruning, hardware optimization, regularization, sparsity awareness, a block structure, and continued training [4].

After predicting the intention, Responsio checks the state of the problematic transaction. If the transaction status meets the existing rules, Responsio replies to the ticket directly with the answer. It answers a complaint only if the state meets the rules and no agent involvement is needed. If the customer asks the same thing a second time, a CS agent immediately intervenes to solve the problem.
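Putting the pieces together, the intent classifier can be sketched as follows. Averaging token vectors into a sentence vector is a common baseline; the exact pooling, features, and hyperparameters Responsio uses are not public, so everything below is illustrative:

```python
import numpy as np
from gensim.models import FastText
from xgboost import XGBClassifier

# Toy labelled tickets: tokens plus an integer intention id
# (0 = refund request, 1 = payment status question). Illustrative only.
train_tokens = [
    ["bagaimana", "cara", "refund", "transaksi"],
    ["status", "pembayaran", "saya", "bagaimana"],
]
train_labels = [0, 1]

ft_model = FastText(train_tokens, vector_size=50, window=3,
                    min_count=1, min_n=3, max_n=6, sg=1)

def sentence_vector(tokens):
    """Average the FastText vectors of a ticket's tokens."""
    return np.mean([ft_model.wv[t] for t in tokens], axis=0)

X = np.vstack([sentence_vector(toks) for toks in train_tokens])
clf = XGBClassifier(n_estimators=50, max_depth=3, learning_rate=0.1)
clf.fit(X, train_labels)

new_ticket = ["minta", "refund", "dong"]
print(clf.predict(sentence_vector(new_ticket).reshape(1, -1)))  # e.g. [0]
```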

In closing

That’s how Responsio is designed and how it works. The platform will continue to be developed to meet the business needs of the customer support team, and it will also be integrated with other products and pipelines to reduce the CS team's workload.

References

[1] Mikolov, Tomas, et al. “Efficient estimation of word representations in vector space.” arXiv preprint arXiv:1301.3781 (2013).

[2] Bojanowski, Piotr, et al. “Enriching word vectors with subword information.” Transactions of the Association for Computational Linguistics 5 (2017): 135–146.

[3] Joulin, Armand, et al. “Bag of tricks for efficient text classification.” arXiv preprint arXiv:1607.01759 (2016).

[4] Chen, Tianqi, and Carlos Guestrin. “XGBoost: A scalable tree boosting system.” Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016.

Credits

My special thanks and appreciation to Pararawendy Indarjo, mas Hendra Hadhil Choiri, and mas Jonathan Kurniawan, who helped proofread this article. You guys rock!
