How Urban Company spots Customer Appreciation for Partners using NLP

By — Mrinal Jain (Intern, Urban Company), Karan Kapoor (Engineer, Supply Team)

UC Blogger
Urban Company – Engineering
5 min read · Jul 7, 2022


At UC, one of our most cherished values is Win-Win. It drives us to create a meaningful impact on the lives of our 40k partners: our service professionals. While we are working on a plethora of products to give them financial stability and inclusion, we have also taken a stab at building a Digital Community called “Cult” for our partners: a space where they can share their experiences, interact with each other, and be recognised for their good work by UC and by our customers. More on Cult later.

While building Cult, we realised that nothing motivates our partners more than genuine appreciation from customers. Our customers often use social media platforms, like Twitter and Google reviews, to call out the good work that our partners do. But there is a catch: Twitter and Google are also used by customers who want to reach out to us for help or share feedback. So how do we separate the compliments and appreciation from feedback and help requests? Given the scale at which UC operates, manually sifting through every message was not an option.

Sentiment Classification

At first, we thought that it was a trivial problem, nothing more than a simple application of sentiment classification. We started with HuggingFace, an open-source platform for models and datasets for NLP tasks. Most importantly, it already had a pipeline function for sentiment classification, using the BERT model created by Google. Was the problem this simple? No, this was just the tip of the iceberg.
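For illustration, here is a minimal sketch of that first attempt using HuggingFace's pipeline API. The example tweets are hypothetical, and the default checkpoint the pipeline downloads may differ from the BERT variant we used.

```python
# Minimal sketch: off-the-shelf sentiment classification with HuggingFace.
# The tweets below are made-up examples for illustration only.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")

tweets = [
    "Huge thanks to the @urbancompany professional who fixed our AC today, so skilled and polite!",
    "My booking got cancelled twice, can someone from support please help?",
]

for tweet in tweets:
    result = sentiment(tweet)[0]
    print(f"{result['label']} ({result['score']:.2f}) -> {tweet}")
```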

Not all Positive Tweets are Created Equal: Enter sarcasm and promotions

A lot of positive tweets weren’t customer feedback tweets at all. Rather, they were announcements and promotions for UC. We now had to find a way to differentiate customer appreciation from other positive tweets. We formulated and discarded multiple solutions, and finally zeroed in on the MiniLM model, using it to score how similar a tweet was to a sample appreciation tweet.
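As a rough sketch of this step (assuming the sentence-transformers “all-MiniLM-L6-v2” checkpoint; the reference tweet, candidates and any threshold here are illustrative, not the exact ones we used), the comparison looked something like this:

```python
# Score how similar each positive tweet is to a sample appreciation tweet
# using MiniLM sentence embeddings and cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

reference = "Thank you so much, the Urban Company professional did an amazing job!"
candidates = [
    "Brilliant service by the UC beautician today, she was so professional!",
    "Monsoon sale is live! Book your home cleaning now at flat 30% off.",
]

ref_emb = model.encode(reference, convert_to_tensor=True)
cand_emb = model.encode(candidates, convert_to_tensor=True)

scores = util.cos_sim(ref_emb, cand_emb)[0]
for tweet, score in zip(candidates, scores):
    # Tweets scoring above some threshold would be treated as appreciation.
    print(f"{score:.2f} -> {tweet}")
```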

The next problem that emerged was that of sarcastic tweets. We tried raising the similarity threshold to block sarcasm, but no threshold filtered out sarcasm without also rejecting genuine appreciation. Moreover, this 2-step algorithm of sentiment classification followed by sentence similarity was slow, and heavily dependent on external models.

No More External Dependencies

We finally realised that the root problem with our approach was the sample appreciation tweet we fed to the MiniLM model: even small tweaks to it caused drastic variability in the results. So, we decided to train our own model on a dataset of about 36,000 tweets tagging UC on Twitter.

Generating training data for the model

The challenge now was to label this dataset for training. We took inspiration from the world of unsupervised machine learning, or more specifically, k-means clustering. To reduce our dependence on any single sample sentence, we implemented a repeated sampling algorithm.

Repeated Sampling Algorithm Flow Diagram

Starting from a set of two initial ‘reference’ sentences, the repeated sampling algorithm calculates, for every sentence, its average similarity to the current references. It then randomly picks some of the top-scoring sentences to become the next set of references, and repeats this process until the results converge. The algorithm classifies positive sentences as customer appreciation or not with about 98–99% accuracy, and we confirmed that no matter which initial appreciation sentences are used, it converges to approximately the same results. Its one limitation is that it only works in bulk, i.e., on at least 500–1000 sentences at a time. So, we used sentiment classification followed by repeated sampling to label our large dataset (over 30k tweets). Now the only task that remained was training a model.
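Below is a simplified sketch of the idea. The seed sentences, sample sizes, threshold and convergence check are illustrative assumptions rather than our production values.

```python
# Repeated sampling sketch: iteratively re-pick reference sentences from the
# highest-scoring tweets until the labels stop changing.
import random

import numpy as np
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")


def repeated_sampling(sentences, seed_references, n_refs=10, threshold=0.5, max_iters=20):
    """Label each positive sentence as customer appreciation (True) or not (False)."""
    embeddings = model.encode(sentences, convert_to_tensor=True)
    references = list(seed_references)
    prev_labels = None

    for _ in range(max_iters):
        ref_emb = model.encode(references, convert_to_tensor=True)
        # Average similarity of every sentence to the current reference set.
        scores = util.cos_sim(embeddings, ref_emb).mean(dim=1).cpu().numpy()
        labels = scores >= threshold

        # Randomly pick the next references from the highest-scoring sentences.
        top_idx = np.argsort(-scores)[: 3 * n_refs]
        chosen = random.sample(list(top_idx), k=min(n_refs, len(top_idx)))
        references = [sentences[i] for i in chosen]

        # Stop once the labels no longer change between iterations.
        if prev_labels is not None and np.array_equal(labels, prev_labels):
            break
        prev_labels = labels

    return labels


# Hypothetical usage on a handful of positive tweets (in practice, 500+ at a time).
seeds = [
    "Thank you so much, the UC professional did an amazing job!",
    "Shoutout to the Urban Company electrician, super skilled and polite.",
]
positive_tweets = seeds + [
    "Monsoon sale is live: flat 30% off on AC servicing!",
    "Brilliant haircut by the UC stylist today, highly recommend.",
]
print(repeated_sampling(positive_tweets, seed_references=seeds))
```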

Training Time!

Rather than starting from scratch, we chose to use transfer learning on the BERT sentiment classification model. Here’s why:

  • We could take advantage of the model’s existing ‘understanding’ of language.
  • Sentiment classification is a similar task to identifying customer appreciation tweets.
  • Training a model from scratch would require extremely long training time, powerful GPUs/TPUs, and a massive dataset.

After fine-tuning the BERT model on our labelled dataset, we attained a test-set accuracy of about 98.5%.
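For the curious, here is a hedged sketch of what such a fine-tuning setup can look like with HuggingFace Transformers. The checkpoint name, hyperparameters and the tiny in-line dataset are placeholders for illustration; our actual starting point was a BERT sentiment checkpoint and our data was the ~30k labelled tweets.

```python
# Transfer learning sketch: fine-tune a pre-trained BERT checkpoint as a
# binary "customer appreciation vs other" classifier.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Tiny made-up examples; in practice this is the dataset labelled by the
# repeated sampling step (1 = appreciation, 0 = other).
texts = [
    "The UC professional did a fantastic job with our kitchen cleaning!",
    "Flat 30% off on salon services this weekend, book now!",
    "So grateful to the electrician who stayed late to finish the repair.",
    "My order is delayed, can someone from support look into this?",
]
labels = [1, 0, 1, 0]

checkpoint = "bert-base-uncased"  # assumption; any BERT-style checkpoint works here
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

dataset = Dataset.from_dict({"text": texts, "label": labels}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length", max_length=64),
    batched=True,
)
splits = dataset.train_test_split(test_size=0.25)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="appreciation-bert",
        num_train_epochs=3,
        per_device_train_batch_size=32,
    ),
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
)
trainer.train()
print(trainer.evaluate())
```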

Summary

Our work, at a glance

About the Authors:

Mrinal is a high-schooler who decided homework is too boring and realised his passion lies in coding and machine learning. Looking for tougher challenges, he ended up partnering with Karan from UC and started exploring real world applications of Data Science at UC. When he is not geeking out, you will find him trekking the Himalayas or the Nilgiris.

Karan Kapoor is part of the Supply vertical at Urban Company, working on multiple partner-side initiatives. His other interests are gymming and travelling the world.

Sounds like fun?
If you enjoyed this blog post, please clap 👏(as many times as you like) and follow us (@UC Blogger). Help us build a community by sharing on your favourite social networks (Twitter, LinkedIn, Facebook, etc).

You can read up more about us on our publications —
https://medium.com/uc-design
https://medium.com/uc-engineering
https://medium.com/uc-culture

If you are interested in finding out about opportunities, visit us at http://careers.urbancompany.com
