User Generated Reviews — classification and tagging — from redBus
In this post, we share the thinking behind how redBus classifies and tags reviews on our customer platforms.
One of the important checks a customer makes while booking a bus ticket on our platform is reading the user generated reviews (UGC). As with any other eCommerce SKU, their value is obvious. Traditionally, we have collected and showcased them in the following way.
This worked well and was relevant to users. We took care of a few things like ageing out old reviews, removing bogus ones, and manually correcting spelling mistakes, so that the final review a user saw was optimal and helpful.
Broadly, we divided our reviews across three categories: Staff, Punctuality and Cleanliness. Only moderated, grammatically correct reviews were displayed on the channel.
Why did we take a fresh look at this?
A few things were clear about where we needed to improve:
- Manual moderation: this took time, since actual humans had to look at every individual review and take appropriate action.
- Tags: though we gave a score on the three categories, we did not do a good job of tying them back to the actual review text.
- Surfacing relevant info: other details were lost in the text unless the user took a deeper look. We realised that beyond these three categories there were a few more important aspects, such as Rest Stops and Amenities.
These became the objectives of the new system we wanted to build. We also took cues from other players already doing this, which made the approach clear from all aspects. We talk about all three in detail below.
The following kinds of reviews were automatically discarded by our Machine Learning model:
- Reviews without full context ("Awesome", "Super", "Bad", "not good" and so on)
- Reviews not related to the bus journey ("Had a failed transaction", "status of my refund" and so on)
- Reviews containing bad words
- Reviews in other languages such as Hindi and Tamil
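Before the model runs, these discard rules are easy to picture as a simple pre-filter. The sketch below is purely illustrative: the keyword lists, threshold and function name are our own placeholders, not the actual redBus implementation.

```python
import re

# Placeholder lists -- the real lists and thresholds are not public.
MIN_CONTEXT_WORDS = 3                          # "Awesome", "not good" lack context
NON_JOURNEY_HINTS = {"refund", "transaction", "payment"}
PROFANITY = {"damn"}                           # stand-in profanity list

def is_discardable(review: str) -> bool:
    """Return True if a review matches any of the four discard rules."""
    words = re.findall(r"[a-z']+", review.lower())
    if len(words) < MIN_CONTEXT_WORDS:         # rule 1: no real context
        return True
    if any(w in NON_JOURNEY_HINTS for w in words):  # rule 2: not about the journey
        return True
    if any(w in PROFANITY for w in words):     # rule 3: bad words
        return True
    if any(ord(ch) > 127 for ch in review):    # rule 4: non-English scripts
        return True
    return False
```

In practice such rules only pre-screen the obvious cases; the DNN described next handles the rest.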
Here we used a Deep Neural Network (DNN) classifier. This article provides an in-depth understanding. The confusion matrix after running the DNN was as follows:
Our model achieved a good accuracy of 0.72. This automatically discarded all the non-relevant reviews that were earlier handled manually.
On tags, we realised our data was rich and that customers were looking for different kinds of cues to decide whether the bus service they were booking was good value for money. Once we narrowed down how to present this to users from a UX standpoint, we set about solving it.
Prominent items in the Information Architecture:
- Review Summary and Filters.
- Classifiers — Punctuality, Experience, Staff, Amenities and Rest Stop.
- Highlighting the words that describe the tag.
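For the third item, highlighting can be thought of as mapping each tag to the phrases in the review that triggered it. The sketch below uses a plain keyword lookup with made-up phrase lists; the production attribution at redBus may well be model-driven rather than dictionary-based.

```python
# Placeholder phrase lists per classifier -- illustrative only.
TAG_KEYWORDS = {
    "Punctuality": {"late", "delayed", "on time", "early"},
    "Rest Stop":   {"rest stop", "break", "dinner stop"},
    "Amenities":   {"blanket", "charging", "wifi", "water bottle"},
}

def highlight(review: str) -> dict:
    """Return {tag: [matched phrases]} for phrases found in the review."""
    text = review.lower()
    hits = {}
    for tag, phrases in TAG_KEYWORDS.items():
        found = sorted(p for p in phrases if p in text)
        if found:
            hits[tag] = found
    return hits
```

The UI can then wrap each returned phrase in a highlight span when rendering the review.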
We used AWS SageMaker to host our ML models and also for A/B testing while improving accuracy.
The updated flow chart is as follows.
Why did we choose Random Forest for the classification?
- We had a good data set in terms of training data. For RF to give good accuracy, the training data must be large and of good quality; it is well established that the larger the data set, the better the accuracy. There are certainly more sophisticated models, but for the problem we wanted to solve this was more than enough. This article provides a deep dive into RF.
- It handles missing data points efficiently.
- It can be used for both classification and regression.
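To make the setup concrete, here is a minimal sketch of one per-category Random Forest classifier over bag-of-words features, using scikit-learn. The toy reviews, labels and hyper-parameters are our own illustrations, not the actual redBus pipeline.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier

# Tiny illustrative corpus; the real training set was ~150K reviews.
reviews = [
    "driver was very rude to passengers",
    "staff was polite and helpful",
    "bus left two hours late",
    "departed exactly on time",
]
is_about_staff = [1, 1, 0, 0]  # binary attribute column for one category

vec = CountVectorizer(ngram_range=(1, 2))   # unigrams + bigrams as features
X = vec.fit_transform(reviews)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, is_about_staff)

pred = clf.predict(vec.transform(["the staff was rude"]))
```

One such binary classifier per category (Punctuality, Experience, Staff, Amenities, Rest Stop) yields the five tags, since a single review can belong to several categories at once.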
One important task that usually goes unnoticed while solving these kinds of problems is how to get the training data.
We approached this by creating word n-grams (unigrams and bigrams).
As you can see from the n-grams above, we manually picked these words and word combinations for amenities, punctuality and so on.
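Generating the unigrams and bigrams themselves is straightforward; a minimal sketch (function name ours) looks like this:

```python
def word_ngrams(text, n_values=(1, 2)):
    """Return unigram and bigram strings from a review, lower-cased."""
    words = text.lower().split()
    grams = []
    for n in n_values:
        # slide a window of size n across the word list
        for i in range(len(words) - n + 1):
            grams.append(" ".join(words[i:i + n]))
    return grams
```

From the resulting n-grams, a human can quickly pick out the ones that signal each category.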
After this, we took close to 150K reviews and prepared the training data as follows.
As with any data science project, it is important to spend time on the plumbing: preparing the training data and testing its quality. We did 4–5 iterations here to ensure our training data was of good quality.
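The preparation step can be sketched as mapping each raw review to one binary attribute column per classifier, based on the hand-picked n-gram lists. The category names and n-gram sets below are placeholders for illustration.

```python
# Placeholder n-gram lists per category -- the real curated lists are larger.
CATEGORY_NGRAMS = {
    "punctuality": {"late", "on time", "delayed"},
    "amenities":   {"blanket", "charging point", "wifi"},
    "rest_stop":   {"rest stop", "dinner break"},
}

def to_training_row(review: str) -> dict:
    """Map a raw review to {category: 0/1} label columns."""
    text = review.lower()
    row = {"review": review}
    for category, ngrams in CATEGORY_NGRAMS.items():
        # 1 if any curated n-gram for this category appears in the review
        row[category] = int(any(g in text for g in ngrams))
    return row
```

Rows produced this way were then reviewed and corrected across the 4–5 iterations mentioned above, since keyword matches alone are noisy.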
We were able to achieve close to 85% accuracy against our 5 classifiers.
A few stats along the way:
- Initial classification based on 50K training set.
- Word Cloud of reviews
- With binary attribute column
- Category wise frequency
- Sentiment Analysis
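For the sentiment analysis item above, a lexicon-based scorer is the simplest way to show the idea. The word lists and function below are toy illustrations only; the production sentiment model is not described in this post.

```python
# Toy sentiment lexicons -- illustrative, not the production vocabulary.
POSITIVE = {"good", "clean", "comfortable", "polite", "helpful"}
NEGATIVE = {"bad", "dirty", "rude", "late", "delayed"}

def sentiment(review: str) -> str:
    """Classify a review as positive/negative/neutral by lexicon counts."""
    words = review.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```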
Final Notes and Output:
- Amenities + Punctuality (OR condition). Note that "Punctuality" here does not mean the bus arrived on time; rather, the review describes punctuality with a positive or negative sentiment.
- Rest Stops
Overall, we learned a lot while building this feature. As we move ahead in our journey to serve our customers, we firmly believe ML will play a very important role, and this feature is a good testament to that.
You can take a look by downloading our Android App.
Stay tuned for more info on our blog.