Smart Replies in Internshala Chat

Nikhil Anand
Published in Internshala Tech
7 min read · Sep 17, 2020

Introduction

Internshala aims to help employers find the right talent, and communication is an integral part of any hiring process. Candidates with good communication skills and prompt responses always have an edge over other applicants. At Internshala, we are improving this communication and promptness by introducing Smart Reply suggestions in our chat module — a feature that suggests possible replies students can send to employer messages. In this blog post, we will discuss the necessity, objective, impact, and novelty of our Natural Language Processing (NLP) recommendation engine.

Necessity

Consider a situation where an employer posts an internship and receives applications for it. The employer messages all the candidates asking about their interest, their availability for an interview, and the earliest date they can join. However, many students do not reply even after receiving and viewing the message. The actual response rate usually stays at ~42%, which can be due to negligence and/or students being unsure about how to frame a formal reply. These situations not only create unfavorable circumstances for applicants but also lengthen the hiring process.

Objective

We observed that these scenarios can significantly impact the hiring process and came up with a solution: showing students up to three possible replies to each employer message. The objective was to improve the response rate of chat messaging — in simple terms, nudging students to reply to employer messages. Three teams at Internshala — Data Science, Product Management, and Software Development — worked together on a machine learning model that classifies employer messages and suggests possible replies by understanding the message intent. These changes drive a higher response rate from students, positively impacting the hiring cycle.

Identifying the types of responses (Intent Discovery)

In the beginning, we encountered three major questions — how many different responses should we suggest, what should those responses be, and what is the best way to determine them?

The best way to answer these questions is to cluster chat messages and find the different intents in them. However, techniques like K-Means and Hierarchical Clustering may not be applicable because a single message on Internshala chat can carry multiple intents, whether it is sent by an employer or a student. So, we decided to use topic modelling to identify those hidden intents.

Let us understand this with an example, where an employer messaged an applicant regarding a content writer job they had posted.

“Thank you for applying to the Ukti Content Writer opportunity. Ukti offers targeted content development solutions that align with a brand’s marketing objectives. The company is built on the idea of empowering businesses to deliver a sharp, impactful message to their audience through the power of the written word. At Ukti, you get meaningful content crafted with a specific objective, for a clearly defined audience. Our client base is typically organizations looking for content services & marketing companies, small businesses, and sometimes individuals. The Writer will play a key role in daily operations, eventually managing a growing team of content creators in the capacity of a Manager/Editor. If you would be interested in working in a small yet driven team that strives to create excellent content for different brands, please send your email Id and Phone Number.”

There are a few limitations to clustering employer messages:

  1. As seen in the example above, employer messages are open-ended. In a single message, an employer not only asks students for their email and phone number but also tells them about the company, its work culture, and the job role. This additional information does not require any reply from students. The number of such hidden intents grows with the number of messages, which eventually made it hard to decide the number of topics (K) for topic modelling.
  2. These employer messages have long, structured sentences, and that structure is lost with LDA since it is based on a bag-of-words representation. These two major limitations inspired us to cluster students’ replies instead of employer messages. Student replies are concise in comparison to employer messages and mostly comprise only a few hidden intents.
Figure 1. Visualization of identified topics using PyLDAvis

Labeling messages

Using LDA, we clustered students’ messages and identified numerous topics. Since clustering is an unsupervised learning process, we cannot directly control the quality of the clusters. The clusters therefore required human intervention to extract the topics that are relevant to our product and can be recommended on the messaging platform. Identifying and assigning tags corresponding to these topics was a very rigorous task; independent observers carried out a few iterations to identify plausible tags for the recommendations. Once the tags were figured out from the replies, we simply mapped them to the corresponding employer messages.
One segment of the system was now complete. The other segment was to build a supervised machine learning model that could be trained on employer messages along with the tags obtained through topic modelling. Topic modelling saved a lot of the human effort that would otherwise have been required to tag data at such a large scale.
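The mapping step can be sketched as follows; the tag names and conversation pairs here are hypothetical placeholders, not our actual taxonomy:

```python
# Sketch: once human reviewers name each LDA topic, the employer messages
# paired with those student replies inherit the tags as training labels.
topic_to_tag = {0: "interested", 1: "share_contact", 2: "availability"}

# (employer_message, dominant_topic_of_the_student_reply) pairs — hypothetical
conversations = [
    ("Are you interested in this role?", 0),
    ("Please send your email id and phone number.", 1),
    ("When are you available for an interview?", 2),
]

# Labelled training data for the supervised classifier
labelled = [(msg, topic_to_tag[t]) for msg, t in conversations]
print(labelled[1])
```

Each (message, tag) pair then serves as one training example for the classifier described next.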

Classification using deep learning (CNN + LSTM)

Deep learning tends to outperform classical machine learning algorithms when the amount of training data is large, so we decided to use deep learning to classify the messages into the labels obtained through topic modelling. For natural language classification tasks, LSTMs (Long Short-Term Memory networks) and CNNs (Convolutional Neural Networks) are the most popular choices. Influenced by Sainath et al. (2015), we decided to use a unified architecture that takes advantage of these specialized layers together. In our evaluation, the unified LSTM, CNN, and DNN architecture proved better than an LSTM or CNN alone, improving accuracy by 2% over other combinations of specialized layers.

Figure 2. Stacked LSTM and CNN architecture
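A minimal Keras sketch of such a stacked CNN + LSTM + DNN classifier is shown below. The vocabulary size, sequence length, layer widths, and tag count are placeholders, not our production values:

```python
# Sketch of a stacked CNN + LSTM text classifier in Keras, loosely following
# the unified-architecture idea from Sainath et al. (2015). All dimensions
# below are illustrative placeholders.
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE, MAX_LEN, NUM_TAGS = 20000, 100, 12

model = keras.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 128),
    layers.Conv1D(64, 5, activation="relu"),   # CNN: local n-gram features
    layers.MaxPooling1D(2),
    layers.LSTM(64),                           # LSTM: longer-range word order
    layers.Dense(64, activation="relu"),       # DNN head
    layers.Dense(NUM_TAGS, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=[keras.metrics.SparseTopKCategoricalAccuracy(k=3)],
)
model.summary()
```

Note that the top-3 metric used for validation (discussed below) can be tracked directly during training via `SparseTopKCategoricalAccuracy(k=3)`.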

The pre-processing task involved identifying all the possible entities that can occur in a conversation between an employer and a candidate — such as dates, times, phone numbers, and locations. These entities were masked with distinct identifiers.
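A simplified sketch of this entity masking is shown below; the regular expressions are illustrative and far less exhaustive than a production entity grammar:

```python
# Sketch: masking entities with placeholder identifiers before classification,
# so the model sees e.g. <PHONE> instead of a specific number.
import re

PATTERNS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "<EMAIL>"),
    (re.compile(r"\b\d{10}\b"), "<PHONE>"),
    (re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"), "<DATE>"),
    (re.compile(r"\b\d{1,2}(:\d{2})?\s?(am|pm)\b", re.I), "<TIME>"),
]

def mask_entities(text: str) -> str:
    """Replace known entity spans with their placeholder identifiers."""
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    return text

print(mask_entities("Call me at 9876543210 or mail hr@ukti.in by 15/09/2020, 5 pm."))
# → Call me at <PHONE> or mail <EMAIL> by <DATE>, <TIME>.
```

Masking keeps the classifier focused on intent rather than on specific numbers or addresses, and also shrinks the vocabulary.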

Since chat messages from employers can have multiple intents, there can be multiple valid replies to a single message. So, for validating the model, instead of simply using accuracy we used top-3 accuracy. Let us understand this through a sample from the training data.

“Dear Candidate, Congratulations! In response to your application, we will be taking your application further to the final telephonic Interview. Please find attached herewith document for your reference (product details), kindly go through the same thoroughly. Our team will contact you within 2–3 days. Feel free to reach out for any queries.”

The employer notified a student about her selection for the next round and asked her to go through the document attached to the message. The top 3 model recommendations for the above message, in decreasing order of probability, were —

  1. Thanks for shortlisting me
  2. Sure
  3. Okay

All three suggestions are relevant to the message, but since the data was mapped to just one response (“Okay” in this case), the prediction would have been counted as wrong had we simply used the accuracy score.
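The metric itself is simple to compute: a prediction counts as correct if the true tag appears anywhere in the three highest-probability suggestions. A small sketch with made-up probabilities:

```python
# Sketch: top-k accuracy over per-message tag probabilities (illustrative data).
import numpy as np

def top_k_accuracy(probs: np.ndarray, labels: np.ndarray, k: int = 3) -> float:
    # indices of the k highest-probability tags for each message
    top_k = np.argsort(probs, axis=1)[:, -k:]
    hits = [label in row for row, label in zip(top_k, labels)]
    return float(np.mean(hits))

# 2 messages, 4 tags; true tags are 2 and 0
probs = np.array([[0.1, 0.2, 0.3, 0.4],
                  [0.5, 0.3, 0.1, 0.1]])
labels = np.array([2, 0])
print(top_k_accuracy(probs, labels))        # → 1.0 (both true tags in the top 3)
print(top_k_accuracy(probs, labels, k=1))   # → 0.5 (only the second is top-1)
```

With k=1 this reduces to plain accuracy, which is exactly the stricter metric that would have penalized the “Okay” example above.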

Productionization

For deploying the model, we used Flask, a web application framework written in Python. Its ease of use and built-in development server make it a preferred choice. The real-time processing of a message and generation of replies is shown below.
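A minimal sketch of such a Flask service is shown below; the endpoint name and payload shape are assumptions, and the model call is stubbed out:

```python
# Sketch: serving smart replies behind a Flask endpoint. The classifier is
# replaced by a stub; endpoint name and payload shape are hypothetical.
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict_top3(message: str) -> list[str]:
    # Placeholder for the real pipeline: mask entities, vectorize the message,
    # and return the three highest-probability reply suggestions.
    return ["Thanks for shortlisting me", "Sure", "Okay"]

@app.route("/smart-replies", methods=["POST"])
def smart_replies():
    message = request.get_json().get("message", "")
    return jsonify({"suggestions": predict_top3(message)})

# app.run(debug=True)  # starts Flask's built-in development server
```

In production, a WSGI server (rather than the built-in development server) would typically sit in front of such an app.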

Figure 3. Steps for identifying Smart Replies for incoming messages at runtime

Impact

The objective of Smart Replies was to increase the response rate and strengthen the engagement between employers and applicants.

On manual evaluation, we observed that the final model was able to suggest replies for ~85% of chat messages, with a top-3 accuracy of about 95%.

After deploying the final model, we achieved a jump of approximately 35% in the response rate from applicants — a significant improvement that will go a long way in improving the hiring experience on Internshala.

Figure 4. The improvement in response rate over the last 4 months

Conclusion

We have developed an NLP recommendation model that is a fusion of unsupervised and supervised machine learning. While building this feature, we realized the effectiveness of text clustering — the unsupervised clustering of text minimized the human element in the machine learning loop to a great extent, which saved both time and money. Experimenting with stacked specialized layers (CNN and LSTM) further improved accuracy; both quantitative and qualitative evaluation of the model showed improved results from stacking layers.
There is further scope for improvement. In the future, we will try to iterate on the model by using pre-trained embeddings, bi-directional layers, or attention layers.

Acknowledgement

This work is a collaboration with Purnima Kaushik, Sumit Chahal, Shubham Singh, Vikram Shah, and Kishalaya Kumar. I would also like to thank Aseem Garg, Amar Kumar, Venkatesh Gupta, and Prashant Kumar for their valuable help and insights on this project.
