Understanding Our User Pain Points with Text Data Mining

We used natural language processing (NLP) and analytics to learn the most common reasons that signing agents fail to meet expectations during signings.

At Snapdocs, we’re trying to simplify and bring transparency to mortgage closings. Check out our previous post on how our online platform provides a centralized gathering point for all parties involved in the closing.

The Frustrations of Mortgage Closings

The final signing on a property is one of the last steps in purchasing real estate and should be an exciting event for a home buyer. However, numerous complications can throw the final signing off track. From deal-breaking issues like failing to sign critical disclosure forms to minor irritations such as a signing agent (a notary) showing up late, many things can lead to bad experiences and frustration for Escrow/Title Officers and their customers, the home buyers.

At Snapdocs, we strive to make mortgage closings run as smoothly as possible. One way that we do this is by surfacing the best qualified notaries for a signing. We are constantly collecting feedback on notary performance. Just like Uber wants to know if your driver had a clean car or drove aggressively, we want to know if your agent dressed professionally or was late to the signing.

As data scientists at Snapdocs, we wanted to leverage this feedback data to gain insight into how well our notaries meet the expectations of Escrow/Title Officers. In this blog post, I’ll first show how we analyzed our data to determine how often mortgage professionals have unsatisfactory experiences with notaries during signings. I’ll then walk through an approach using natural language processing (NLP) and analytics to learn the most common reasons notaries fail to meet expectations. Finally, I’ll conclude with how we’re incorporating this information back into our product development lifecycle to improve the overall user experience.

Gathering Data on Notary Performance

On the Snapdocs platform, we enable companies to easily search a database of 65,000+ notaries, see statistics on a particular notary’s past performance, and choose the top-ranked available notary for a signing. After a signing has been completed, an Escrow/Title Officer can rate the notary’s performance as positive, neutral, or negative and include an optional detailed comment.

I looked at over 220K orders from March 2017 to mid-August 2017 and found that an overwhelming 99% of signings received positive feedback ratings, 0.2% received neutral ratings, and only 0.8% received negative ratings. Our notaries are doing exceptionally well at meeting expectations.*

*We also found that about 40% of the signing events that received a negative rating were carried out by notaries that received a negative rating on at least one other signing. Perhaps these notaries are repeating their mistakes. In any case, our hope is that we may be able to prevent bad experiences like these in the future.
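Both of these figures fall out of a simple tally. Here’s a minimal pandas sketch of the two calculations, assuming a hypothetical signings table with order_id, notary_id, and rating columns (the numbers above come from our production data, not this toy schema):

```python
import pandas as pd

# Hypothetical schema: one row per completed signing, with the
# Escrow/Title Officer's rating and the assigned notary's id.
orders = pd.read_csv("signings_mar_to_aug_2017.csv")  # order_id, notary_id, rating

# Share of positive / neutral / negative ratings across all signings.
rating_share = orders["rating"].value_counts(normalize=True) * 100
print(rating_share.round(1))

# Of the signings rated negative, what fraction were handled by a
# notary with at least one *other* negatively rated signing?
negative = orders[orders["rating"] == "negative"]
neg_counts = negative["notary_id"].value_counts()
repeat_offenders = neg_counts[neg_counts > 1].index
repeat_share = negative["notary_id"].isin(repeat_offenders).mean() * 100
print(f"{repeat_share:.0f}% of negative signings involve a repeat notary")
```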

Topic Modeling of Negative Feedback Text

While negative feedback on the Snapdocs platform is rare, I decided to delve more deeply into the reasons why our customers were having these unsatisfactory experiences. Looking specifically at the text from negative feedbacks, I sought to identify common themes using NLP techniques. I used one of the simplest approaches to topic modeling, k-means clustering, to group feedbacks into clusters based on the similarity of the words used.

To do this, I took a standard approach to pre-processing text data. I first used the NLTK package to remove non-alphanumeric characters and tokenize the feedback text, breaking it down into its constituent words. I then filtered out stop words, which are commonly used words with presumably little meaning (e.g. ‘the’, ‘is’, ‘are’). The remaining words were stemmed using the Porter Stemmer to reduce them to their basic form (e.g. “update”, “updates”, “updated”, and “updating” all reduce to “updat”), making it easier to identify overlapping words. The collection of processed words was then converted to a TF-IDF vector of weighted term frequencies so that each feedback text could be represented as a mutually comparable vector for the k-means algorithm.
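Here’s a minimal sketch of that pipeline using NLTK and scikit-learn; the feedback_texts list standing in for our negative feedback comments is hypothetical:

```python
import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer

nltk.download("punkt")      # tokenizer models
nltk.download("stopwords")  # stop word list

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def preprocess(text):
    """Lowercase, strip non-alphanumeric characters, tokenize, drop stop words, stem."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    tokens = nltk.word_tokenize(text)
    return [stemmer.stem(t) for t in tokens if t not in stop_words]

# Hypothetical stand-in for the real negative feedback comments.
feedback_texts = ["Notary arrived 45 minutes late to the signing", "..."]

# Each feedback becomes a TF-IDF vector of weighted (stemmed) term
# frequencies, so the texts are mutually comparable for k-means.
vectorizer = TfidfVectorizer(tokenizer=preprocess, lowercase=False)
tfidf_matrix = vectorizer.fit_transform(feedback_texts)
```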

For k-means analysis, the number of clusters (topics), k, must be specified. Choosing the best value of k is often more art than science, with heuristics frequently used. I wanted to begin with a large and diverse set of topics that I could later prune, so I chose an initial value of 35 clusters. After analyzing the results and merging smaller, similar clusters, I settled on 30 final clusters. Each cluster was then manually labeled based on the most common words in the cluster and example feedback content from the cluster. For example, from the word cloud (showing the most common words) and the example feedbacks for the first cluster shown below, we concluded that this cluster of feedbacks is about the notary being late to the signing.
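Continuing the sketch above, clustering the TF-IDF matrix and peeking at each centroid’s heaviest terms speeds up the manual labeling step considerably:

```python
from sklearn.cluster import KMeans

k = 35  # start deliberately large; similar clusters were later merged down to 30
km = KMeans(n_clusters=k, random_state=42)
cluster_ids = km.fit_predict(tfidf_matrix)

# The highest-weight stemmed terms in each centroid suggest a label,
# e.g. a cluster dominated by "late", "wait", "arriv" reads as "Late".
terms = vectorizer.get_feature_names_out()
for i, centroid in enumerate(km.cluster_centers_):
    top_terms = [terms[j] for j in centroid.argsort()[::-1][:8]]
    print(f"cluster {i}: {', '.join(top_terms)}")
```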

Not All Negative Feedbacks Are Equal

On our platform, if a negative rating is given to a notary after a signing, the Escrow/Title Officer also has the option to deactivate that notary so that they are no longer considered in future notary searches. To understand which clusters of negative feedback were merely inconvenient versus deal-destroying, we looked at which feedbacks led to notary deactivations.
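Computing the deactivation rate per topic is then a one-line groupby; the small frame below, with a manually assigned topic label and a deactivation flag per negative feedback, is a hypothetical stand-in for our data:

```python
import pandas as pd

# Hypothetical frame: one row per negative feedback, with its manually
# assigned topic label and whether the officer deactivated the notary.
feedback = pd.DataFrame({
    "topic": ["Error", "Late", "No Show", "Error", "Rude"],
    "deactivated": [False, False, True, False, True],
})

deactivation_rate = feedback.groupby("topic")["deactivated"].mean() * 100
print(deactivation_rate.sort_values(ascending=False).round(0))
```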

As you can see from the first collection of topics (in pink) in the chart above, making a mistake during the signing (an “Error”) is usually not enough to deactivate a notary. While minor errors make up the biggest group of negative feedbacks, only 18% result in deactivation of the notary. Talking to some of our customers in user interviews, we learned that general errors are usually correctable: you can eSign an addendum or even have the notary go back with the single page needing a correction.

We found a second collection of topics (in orange) relating to the quality and promptness of documents, faxes, and scans sent back to the Escrow/Title Officer after the signing event. These were generally rarer, but had relatively higher deactivation rates (up to a 39% deactivation rate when documents were missing, “Doc/Fax/Scan — Missing”). Loan closings run on a very tight timeline, so even a one-day delay by a slow notary can put a deal at risk.

Most interestingly, the last collection of topics (in green) has deactivation rates as high as 71% and appears to include feedback from the worst signing experiences. Topics in this group include notaries who presented themselves poorly (“Rude”, “Unprofessional”) or put the signing in jeopardy (“No Show”, “Changed Time”, “Late Cancellation”).

Our main takeaway from this analysis is that mistakes happen and are forgivable, but people seem to have very low tolerance for jerky behavior.

Impacting Product Development

This sort of analysis helps Snapdocs in two ways. First, we can share these learnings and specific feedback with notaries directly. We’re evaluating the best ways to incorporate these results: notary training, early warnings to notaries via text/email, and category-specific scoring on our platform. We want notaries to understand what their customers care about and how they can best be successful. We also want notaries to know when they are underperforming, so that we can guide them to take the necessary steps to continue doing business on the Snapdocs platform.

Second, we now have real data representing problems to solve and ideas for features we could build. We’ve started building a feature we’re calling the “Signing Checklist”, which should improve the likelihood that notaries walk away from the signing event confident they did a good job. We’ll also be building better scheduling and communication tools so everyone (notary, escrow agent, buyer) is on the same page and there are no surprises.

While the engineers embark on the development process to incorporate our learnings from this analysis of negative feedbacks, we’re going to tackle the next data challenge: positive feedbacks! What sort of things do notaries get praised for? Which categories are highly associated with becoming a “favorite” of a company? What does “above and beyond” mean with respect to notary performance at the signing table?

Come Join Our Team. We’re Hiring!

Find our work interesting? Think you might have an alternative approach? Come join our team!

Written by Jon Tang, Data Scientist at Snapdocs