Building Machine Learning Models with MonkeyLearn

Phong Vu
RingCentral Developers
Sep 17, 2019

Communication is an integral part of any business, not only internally but also externally, in how it communicates with customers and partners. It’s therefore essential to have the right communication system in place. The right system enables effective communication between employees, clients, and stakeholders, which improves customer service and, as a result, customer engagement.

However, with time and business growth come new challenges. Customer queries start piling up, and even a successful communication system is sometimes not enough to manage the flood of new enquiries. Not only do support teams need to handle this growth while delivering a quality service, but customers are also becoming more demanding and want answers right away. To illustrate, 80% of business buyers said they expect companies to respond to and interact with them in real time.

But don’t panic: artificial intelligence, and more specifically machine learning, is here to help. By using this technology, you’ll be able to automate certain processes so that your customer service team can do more.

For example, you can use machine learning to build an effective virtual voicemail assistant for your telephone customer service. You can train a model to detect spammy voicemails, auto-reply to a caller with an SMS message, and even detect the urgency of a voicemail, saving your customer service team countless hours and making them more efficient.

After reading this article, you’ll know what machine learning can do for your business and how to build a machine learning model using MonkeyLearn. If you want to jump to something specific, these are the sections:

  • Getting started with MonkeyLearn
  • How to build an accurate model
  • How accurate can a model be?
  • What algorithms are used for training models?
  • Use cases and applications
  • How MonkeyLearn handles data security

Let’s get started!

Getting Started with MonkeyLearn

MonkeyLearn is a platform that makes text analysis with machine learning easy and accessible for everyone, not only for data scientists. It’s built to analyze huge amounts of data automatically and efficiently, saving businesses the time and resources it would take to do it manually.

With MonkeyLearn, you can use two types of models to analyze your data automatically: classifiers and extractors. Text classifiers sort data into defined tags or categories (by sentiment, topic, urgency, etc.). Text extractors identify and retrieve pieces of information present in text (for example keywords, entities, prices, dates, etc.). By combining classifiers and extractors, companies can automate processes, get insights from data, and save time processing data.

MonkeyLearn provides great tools to Build Custom Models

To illustrate how businesses are making use of MonkeyLearn, here are some of the most popular use cases:

  • Customer service: automatically tag your support tickets based on topic, issue, sentiment, or intent. By doing this, you can automatically route the ticket to the right person, prioritize what to answer first, and improve reporting.
  • Customer feedback: automatically tag feedback based on topic, aspect, intent or sentiment. This will allow you to analyze huge amounts of feedback, get key insights from data, and improve decision making.

Now you know what Machine Learning can do, but how does it work?

Machine learning algorithms learn by experience, so in order to perform a certain task, they need to be trained on data that shows them how to do it. For example, if you want a machine learning model to detect spammy voicemails, you’ll need to give examples of both ‘regular’ and ‘spammy’ voicemails to the machine. Once it has seen enough examples, the model will be able to effectively discern spam content from regular content and begin making predictions on new voicemails.

How to Build an Accurate Machine Learning Model

If you want to create a custom model in MonkeyLearn, you’ll have to train it to be able to perform its predictions. Here are some of the best practices to follow in order to train an accurate custom model:

1. The more data you use for training a model, the smarter the model will be

The amount of data you’ll need to create an accurate model depends on each particular case. But as a general rule, the more training samples, the better. Machine learning algorithms learn from the data you feed them, so naturally, the more information you give to the model, the smarter it will be.

For instance, to obtain accurate results in topic detection you’ll need about 250 examples per category or tag, whereas in sentiment analysis you’ll need around 500 examples per tag (e.g. positive, neutral, and negative).

If you want to create a model that detects spammy voicemails or a model that detects the urgency of a voicemail, you’ll need about 100 to 300 examples per tag to start seeing good results.
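
Before training, it can help to check how many examples each tag actually has. The following is a minimal sketch, not part of MonkeyLearn: the function name, the sample transcripts, and the threshold of 100 (taken from the lower end of the range above) are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical training set: (voicemail transcript, tag) pairs.
training_data = [
    ("Congratulations, you have won a free cruise", "spam"),
    ("Hi, this is Anna, please call me back about my invoice", "regular"),
    ("Your car warranty is about to expire", "spam"),
]


def tags_needing_more_examples(samples, expected_tags, minimum=100):
    """Return {tag: number of examples still missing} for under-represented tags."""
    counts = Counter(tag for _, tag in samples)
    return {
        tag: minimum - counts[tag]
        for tag in expected_tags
        if counts[tag] < minimum
    }


shortfall = tags_needing_more_examples(training_data, ["spam", "regular"], minimum=100)
print(shortfall)
```

An empty result means every tag has reached the chosen minimum; the right minimum still depends on your particular case.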

2. Quality of the data is more important than volume

Even though the quantity of the data is relevant, keep in mind that the quality is even more important. It’s preferable to feed the algorithm fewer but high-quality training samples than to feed it thousands of examples that contain no valuable information for the model.

For example, if you’re creating a model to detect the urgency of voicemails, you should feed the machine the different ways customers express urgency. If you only train the model with examples that mention things like ‘ASAP’ or ‘Please Help Me Now’, the model may miss other situations that you’d also consider urgent (for example, a customer gently asking for a refund or reporting a bug).

3. Define tags that can be used consistently

Tags should always have a unique and specific definition. Define each tag with clear guidelines and make sure there are no overlapping concepts between any two of them. Tagging your training data inconsistently confuses the model and significantly affects the accuracy of its predictions.
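
One simple consistency check is to look for identical texts that ended up with different tags. The sketch below is only an illustration under that assumption; the function name and sample data are hypothetical, and it only catches exact duplicates after lowercasing and whitespace normalization, not paraphrases.

```python
from collections import defaultdict


def find_conflicting_tags(samples):
    """Return {normalized text: sorted tags} for texts that carry more than one tag."""
    tags_by_text = defaultdict(set)
    for text, tag in samples:
        # Normalize case and whitespace so trivial variations are compared as equal.
        key = " ".join(text.lower().split())
        tags_by_text[key].add(tag)
    return {text: sorted(tags) for text, tags in tags_by_text.items() if len(tags) > 1}


samples = [
    ("Please call me back today", "urgent"),
    ("please call me back  today", "not urgent"),
    ("I would like a refund", "urgent"),
]
conflicts = find_conflicting_tags(samples)
print(conflicts)
```

Each conflict is a prompt to revisit the tag definitions and decide which tag the example should really have.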

4. Keep tags to a minimum. Remove tags that are too small or too niche

Aim to stick to a maximum of 10–15 tags per model. Having more tags than that will not only make the tagging more inconsistent over time, but will also make it more time-consuming to tag data for training the model.

As an example, if you’re categorizing voicemails based on the topic of their content, don’t create niche tags like “mobile performance”, “app speed”, or “desktop loading times” that only apply to a small number of voicemails. Instead, create a single broader tag like “performance” that can group these kinds of voicemails. With niche tags, the machine won’t be able to learn correctly, as there won’t be enough examples or information to learn from.

5. Use a single classification criterion per model

Create one classifier per challenge you want to solve. If you are tagging voicemails based on different criteria, separate those criteria into different models.

For example, if you want to tag voicemails based on their sentiment (positive, negative, or neutral) and on their urgency (urgent, not urgent), build a separate model for each task. Combining both tasks into a single model will confuse the model and affect its prediction capabilities.

6. It’s an iterative process

Creating a reliable and accurate machine learning model is an iterative process. You start with a small model that only ‘understands’ a particular type of voicemail. Then, you add more examples to improve certain predictions. Afterward, you detect certain edge cases where the model makes mistakes, and you work with the existing training data to improve these predictions. Next, you adjust the parameters and start fine-tuning the model for specific situations. And so on.

Keep in mind that a machine learning model can always be improved. You should continuously keep feeding the model with more and better examples to get the best results. If you just stick to the data you initially fed the machine, the learning process will end there and the model will not become more accurate or even learn from a new type of voicemail that you might receive over time.

How Accurate Can a Model Be?

If you follow these best practices, you’re probably wondering exactly how accurate the model can become. Well… there is no simple answer to this question. It depends on each particular case. However, provided the dataset is clean, the criteria are well defined, and the tagging of the data is consistent, you might get to F1 scores over 90%. The F1 score is the harmonic mean of the model’s precision and recall, so naturally the higher the F1 score, the better.
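
For readers who want to see what goes into an F1 score, it combines precision (how many predicted positives were correct) and recall (how many actual positives were found). The following is a minimal sketch for a single tag; the function name and the sample tags are assumptions for illustration, and this is not necessarily how MonkeyLearn computes the statistics it displays.

```python
def precision_recall_f1(true_tags, predicted_tags, positive_tag):
    """Compute precision, recall, and F1 for one tag from paired true/predicted tags."""
    pairs = list(zip(true_tags, predicted_tags))
    tp = sum(1 for t, p in pairs if t == positive_tag and p == positive_tag)
    fp = sum(1 for t, p in pairs if t != positive_tag and p == positive_tag)
    fn = sum(1 for t, p in pairs if t == positive_tag and p != positive_tag)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


true_tags = ["spam", "spam", "spam", "regular", "regular", "regular"]
predicted = ["spam", "spam", "regular", "spam", "regular", "regular"]

precision, recall, f1 = precision_recall_f1(true_tags, predicted, positive_tag="spam")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy example the model finds two of the three spam voicemails and raises one false alarm, so precision, recall, and F1 are all two thirds.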

Build a Custom Model and Check the Accuracy from Statistics

The accuracy of the model will also depend on which algorithm was used to create the model. In the following section, we’ll go through the different options you’ll have.

What Algorithms Are Used for Training the Models?

When creating a custom model in MonkeyLearn, you’ll be able to choose which algorithm to use to train the model. There is no right or wrong way to go here; each algorithm works better in different situations. Depending on whether you’re creating a classifier or an extractor, your options will be the following.

For custom classifiers, you can choose between training your model with Naive Bayes or Support Vector Machines (SVM) algorithms.

Naive Bayes is a simple, fast, accurate, and reliable solution that works especially well with natural language processing problems. Naive Bayes takes advantage of Bayes’ Theorem and probability theory to predict the tag of a text. It is a family of probabilistic algorithms that, for a given text (input), calculates the probability of each tag (output) and decides the outcome based on the highest probability.
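
To make the idea concrete, here is a small from-scratch sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is an illustration only, not MonkeyLearn’s implementation: the class name, the whitespace tokenizer, and the four training examples are assumptions.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, samples):
        self.tag_counts = Counter()              # number of texts per tag
        self.word_counts = defaultdict(Counter)  # word frequencies per tag
        self.vocabulary = set()
        for text, tag in samples:
            words = text.lower().split()
            self.tag_counts[tag] += 1
            self.word_counts[tag].update(words)
            self.vocabulary.update(words)
        return self

    def log_scores(self, text):
        """Return an unnormalized log-probability for each tag."""
        total_texts = sum(self.tag_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for tag, text_count in self.tag_counts.items():
            total_words = sum(self.word_counts[tag].values())
            score = math.log(text_count / total_texts)  # prior P(tag)
            for word in text.lower().split():
                if word in self.vocabulary:  # words never seen in training are skipped
                    likelihood = (self.word_counts[tag][word] + 1) / (total_words + vocab_size)
                    score += math.log(likelihood)
            scores[tag] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)  # tag with the highest probability


model = TinyNaiveBayes().fit([
    ("you have won a free prize", "spam"),
    ("claim your free cruise now", "spam"),
    ("please call me back about my order", "regular"),
    ("i have a question about my invoice", "regular"),
])
print(model.predict("free prize waiting for you"))
print(model.predict("question about my order"))
```

Working in log space avoids multiplying many small probabilities together, and the smoothing keeps a word that never appeared under a tag from zeroing out that tag’s score.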

Support Vector Machines (SVM) is an algorithm that works particularly well with a limited amount of data, where it can be faster and perform better than other algorithms. We recommend using this algorithm when the data is linearly separable, so you can quickly classify data.

For custom extractors, you can only use the default algorithm, Conditional Random Fields (CRF), a statistical approach that takes the context of and relationships between words into account to make predictions. This algorithm can capture much more complex patterns between words and data than a regular expression can, and it has the ability to generalize from a small amount of information.

Choose an Algorithm to Train a Model

Use Cases & Applications

By now, you have learned how to train a model and picked up some best practices on what to do to get accurate predictions. But how can all of this be useful for your business?

Machine learning can help you get key insights from your data and automate all kinds of processes. For example, you can use machine learning to create a smart voicemail assistant that can do some of your work for you, such as:

  • Automatically tag new voicemails based on topic, issue, sentiment, or intent.
  • Route voicemails to the right team member.
  • Detect the urgency of a voicemail, so you’ll be able to prioritize and answer the most urgent ones first.
  • Analyze your voicemail to discover insights on what people are talking about, which can be used as a resource for decision making.
  • Create AI-based auto-responses and response suggestions for voicemails to save time when giving an answer.
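
As a rough sketch of how the tagging, routing, and prioritizing steps above could fit together once classifiers have tagged each voicemail: everything here is hypothetical, including the `Voicemail` class, the topic-to-team mapping, and the team names, and the tags are hard-coded where a real assistant would use model predictions.

```python
from dataclasses import dataclass


@dataclass
class Voicemail:
    transcript: str
    topic: str    # hypothetical output of a topic classifier
    urgency: str  # hypothetical output of an urgency classifier

# Hypothetical mapping from topic tags to the teams that handle them.
TEAM_BY_TOPIC = {"billing": "finance-team", "performance": "engineering-team"}


def route(voicemail, default_team="support-team"):
    """Pick a team from the topic tag, falling back to a default team."""
    return TEAM_BY_TOPIC.get(voicemail.topic, default_team)


def prioritize(voicemails):
    """Put urgent voicemails first; sorted() is stable, so arrival order is kept otherwise."""
    return sorted(voicemails, key=lambda v: v.urgency != "urgent")


inbox = [
    Voicemail("I have a question about my invoice", "billing", "not urgent"),
    Voicemail("The app is down and we cannot work", "performance", "urgent"),
    Voicemail("Just calling to say thanks", "other", "not urgent"),
]

queue = [(v.transcript, route(v)) for v in prioritize(inbox)]
for transcript, team in queue:
    print(f"{team}: {transcript}")
```

Keeping topic and urgency as separate fields mirrors the earlier advice to use one classifier per criterion.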

How MonkeyLearn Handles Data Security

At MonkeyLearn, we are aware that businesses are not only trusting us to add value to their data but also to keep it secure. We take extreme measures to maintain that security, including:

  • Physical security: Our services are hosted on Microsoft Azure, which maintains state-of-the-art physical security, including 24x7x365 surveillance, environmental protections, and extensive secure access policies.
  • System security: Our servers run recent Linux OS releases with long-term support policies and are regularly updated. Our web servers communicate over HTTPS (TLS 1.2) to protect requests from eavesdropping and man-in-the-middle attacks. We use 2048-bit RSA SSL certificates, signed with SHA256.
    Our engineering team monitors and logs errors using top-notch tools like Datadog and Sentry. We also have strict privacy policies and a testing infrastructure to ensure data privacy within our applications. Moreover, we run and securely store daily data backups.
  • Operational security: The development and maintenance of our platform is run by our highly trained engineering team. User data can only be accessed by a small number of authorized employees. Access to user accounts by MonkeyLearn employees is only allowed in exceptional cases, always with your prior permission and only for the purpose of resolving a specific issue.

Wrap-up

Companies grow and evolve over time. As soon as the number of customer queries starts to scale and expectations start to rise, you’ll need to provide your customer service team with the right tools to stay on top of their workloads.

That’s where machine learning can play a crucial role. By automating processes that are time-consuming, customer service teams can focus on the important things. Instead of wasting time routing voicemails or support tickets to the right team or monitoring incoming customer queries to detect urgent issues, agents can work on what’s more important: solving issues and delighting customers.

If you want to get started with MonkeyLearn, you can request a free demo here. And if you want to have a go with RingCentral’s smart voicemail assistant, you can sign up for a free developer account here.
