Get Started with AI in 15 Minutes Using Text Classification on Airbnb Reviews

Reid Francis
4 min read · Nov 7, 2018


Build two text classifiers in 15 minutes

Watson Natural Language Classifier (NLC) is a text classification (aka text categorization) service that enables developers to quickly train and integrate natural language processing (NLP) capabilities into their applications. Once you have the training data, you can set up a classification model (aka a classifier) in 15 minutes or less to label text with your custom labels. In this tutorial, I will show you how to create two classifiers using publicly available Airbnb reviews data.

One of the more common text classification patterns I’ve seen is analyzing and labeling customer reviews. Understanding unstructured customer feedback enables organizations to make informed decisions that improve customer experience or resolve issues faster. Sentiment analysis is perhaps one of the most common cross-industry text classification use cases, as it helps businesses understand the voice and tone of their customers. However, companies also need to organize their data into categories that are specific to their business. This often requires data scientists to build custom machine learning models. With NLC, you can build a custom model in minutes without any machine learning experience.

Training data

To obtain training data, I went to insideairbnb.com and downloaded the ‘reviews.csv.gz’ file for Austin, Texas. This file contains thousands of real reviews of Airbnb listings in Austin.
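If you want to pull the review text out of the downloaded file yourself before labeling it, Python’s standard library is enough. This is a minimal sketch, assuming the file is a gzip-compressed CSV with a header row and a `comments` column holding the review text (check the header of your own download); the function names are hypothetical.

```python
import csv
import gzip
import io
from typing import List, TextIO


def read_review_texts(stream: TextIO, text_column: str = "comments") -> List[str]:
    """Return the non-empty review texts from a CSV text stream.

    Assumes a header row and that `text_column` names the review-body column
    (an assumption about the Inside Airbnb file layout; verify it).
    """
    texts = []
    for row in csv.DictReader(stream):
        text = (row.get(text_column) or "").strip()
        if text:  # skip blank reviews
            texts.append(text)
    return texts


def load_review_texts(path: str, text_column: str = "comments") -> List[str]:
    """Open a gzip-compressed CSV such as reviews.csv.gz and read its review texts."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
        return read_review_texts(f, text_column)


if __name__ == "__main__":
    # A tiny in-memory sample standing in for the real download.
    sample = io.StringIO(
        "listing_id,id,date,reviewer_id,reviewer_name,comments\n"
        '1,10,2018-01-01,5,Ann,"Great location, very clean."\n'
        "1,11,2018-01-02,6,Bob,\n"
    )
    print(read_review_texts(sample))  # ['Great location, very clean.']
```

Calling `load_review_texts("reviews.csv.gz")` reads the real file; from there you can pick a few hundred reviews to label by hand.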

Next, I defined my labels. I decided to build two classifiers: one for categorizing the reviews and the other for sentiment. I kept the training data for each task separate and built a separate classifier for each to achieve the highest accuracy possible. The labels I defined are below:

  • Category Classifier: Environment, Location, Cleanliness, Hospitality, Noise, Amenities, Communication, Other
  • Sentiment Classifier: Positive, Neutral, Negative
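Writing the two label sets down as data makes it easy to catch a mistyped label in a hand-labeled file before you upload it. A minimal Python sketch; `CATEGORY_LABELS`, `SENTIMENT_LABELS`, and `find_unknown_labels` are illustrative names, not part of NLC.

```python
from typing import Iterable, List, Set, Tuple

# The labels defined in this post, one set per classifier.
CATEGORY_LABELS: Set[str] = {
    "Environment", "Location", "Cleanliness", "Hospitality",
    "Noise", "Amenities", "Communication", "Other",
}
SENTIMENT_LABELS: Set[str] = {"Positive", "Neutral", "Negative"}


def find_unknown_labels(rows: Iterable[Tuple[str, str]],
                        allowed: Set[str]) -> List[Tuple[int, str]]:
    """Return (row_number, label) for every row whose label is not in `allowed`.

    Row numbers are 1-based to match what you see in a spreadsheet.
    Label matching is case-sensitive.
    """
    return [(i, label)
            for i, (_text, label) in enumerate(rows, start=1)
            if label not in allowed]


rows = [
    ("The host answered every message within minutes.", "Communication"),
    ("Walking distance to everything downtown.", "Location"),
    ("Super quiet street at night.", "Quiet"),  # not one of the defined labels
]
print(find_unknown_labels(rows, CATEGORY_LABELS))  # [(3, 'Quiet')]
```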

Each training set contains only 219 rows (examples). That isn’t a lot of examples in the grand scheme of things. However, one of the benefits of Watson Natural Language Classifier is that it can perform well with relatively small training sets. Feel free to keep adding to the training data after you download it to further improve accuracy!
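As you add examples, it helps to keep the file in the plain two-column shape NLC trains on (the text, then its label) and to watch how many examples each label has, since thinly covered labels are the obvious place to add more. A minimal sketch, assuming the training file has no header row (our understanding of the NLC CSV format; check the documentation); `write_training_rows` is a hypothetical helper.

```python
import csv
import io
from collections import Counter
from typing import Iterable, TextIO, Tuple


def write_training_rows(rows: Iterable[Tuple[str, str]], out: TextIO) -> Counter:
    """Write (text, label) pairs as two-column CSV rows and count each label.

    No header row is written (assumed NLC training format). csv.writer
    quotes any text that contains commas, quotes, or line breaks.
    """
    writer = csv.writer(out)
    counts: Counter = Counter()
    for text, label in rows:
        writer.writerow([text.strip(), label])
        counts[label] += 1
    return counts


buf = io.StringIO()
counts = write_training_rows(
    [
        ("Great location, close to 6th Street.", "Location"),
        ("Sheets were stained.", "Cleanliness"),
        ("Easy walk to the river.", "Location"),
    ],
    buf,
)
print(buf.getvalue())
print(counts.most_common())  # [('Location', 2), ('Cleanliness', 1)]
```

To append to a real training file, open it with `open(path, "a", newline="", encoding="utf-8")` and pass the file object as `out`.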

Training the Classifiers

In this tutorial, I will be using Watson Studio. If you would prefer to use the API directly, check out the documentation.
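If you go the API route, training comes down to a single multipart POST. The sketch below builds that request with Python’s standard library without sending it. The `/v1/classifiers` path, the `training_metadata` and `training_data` form fields, and the `apikey` basic-auth username reflect our reading of the NLC API reference and should be checked against it; the service URL and key are placeholders, and `build_create_classifier_request` is a hypothetical helper.

```python
import base64
import json
import urllib.request
import uuid


def build_create_classifier_request(service_url: str, api_key: str, name: str,
                                    training_csv: bytes,
                                    language: str = "en") -> urllib.request.Request:
    """Build (but do not send) a multipart POST that trains a new NLC classifier.

    Endpoint path, form-field names, and "apikey" basic auth are assumptions
    based on the NLC API reference; verify them before relying on this.
    """
    boundary = uuid.uuid4().hex
    metadata = json.dumps({"language": language, "name": name}).encode("utf-8")
    parts = [
        ("training_metadata", "metadata.json", "application/json", metadata),
        ("training_data", "training.csv", "text/csv", training_csv),
    ]
    body = b""
    for field, filename, content_type, payload in parts:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        )
        body += header.encode("utf-8") + payload + b"\r\n"
    body += f"--{boundary}--\r\n".encode("utf-8")  # closing boundary

    token = base64.b64encode(f"apikey:{api_key}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=service_url.rstrip("/") + "/v1/classifiers",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


if __name__ == "__main__":
    # Placeholder URL and key; use the credentials from your own NLC instance.
    req = build_create_classifier_request(
        "https://example.invalid/instance", "MY_API_KEY",
        "airbnb-sentiment", b"Great stay!,Positive\n",
    )
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send it.
```

Building the request separately from sending it keeps the sketch inspectable offline; the JSON that comes back from a real call should include the new classifier’s ID.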

Create an instance of NLC and launch the tooling (Note: if you get lost, please refer to the embedded video at the bottom of this post):

  1. Go to the Natural Language Classifier page in the IBM Cloud Catalog.
  2. Sign up for a free IBM Cloud account or log in.
  3. Click Create.
  4. Once the instance is created, you will be taken to the screen shown below. Click Launch tool to open the tooling in Watson Studio.
Open tooling from IBM Cloud Catalog

Train your classifier

  1. Download the training data (one file per classifier). Two columns, the text and its label, are all you need. That’s how easy it is to train a classifier in NLC!
  2. Click “Create Model” to start building your classifier(s).
Begin creating your classifier
  3. Next, you’ll need to create a project in Watson Studio. If you do not have an instance of Watson Studio, you will need to provision one on the Lite plan.
  4. After you have provisioned your Watson Studio instance, refresh the page and name your Watson Studio project. Then click “Create” in the bottom right-hand corner.
  5. Upload the training data for either the Category or the Sentiment classifier.
  6. Click Train Model. (Training takes approximately 5-10 minutes for each classifier.)
Uploading training data and training a classifier
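Training runs in the background, so if you train through the API rather than the tooling, you need to check back until the classifier is ready. A minimal polling sketch, assuming the classifier reports the status `Training` while it trains and `Available` when it is ready (as we understand the NLC status values); `get_status` is a function you supply, for example one that calls `GET /v1/classifiers/{classifier_id}` and returns the `status` field.

```python
import time
from typing import Callable


def wait_until_available(get_status: Callable[[], str],
                         poll_seconds: float = 30.0,
                         timeout_seconds: float = 20 * 60,
                         sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the classifier leaves the "Training" state.

    `get_status` is supplied by the caller and returns the classifier's current
    status string. Returns the first status other than "Training"; raises
    TimeoutError if training has not finished within `timeout_seconds`.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status != "Training":
            return status  # "Available" means ready; anything else needs a look
        if waited >= timeout_seconds:
            raise TimeoutError(f"classifier still training after {waited:.0f} s")
        sleep(poll_seconds)
        waited += poll_seconds


if __name__ == "__main__":
    # Simulated statuses stand in for real API calls.
    fake = iter(["Training", "Training", "Available"])
    print(wait_until_available(lambda: next(fake), poll_seconds=0))  # Available
```

The 20-minute default timeout is an arbitrary choice that leaves headroom over the 5-10 minutes training usually takes. Passing `sleep` in as a parameter lets you test the loop without actually waiting.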

Testing your classifier

  1. Now that training is done, you can test your classifier!
  2. Open your classifier and go to the Test page. Enter any text and see how Watson classifies it. The classifier works best on actual Airbnb reviews, so test it out with data from insideairbnb.com. If the classifier makes a mistake, click Edit and Retrain in the top right corner and add more examples to your training data. You’ll be classifying Airbnb reviews in no time!
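You can run the same kind of test from code. A minimal sketch that builds a classify request without sending it and picks the top label from a response. The `/v1/classifiers/{classifier_id}/classify` path and the `classes`, `class_name`, and `confidence` response fields reflect our reading of the NLC API reference; the sample response is made up, and the 0.5 confidence cut-off is an arbitrary example value.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, Optional
from urllib.parse import quote


def build_classify_request(service_url: str, api_key: str,
                           classifier_id: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking a trained classifier to label `text`."""
    token = base64.b64encode(f"apikey:{api_key}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{service_url.rstrip('/')}/v1/classifiers/{quote(classifier_id, safe='')}/classify",
        data=json.dumps({"text": text}).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
    )


def top_label(response: Dict[str, Any], min_confidence: float = 0.5) -> Optional[str]:
    """Return the highest-confidence class name, or None if it falls below `min_confidence`.

    Assumes a response shaped like
    {"classes": [{"class_name": "...", "confidence": 0.9}, ...]}.
    """
    classes = response.get("classes") or []
    if not classes:
        return None
    best = max(classes, key=lambda c: c["confidence"])
    return best["class_name"] if best["confidence"] >= min_confidence else None


# Illustrative response, not real output from the service.
sample_response = {
    "top_class": "Amenities",
    "classes": [
        {"class_name": "Amenities", "confidence": 0.91},
        {"class_name": "Location", "confidence": 0.05},
    ],
}
print(top_label(sample_response))  # Amenities
```

Reviews that come back as `None` are good candidates to label by hand and add to your training data before retraining.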

Want to hook your classifiers up to a user interface? Check out the GitHub repo for the Natural Language Classifier demo. It contains the Node.js code for the NLC demo, so you can connect your classifiers to a simple user experience.

Classify Airbnb Reviews with Watson NLC

Helpful Links

Product Page | Documentation | Sample apps and code | API Reference

Want to see what else Watson can do with Airbnb reviews? Check out the new demo for Watson Discovery Service!
