Introducing Classifications in Watson Natural Language Understanding
Natural language processing is fast becoming one of the most used AI tools across industries. There is a large variety of tasks in the field of natural language processing (NLP) that serve many use cases. Classifying text into groups is one of the most popular NLP tasks.
Classification is the task of analyzing input text and assigning predefined labels to this text. Labels can be of any type depending on the application. For example, they can be “spam” and “ham” in the classic spam detection use case, or they can be “positive” and “negative” for a sentiment use case.
IBM’s Natural Language Understanding (NLU) provides a variety of NLP capabilities to analyze text in many different languages.
To support a wide variety of text classification use cases, NLU has extended its capabilities with a new classifications feature. This feature allows users to perform multi-label classification by training a model with their own data and using it along with any other features in the analyze API. In multi-label classification, each input can belong to more than one class. In other words, NLU’s classifications feature uses specialized algorithms that can predict multiple labels that are not mutually exclusive.
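To make the multi-label idea concrete, here is a minimal Python sketch of how an application might turn per-label confidence scores into a set of labels. The function name select_labels, the 0.5 threshold, and the support-ticket scores are all illustrative assumptions; they are not part of NLU, which only returns the confidences.

```python
from typing import Dict, List


def select_labels(confidences: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every label whose confidence meets the threshold, highest first.

    The threshold is an application-level choice (an assumption in this
    sketch), not something prescribed by the service.
    """
    chosen = [(label, score) for label, score in confidences.items() if score >= threshold]
    chosen.sort(key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in chosen]


# Hypothetical scores for one support ticket. Because the labels are not
# mutually exclusive, more than one of them can pass the threshold.
ticket_scores = {"billing": 0.91, "urgent": 0.78, "hardware": 0.12}
selected = select_labels(ticket_scores)
print(selected)
```

In a single-label setting we would keep only the highest-scoring class; here both "billing" and "urgent" are kept because each clears the threshold on its own.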
Text classification can be used for popular use cases, like classifying emails, support tickets, resumes, and reviews, among many other things.
In this article, we will go through the popular example of spam detection to show how easily we can train a custom model with this new feature.
Create a classifications model using NLU
- Provision an NLU instance by visiting the IBM Cloud Catalog, or use your existing NLU instance.
- Copy the credentials generated after the provisioning, and keep them somewhere safe.
- For demonstration purposes, we will use the spam-ham dataset available here. Download the SpamHam-Train.csv file.
- To train a classifications model using NLU, we will use the curl command line tool.
- Open a new terminal window, and go to the folder that contains the dataset downloaded above.
- Using the dataset downloaded above, create a classifications model with the following command:
- The above curl request will return a JSON response, which will contain a field called model_id. We will use this model_id to refer to this model later on.
- Let’s check the status of our model training with the following curl command:
- The above curl command will return a response that will show us the status of our model training (note the “status” field in the following response).
{
"name": "Spam-Ham Classification",
"user_metadata": null,
"language": "en",
"description": "Demo spam detection model",
"model_version": "1.0.1",
"version": "1.0.1",
"workspace_id": null,
"version_description": null,
"status": "training",
"notices": [],
"model_id": "<CLASSIFICATIONS-MODEL-ID>",
"features": [
"classifications"
],
"created": "2021-07-23T06:35:55Z",
"last_trained": "2021-07-23T06:35:55Z",
"last_deployed": null
}
The value of status = training shows that our model is currently being trained. Once training is over, the model will automatically be deployed to NLU, and the status will change to available as soon as it is ready to use.
Tip: Training the classifier can take some time. Meanwhile, you can explore more about classifications (or other NLU capabilities) in the documentation, or check out this cool notebook explaining how to use this feature in Python.
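If we would rather not re-run the status check by hand, the wait can be scripted. The Python sketch below is one possible approach, not an official client: the helper names, the polling interval, the placeholder service URL, and the version date are all assumptions, and the URL path reflects our reading of the NLU API reference, so verify it against the documentation. The HTTP call itself is left to a caller-supplied function so the logic can be tried without credentials.

```python
import time
from typing import Callable, Dict
from urllib.parse import urlencode


def classifications_model_url(service_url: str, model_id: str, version: str) -> str:
    """Build the status-check URL for one classifications model.

    The path is our understanding of the NLU API; confirm it in the API reference.
    """
    base = service_url.rstrip("/")
    query = urlencode({"version": version})
    return f"{base}/v1/models/classifications/{model_id}?{query}"


def wait_until_available(
    fetch_model: Callable[[], Dict],
    max_attempts: int = 60,
    interval_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch_model until the returned JSON reports status "available".

    Any other status (for example "training") is treated as "keep waiting".
    Raises TimeoutError if the model is not available after max_attempts.
    """
    for attempt in range(max_attempts):
        info = fetch_model()
        if info.get("status") == "available":
            return info
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"model not available after {max_attempts} checks")


# Placeholder values, assumed for illustration only.
url = classifications_model_url("https://example.invalid/instances/demo", "demo-model", "2021-08-01")

# Offline demonstration: canned responses stand in for real HTTP calls.
canned = iter([
    {"status": "training"},
    {"status": "training"},
    {"status": "available", "model_id": "demo-model"},
])
result = wait_until_available(lambda: next(canned), max_attempts=5, interval_seconds=0, sleep=lambda s: None)
print(url)
print(result["status"])
```

In real use, fetch_model would perform an authenticated GET against the URL and return the parsed JSON body; the interval should be generous, since training can take a while.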
- Once the model's status shows as available, we can start using it in our application using the Analyze API.
- Let’s try to make an analyze request using the following spam text, and see what our model predicts.
Urgent! Please call 09061213237 from a landline. 5000 cash or a 4* holiday await collection. T &Cs SAE PO Box 177 M227XY. 16+
- The following command shows how we can use this model for prediction using curl:
- The above analyze request returns the following response:
{
"usage": {
"text_units": 1,
"text_characters": 125,
"features": 1
},
"language": "en",
"classifications": [
{
"confidence": 0.974358,
"class_name": "spam"
},
{
"confidence": 0.024414,
"class_name": "ham"
}
]
}
This shows that our custom classifications model was able to mark the given text as spam with high confidence.
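For readers calling the service from application code, the following Python sketch shows how the analyze request body might be assembled and how the response above could be read. It is a sketch under assumptions: the helper names are ours, the payload shape (a classifications feature that names the custom model) is our understanding of the Analyze API and should be checked against the API reference, and no network call is made.

```python
import json
from typing import Dict, Tuple


def build_analyze_payload(text: str, model_id: str, language: str = "en") -> Dict:
    """Assemble a JSON body that asks for the classifications feature.

    The field layout is our understanding of the Analyze API (an assumption);
    model_id is the value returned when the model was created.
    """
    return {
        "text": text,
        "language": language,
        "features": {"classifications": {"model": model_id}},
    }


def top_class(response: Dict) -> Tuple[str, float]:
    """Return the (class_name, confidence) pair with the highest confidence."""
    best = max(response["classifications"], key=lambda item: item["confidence"])
    return best["class_name"], best["confidence"]


payload = build_analyze_payload(
    "Urgent! Please call 09061213237 from a landline.",
    "<CLASSIFICATIONS-MODEL-ID>",  # placeholder, as in the article
)
print(json.dumps(payload, indent=2))

# The response shown in the article, reduced to the fields we read.
sample_response = {
    "language": "en",
    "classifications": [
        {"confidence": 0.974358, "class_name": "spam"},
        {"confidence": 0.024414, "class_name": "ham"},
    ],
}
label, confidence = top_class(sample_response)
print(label, confidence)
```

Taking the single highest-confidence class suits the spam-versus-ham case; for a genuinely multi-label model we would instead keep every class whose confidence is high enough for the application.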
Conclusion
Watson Natural Language Understanding’s classifications feature provides a scalable and powerful text classification solution. It allows users to train a custom text classification model using their own data in just a few steps.
Let us know how you would like to use the classifications feature in your use cases. Check out the documentation for more details.
Sign up for Watson Natural Language Understanding here and try out classifications today!