Watson Natural Language Processing comes to IBM Watson Studio Notebooks

Use Watson NLP to bring unstructured data into the Data Fabric

Alexander Lang
5 min read · Jan 20, 2022
Photo by Etienne Girardet on Unsplash

High-Quality Natural Language Processing in more than 20 Languages

Call center records, customer complaints, social media posts, problem reports… text is everywhere in the enterprise and beyond. Most of the time, text data doesn’t stand alone: it’s part of a data record with many structured columns. And now, it’s easier than ever to extract meaning and structure from text and combine it with structured data. We’re launching Watson Natural Language Processing Environments for Watson Studio Notebooks. This capability is available in all Watson Studio plans in IBM Cloud Pak for Data as a Service, including our free trial.

The Watson Natural Language Processing Environment gives you instant access to pre-trained, high-quality text analysis models for over 20 languages in your notebooks. These models are created and maintained by experts in IBM Research and IBM Software and evaluated for quality for each language. The following models are provided out-of-the-box:

  • Syntax Analysis: Identify tokens, their base forms (aka lemmas) and their part of speech. Run dependency parsing to identify the subject, verb, and objects of a sentence.
  • Noun Phrase extraction: Identify noun phrases like “windscreen wiper” or “front bumper” from your text.
  • Keyword retrieval: Rank noun phrases according to their relevance in a document.
  • Entity Extraction: Extract 20 entity types from the text, including Organization, Person, Job Title, Location, Phone Number, Email Address, Measure, Money, Date and Time.
  • Sentiment: Identify the sentiment of a text as positive, negative, or neutral.
  • Tone: Classify a text as excited, frustrated, impolite, polite, sad, satisfied, or sympathetic.
  • Emotion: Classify the emotion expressed in a text as anger, disgust, fear, joy, or sadness.

It bears repeating: these models don’t just work for English. We provide models that work in Arabic, Chinese, Czech, Danish, Dutch, German, English, Finnish, French, Hebrew, Hindi, Italian, Japanese, Korean, Norwegian, Portuguese, Romanian, Russian, Slovak, Spanish and Turkish. Some models support additional languages as well.

We paid special attention to the data used to train these models: its quality, its provenance, and its intellectual property status. The value for you: the models we built from this data can be used in your production environment without worrying about license or intellectual property infringements, unlike some models you may find for open-source text analysis frameworks.

If our built-in models don’t cover all your analysis requirements, feel free to build your own custom models, using:

  • A high-performance matcher for dictionaries and regular expressions. Thanks to our built-in syntax analysis, you can put lemmas into the dictionary, and we match all inflected forms automatically. So, you add “mouse” — we find “mice”.
  • GloVe, USE and BERT embeddings
  • Custom classifiers with your own training data. You can use SVM, CNN and BERT approaches — or simply create an ensemble classifier that combines various models

You can store these custom models in your Watson Studio project and reuse them in other notebooks.
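To make the “mouse” and “mice” idea concrete, here is a minimal sketch of lemma-based dictionary matching in plain Python. It does not use the Watson NLP API: the lemma table `TOY_LEMMAS` and the helpers `lemmatize` and `match_dictionary` are hypothetical names introduced for illustration, and a real syntax model would derive lemmas from context rather than from a lookup table.

```python
import re

# Hypothetical, hand-written lemma table for illustration only.
# A real syntax model computes lemmas instead of looking them up.
TOY_LEMMAS = {"mice": "mouse", "keyboards": "keyboard"}


def lemmatize(token):
    """Return the base form of a token, falling back to the lowercased token."""
    lowered = token.lower()
    return TOY_LEMMAS.get(lowered, lowered)


def match_dictionary(text, dictionary_lemmas):
    """Return (surface form, lemma, start offset) for every token
    whose lemma appears in the dictionary."""
    matches = []
    for m in re.finditer(r"[A-Za-z]+", text):
        lemma = lemmatize(m.group())
        if lemma in dictionary_lemmas:
            matches.append((m.group(), lemma, m.start()))
    return matches


# The dictionary holds only base forms; the inflected form "mice" still matches.
hits = match_dictionary(
    "Two mice chewed through the keyboard cable.", {"mouse", "keyboard"}
)
print(hits)
```

The point of matching on lemmas is that the dictionary stays short: one entry per concept instead of one entry per inflected form.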

Integrated into the Cloud Pak for Data Experience

Extracting information from text is just the first step. To deliver insights, you want to integrate the results into your data fabric — and automate the insights going forward. Fortunately, Watson Studio has got you covered:

Turn text analysis into text mining

Once the NLP results are in your notebook, use the full array of “structured” analytics approaches to identify patterns across documents. Or correlate the results with other, structured columns in your data. This yields additional insights that go beyond a single piece of text.

For example, run syntax analysis to identify noun phrases that occur in car problem descriptions, and then use association rule analysis to identify the key noun phrases for a particular car model.

Noun phrases for a particular car model
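As a rough illustration of the association rule step, the sketch below scores rules of the form "car model → noun phrase" by support, confidence, and lift. The records are made up for this example, the function name `phrase_rules` is hypothetical, and the noun phrases are assumed to have been extracted already; this is not the analysis behind the chart, only one way such a ranking could be computed.

```python
from collections import Counter

# Made-up records: (car model, noun phrases found in the problem description).
records = [
    ("Model A", {"windscreen wiper", "front bumper"}),
    ("Model A", {"windscreen wiper"}),
    ("Model A", {"windscreen wiper", "brake pedal"}),
    ("Model B", {"front bumper"}),
    ("Model B", {"brake pedal", "front bumper"}),
    ("Model B", {"windscreen wiper"}),
]


def phrase_rules(records, model):
    """Score rules 'model -> phrase' and return them sorted by lift, highest first.

    support    = share of all records with this model AND this phrase
    confidence = share of this model's records that contain the phrase
    lift       = confidence divided by the phrase's overall frequency
    """
    total = len(records)
    in_model = [phrases for m, phrases in records if m == model]
    phrase_counts = Counter(p for _, phrases in records for p in phrases)
    model_counts = Counter(p for phrases in in_model for p in phrases)

    rules = []
    for phrase, joint in model_counts.items():
        support = joint / total
        confidence = joint / len(in_model)
        lift = confidence / (phrase_counts[phrase] / total)
        rules.append((phrase, support, confidence, lift))
    return sorted(rules, key=lambda rule: rule[3], reverse=True)


for phrase, support, confidence, lift in phrase_rules(records, "Model A"):
    print(f"{phrase}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 means the phrase shows up more often for this car model than across all records, which is what makes it a candidate "key" phrase for that model.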

Or correlate the results of syntax analysis and sentiment analysis to identify the aspects that drive positive or negative sentiment for a particular hotel.

Nouns that co-occur with a particular sentiment in reviews for two hotels
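The co-occurrence idea can be sketched in a few lines of plain Python. The rows below are made up, and the function name `nouns_by_sentiment` is hypothetical; the sketch assumes that a syntax model has already supplied the nouns of each sentence and a sentiment model its label.

```python
from collections import Counter, defaultdict

# Made-up rows: (hotel, sentiment label of a sentence, nouns in that sentence).
rows = [
    ("Hotel X", "positive", ["breakfast", "staff"]),
    ("Hotel X", "negative", ["wifi"]),
    ("Hotel X", "positive", ["staff"]),
    ("Hotel Y", "negative", ["breakfast", "noise"]),
    ("Hotel Y", "positive", ["pool"]),
    ("Hotel Y", "negative", ["noise"]),
]


def nouns_by_sentiment(rows):
    """Count how often each noun co-occurs with each sentiment, per hotel."""
    counts = defaultdict(Counter)
    for hotel, sentiment, nouns in rows:
        counts[(hotel, sentiment)].update(nouns)
    return counts


counts = nouns_by_sentiment(rows)
for (hotel, sentiment), nouns in sorted(counts.items()):
    print(hotel, sentiment, nouns.most_common(2))
```

Grouping by hotel and sentiment turns per-sentence results into a small table that can be charted or joined with other structured columns.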

Visualize and share insights

Visualize results right in your notebook (like the charts you see above) and share a link to your notebook with your stakeholders or create dashboards to let them interactively explore the results.

Dashboard with sentiment information on hotel reviews

Productize and Automate

Run your notebook as a job to apply text analysis regularly on new data. Use AI Orchestration to combine the notebook with other tools, such as AutoAI.

Use NLP to extract additional information for a customer churn model

Getting Started

Using Watson NLP models is as easy as 1-2-3-4:

  1. Start a Python notebook with the Default Python 3.8 + Watson NLP XS (beta) environment
  2. Import the watson_nlp library
  3. Load the block that contains the model you need
  4. Apply the block to your text
Using Watson NLP in a notebook
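In code, steps 2 to 4 might look like the sketch below. It only runs inside a Watson NLP environment (step 1), which is why the import sits inside the function. The wrapper `run_syntax` is a hypothetical helper, and the block name `syntax_izumo_en_stock`, the `watson_nlp.load` call, and the `parsers` argument are assumptions recalled from the product documentation rather than taken from this post, so check the block catalog for the names and signatures that apply to your environment.

```python
def run_syntax(text, model_name="syntax_izumo_en_stock"):
    """Hypothetical helper that runs English syntax analysis on a text.

    The block name and call signatures are assumptions; verify them
    against the block catalog before relying on this sketch.
    """
    # Step 2: import the library (available only in a Watson NLP environment).
    import watson_nlp

    # Step 3: load the block that contains the model you need.
    syntax_model = watson_nlp.load(model_name)

    # Step 4: apply the block to your text.
    return syntax_model.run(text, parsers=("token", "lemma", "part_of_speech"))
```

In a notebook you would call `run_syntax("The windscreen wiper is broken.")` and inspect the returned prediction object.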

Our extensive documentation includes a block catalog with all models you can use out of the box, and instructions on how to create your own models, including code samples.

We also added a new Sample Project in the Watson Studio Gallery: Text Analysis with Watson Natural Language Processing. This project contains sample notebooks that show you how to work with each model, along with examples of how to further analyze and visualize the results for insights, including the ones from this blog.

We’re looking forward to your feedback — so, just plug in your own data and get going!

Alexander Lang

Architect in the IBM Watson Studio Team. Experience in Data Science, NLP and Social Media Analytics