Vineyard Recommender: Part 5 — Natural Language Analysis

Sonali Surange-Dev
Oct 11 · 2 min read

Learn how to derive features from natural language text using Data Refinery (video)

In Part 1, Part 2, Part 3 and Part 4 of this series, you have learned how to prepare and analyze structured numeric, time series and text data using IBM’s Data Refinery.

Unstructured text data, such as product reviews, is an important data source for data scientists. Quite often, information from such sources must be combined with other data to enrich the end-to-end analysis.

Data Refinery provides Natural Language transformations that tokenize unstructured text into characters, words, sentences, or paragraphs. Tokenization can also be driven by a regular expression pattern. You can remove common stop words as well as custom stop words that apply to your data. Tokenization is position-aware, so you know where each token came from (which document, which sentence, and so on). Tokenized data can be grouped by document, sentence, or paragraph and then aggregated. The results of text analysis can be combined with structured data to derive enriched features for analysis.
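To make these ideas concrete outside the tool, here is a minimal Python sketch of position-aware word tokenization, stop-word removal, and per-document aggregation. It is not Data Refinery's implementation or API. The function names, the tiny stop-word lists, and the sample reviews are all assumptions made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Small illustrative stop-word lists (assumptions); real lists are much longer.
COMMON_STOP_WORDS = {"a", "an", "and", "of", "the", "with", "is", "this"}
CUSTOM_STOP_WORDS = {"wine"}  # a domain word that carries no signal in wine reviews


def tokenize(doc_id, text):
    """Yield one record per word, keeping its document, sentence, and word position."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    for sent_idx, sentence in enumerate(sentences, start=1):
        words = re.findall(r"[a-z']+", sentence.lower())
        for word_idx, word in enumerate(words, start=1):
            yield {"doc": doc_id, "sentence": sent_idx,
                   "position": word_idx, "token": word}


def remove_stop_words(tokens, stop_words):
    """Drop tokens found in the stop-word set; positions of the rest are unchanged."""
    return [t for t in tokens if t["token"] not in stop_words]


def count_by_document(tokens):
    """Group tokens by document and count how often each token occurs."""
    counts = defaultdict(Counter)
    for t in tokens:
        counts[t["doc"]][t["token"]] += 1
    return counts


# Made-up sample reviews keyed by a document id.
reviews = {
    1: "This wine is crisp and dry. Aromas of citrus and green apple.",
    2: "A bold wine with ripe cherry. The finish is dry.",
}

tokens = [t for doc_id, text in reviews.items() for t in tokenize(doc_id, text)]
kept = remove_stop_words(tokens, COMMON_STOP_WORDS | CUSTOM_STOP_WORDS)
per_doc = count_by_document(kept)
```

Because each record keeps its original sentence number and word position, a token that survives stop-word removal can still be traced back to where it appeared in the review.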

Photo by David Kohler on Unsplash

In this video, you will learn how to:

  1. Tokenize natural language-based reviews into words
  2. Remove common stop words and custom stop words
  3. Filter text data using patterns
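The third step, filtering text with patterns, can be sketched with Python's standard `re` module. This is only an illustration of the idea, not how Data Refinery does it. The patterns, the helper name `filter_by_pattern`, and the sample reviews are hypothetical.

```python
import re

# Made-up sample reviews for illustration.
reviews = [
    "Crisp and dry, with aromas of citrus and green apple.",
    "Bold and tannic, with ripe cherry and a long finish.",
    "Light-bodied and sweet, with notes of honey and peach.",
]

# Hypothetical patterns; \b keeps matches to whole words only.
STONE_FRUIT_PATTERN = re.compile(r"\b(cherry|peach|plum)\b", re.IGNORECASE)
DRY_PATTERN = re.compile(r"\bdry\b", re.IGNORECASE)


def filter_by_pattern(texts, pattern):
    """Keep only the texts in which the compiled pattern matches somewhere."""
    return [t for t in texts if pattern.search(t)]


stone_fruit_reviews = filter_by_pattern(reviews, STONE_FRUIT_PATTERN)
dry_reviews = filter_by_pattern(reviews, DRY_PATTERN)
```

Word boundaries matter here: without `\b`, a pattern such as `dry` would also match inside longer words.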

Use case

Winemakers from small regions have asked the Data Scientist to recommend wineries to visit. The winemakers have provided reviews from sommeliers describing each wine’s taste, smell, look and feel. The data contains the WineEnthusiast score for the wine, the price, and the region of the winery, which is embedded in the title of the wine.
The problem the Data Scientist has to solve is: recommend a regional vineyard using the customer’s preferences for taste, smell, etc.

The Data Scientist has chosen IBM’s Data Refinery tool to perform this task.
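As a rough illustration of the use case, the Python sketch below pulls a region out of a wine title and ranks regions by how many of the customer's preferred descriptors appear in the reviews. It is a sketch under stated assumptions, not the method shown in the video. It assumes the region appears in parentheses at the end of the title, and the rows, field names, and function names are made up.

```python
import re
from collections import defaultdict

# Made-up rows for illustration; the field names are assumptions.
wines = [
    {"title": "Example Estate 2015 Riesling (Region A)",
     "review": "Crisp and dry with citrus aromas."},
    {"title": "Sample Cellars 2014 Merlot (Region B)",
     "review": "Bold with ripe cherry and soft tannins."},
    {"title": "Demo Vineyards 2016 Pinot Gris (Region A)",
     "review": "Dry and light with green apple notes."},
]

# Assumption: the region is the parenthesized text at the end of the title.
REGION_PATTERN = re.compile(r"\(([^()]+)\)\s*$")


def extract_region(title):
    """Return the trailing parenthesized region, or None if the title has none."""
    match = REGION_PATTERN.search(title)
    return match.group(1) if match else None


def recommend_region(rows, preferences):
    """Return the region whose reviews mention the most preferred descriptors."""
    prefs = {p.lower() for p in preferences}
    matches = defaultdict(int)
    for row in rows:
        region = extract_region(row["title"])
        if region is None:
            continue
        words = set(re.findall(r"[a-z']+", row["review"].lower()))
        matches[region] += len(words & prefs)
    if not matches:
        return None
    best = max(matches, key=matches.get)
    # Do not recommend a region when no review matched any preference.
    return best if matches[best] > 0 else None
```

For example, `recommend_region(wines, ["dry", "citrus"])` favors the region whose reviews mention those descriptors most often. A fuller version could also weigh the score and price columns.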

Video: Vineyard Recommender

IBM’s Data Refinery is available with Watson Studio and Watson Knowledge Catalog on public cloud and private cloud, and with Watson Studio Desktop.

Get started for free at:

IBM Watson

AI Platform for the Enterprise

Written by Sonali Surange-Dev, Lead Architect, Senior Technical Staff Member, IBM Watson Data and AI
