Learn how to derive features from natural language text using Data Refinery (video)
Unstructured text data, such as product reviews, is an important data source for data scientists. It is often necessary to combine information from such sources to enrich an end-to-end analysis.
Data Refinery provides natural language transformations that tokenize unstructured text into characters, words, sentences, or paragraphs. Tokenization can also be driven by a regular expression pattern. You can remove commonly known stop words, or custom stop words that apply to your data. Tokenization is position-aware, so you can tell where each token came from in the document, sentence, and so on. Tokenized data can be grouped by document, sentence, or paragraph and then aggregated. Text analysis can be combined with structured data to derive enriched features for analysis.
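Data Refinery performs these operations through its own interface. The Python below is not its API; it is a minimal standard-library sketch of what position-aware tokenization and grouping mean. Each token is emitted together with its document ID, sentence index, and word index, and the tokens are then grouped by (document, sentence) and counted. The sample documents, the naive sentence-splitting regex, and the word pattern are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical sample documents; not data from the video.
DOCS = {
    1: "Aromas include tropical fruit. The palate is crisp and dry.",
    2: "Ripe and fruity. Soft tannins with a long finish.",
}

# Naive sentence splitter (an assumption): split on whitespace after . ! or ?
SENTENCE_RE = re.compile(r"(?<=[.!?])\s+")
# A "word" here is a run of letters, digits, or apostrophes.
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def tokenize(docs):
    """Yield (doc_id, sentence_idx, word_idx, token) so each token keeps its position."""
    for doc_id, text in docs.items():
        for s_idx, sentence in enumerate(SENTENCE_RE.split(text.strip())):
            for w_idx, match in enumerate(WORD_RE.finditer(sentence)):
                yield doc_id, s_idx, w_idx, match.group().lower()


tokens = list(tokenize(DOCS))

# Group by (document, sentence) and aggregate: number of words in each sentence.
words_per_sentence = Counter((d, s) for d, s, _, _ in tokens)
# Group by document alone: number of words in each document.
words_per_doc = Counter(d for d, _, _, _ in tokens)

if __name__ == "__main__":
    for row in tokens[:4]:
        print(row)
    print(dict(words_per_sentence))
    print(dict(words_per_doc))
```

Because every token carries its position, the same token list can be regrouped at whichever level (document or sentence) the analysis needs, without re-tokenizing.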
In this video, you will learn how to:
- Tokenize natural language-based reviews into words
- Remove common stop words and custom stop words
- Filter text data using patterns
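The three steps listed above can be sketched outside Data Refinery in plain Python. This is a hypothetical illustration, not the tool's implementation: the reviews are invented, the stop word sets are tiny stand-ins for a real list, and the filter pattern (keep words ending in "y", such as "earthy" or "smoky") is an arbitrary example.

```python
import re

# Hypothetical sommelier-style reviews; not the dataset used in the video.
REVIEWS = [
    "This wine is bright and zesty, with notes of lemon and green apple.",
    "A wine with earthy aromas and a smoky, peppery finish.",
]

# Small stand-in for a common stop word list; real lists are much longer.
COMMON_STOP_WORDS = {"a", "and", "is", "of", "the", "this", "with"}
# Custom stop words (assumed): terms so frequent in reviews that they carry little signal.
CUSTOM_STOP_WORDS = {"wine", "notes"}


def tokenize_words(text):
    """Step 1: split a review into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens, stop_words):
    """Step 2: drop every token that appears in the stop word set."""
    return [t for t in tokens if t not in stop_words]


def filter_by_pattern(tokens, pattern):
    """Step 3: keep only tokens that fully match a regular expression."""
    regex = re.compile(pattern)
    return [t for t in tokens if regex.fullmatch(t)]


stop_words = COMMON_STOP_WORDS | CUSTOM_STOP_WORDS
cleaned = [remove_stop_words(tokenize_words(r), stop_words) for r in REVIEWS]
# Example pattern (an assumption): keep descriptive words ending in "y".
descriptors = [filter_by_pattern(toks, r"[a-z]+y") for toks in cleaned]

if __name__ == "__main__":
    print(cleaned)
    print(descriptors)
```

Keeping common and custom stop words as separate sets makes it easy to tune the domain-specific list without touching the general one.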
The Data Scientist has been asked by winemakers from small regions to recommend wineries to visit. The winemakers have provided reviews from sommeliers describing each wine's taste, smell, look, and feel. The data also contains the WineEnthusiast score for each wine, its price, and the region of the winery, which is embedded within the wine's title.
The problem the Data Scientist has to solve: recommend a regional vineyard based on a customer's preferences for taste, smell, and so on.
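One way to frame this problem is to combine a text feature (how many of the customer's preferred descriptors appear in each review) with structured fields (region, score, price). The sketch below is a hypothetical illustration in plain Python, not the workflow shown in the video. It assumes the region appears in parentheses at the end of the title; the rows, region names, scoring rule, and the `max_price` filter are all invented for the example.

```python
import re
from collections import defaultdict

# Hypothetical rows: (title, points, price, review). The "(Region)" suffix
# at the end of each title is an assumed format for the embedded region.
ROWS = [
    ("Estate A 2015 Reserve Red (North Valley)", 91, 30.0,
     "Smoky and earthy with dark cherry."),
    ("Estate B 2016 Crisp White (South Coast)", 88, 18.0,
     "Zesty citrus and green apple, bright acidity."),
    ("Estate C 2014 Old Vine (North Valley)", 93, 45.0,
     "Earthy, peppery, long finish."),
]

REGION_RE = re.compile(r"\(([^()]+)\)\s*$")  # text inside the trailing (...) of a title
WORD_RE = re.compile(r"[a-z]+")


def region_of(title):
    """Extract the region from the title, or None if the assumed format is absent."""
    match = REGION_RE.search(title)
    return match.group(1) if match else None


def rank_regions(rows, preferences, max_price=None):
    """Rank regions by preference-word hits in reviews; break ties by mean points."""
    hits = defaultdict(int)
    points = defaultdict(list)
    for title, pts, price, review in rows:
        region = region_of(title)
        if region is None or (max_price is not None and price > max_price):
            continue
        tokens = set(WORD_RE.findall(review.lower()))
        hits[region] += len(tokens & preferences)  # distinct preferred words in this review
        points[region].append(pts)
    return sorted(
        hits,
        key=lambda r: (hits[r], sum(points[r]) / len(points[r])),
        reverse=True,
    )


if __name__ == "__main__":
    print(rank_regions(ROWS, {"earthy", "smoky"}))
    print(rank_regions(ROWS, {"zesty", "bright"}))
```

Counting distinct preferred words per review, rather than raw occurrences, keeps one verbose review from dominating a region's score; the mean WineEnthusiast score is only used to break ties.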
The Data Scientist has chosen IBM’s Data Refinery tool to perform this task.
IBM’s Data Refinery is available in Watson Studio and Watson Knowledge Catalog on public cloud and private cloud, and in Watson Studio Desktop.
Get started for free at: https://www.ibm.com/cloud/data-refinery