Getting the Best-Personalized Deals: Part 3 — Data Preparation (Video)

Sonali Surange-Dev
2 min read · Sep 18, 2019


Part 3: Using Data Refinery, learn how you can prepare data to optimize the experience of shopping for shoes.

Personalized shoe deals

Read Part 1 for an introduction to how data scientists can productively prepare data and build analyses on test data using Data Refinery.

Read Part 2 to learn how you can automate the analysis for real-world data.

Data is rarely available in an analysis-ready format. For example, fields contain delimiters, and headers are often missing. Dates and timestamps come in a variety of formats. Not all data is numeric: text fields contain valuable information that can be used in the analysis. Data scientists need to go through the tedious process of configuration, normalization, and feature extraction. In addition, to make the analysis usable, they have to combine data from multiple streams.

Data Refinery lets you visually configure delimiter-separated data. Using its user interface, you can normalize dates and timestamps in a variety of formats and extract the parts of interest. You can also use text and pattern-based transformations to extract features from text data. To enrich your data, you can apply relational transformations such as left, outer, full, and inner joins. In addition, you can use filtering joins to narrow or expand the scope of the data.
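To make the difference between an enriching join and a filtering join concrete, here is a minimal sketch in Python with pandas. It is only an analogy for what Data Refinery does through its user interface, not Data Refinery's own API, and the tables, column names (`brand`, `discount_pct`, `priority`), and values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; names and values are assumptions for illustration.
deals = pd.DataFrame({
    "brand": ["Acme", "Bolt", "Crest", "Acme"],
    "discount_pct": [20, 35, 10, 40],
})
preferred = pd.DataFrame({
    "brand": ["Acme", "Crest"],
    "priority": [1, 2],
})

# Enriching join: a left join keeps every deal and adds the 'priority'
# column wherever the brand has a match in the second table.
enriched = deals.merge(preferred, on="brand", how="left")

# Filtering joins keep or drop rows of `deals` without adding any columns.
is_preferred = deals["brand"].isin(preferred["brand"])
semi = deals[is_preferred]    # only deals for brands in the preferred list
anti = deals[~is_preferred]   # only deals for brands not in the list
```

In the left join, the unmatched brand keeps its row with a missing `priority`, while the two filtered results keep the original two columns unchanged.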

In Part 3 of this series, you will learn how you can use Data Refinery to:

  1. Configure messy data to specify headers, delimiters, and escape characters
  2. Normalize, transform and extract features from dates and text
  3. Personalize the analysis by combining data from an external dictionary
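For readers who think in code, the three steps above can be sketched with pandas. This is an illustrative stand-in, not how Data Refinery is operated (the video does all of this visually). The pipe delimiter, the day/month/year date format, the column names, and the sample rows are assumptions.

```python
import io
import pandas as pd

# Hypothetical headerless, pipe-delimited input with day/month/year dates.
raw = io.StringIO(
    "Acme|15/01/2019|Trail Runner size 9|20\n"
    "Bolt|02/03/2019|City Loafer size 10|35\n"
    "Acme|20/07/2019|Trail Runner size 8|40\n"
)

# Step 1: configure the messy data (delimiter, missing header, escape character).
df = pd.read_csv(
    raw,
    sep="|",
    header=None,
    names=["brand", "date", "description", "discount_pct"],
    escapechar="\\",
)

# Step 2: normalize dates and extract features from dates and text.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")
df["month"] = df["date"].dt.month
df["size"] = df["description"].str.extract(r"size (\d+)", expand=False).astype(int)

# Step 3: personalize by combining with an external dictionary of preferred brands.
preferred = pd.DataFrame({"brand": ["Acme"]})
personalized = df.merge(preferred, on="brand", how="inner")
```

Keeping the preferred brands in a separate table means the list can change without touching the preparation steps.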

In this video, we will be using Data Refinery to optimize the experience of shopping for shoes.

We will walk through a use case where a data scientist wants to find the best time to shop for her preferred brands of shoes. She has data about shoe discounts offered over time for all brands, and a list of preferred brands, which may change over time.
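One way to frame "best time to shop" is the month with the highest average discount for the preferred brands. The pandas sketch below assumes that definition, which the video may or may not use, and all data in it is hypothetical.

```python
import pandas as pd

# Hypothetical discount history for all brands.
deals = pd.DataFrame({
    "brand": ["Acme", "Acme", "Acme", "Bolt", "Bolt"],
    "date": pd.to_datetime(
        ["2019-01-15", "2019-07-04", "2019-07-20", "2019-03-02", "2019-11-29"]
    ),
    "discount_pct": [20, 40, 30, 35, 50],
})
preferred_brands = ["Acme"]  # assumed list; it may change over time

# Keep only the preferred brands, then average the discount per calendar month.
mine = deals[deals["brand"].isin(preferred_brands)]
avg_by_month = mine.groupby(mine["date"].dt.month)["discount_pct"].mean()

# The month number with the highest average discount.
best_month = int(avg_by_month.idxmax())
```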

IBM’s Data Refinery is available with Watson Studio and Watson Knowledge Catalog on public cloud and private cloud, and with Watson Studio Desktop.

Get started for free at: https://www.ibm.com/cloud/data-refinery
