Getting the Best-Personalized Deals: Part 3 — Data Preparation (Video)

Sonali Surange-Dev
Sep 18, 2019 · 2 min read

Part 3: Using Data Refinery, learn how you can prepare data to optimize the experience of shopping for shoes.

Image: Personalized shoe deals

Read Part 1 for an introduction to how Data Scientists can productively prepare data and build analyses on test data using Data Refinery.

Read Part 2 to learn how you can automate the analysis for real-world data.

Data is rarely available in an analysis-ready format. For example, fields contain delimiters and headers are often missing. Dates and timestamps come in a variety of formats. Not all data is numeric, and text fields contain valuable information that can be used in the analysis. Data Scientists need to go through the tedious process of configuration, normalization, and feature extraction. In addition, to make the analysis usable, they have to combine data from multiple streams.
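To make the problem concrete, here is a minimal Python sketch of what this preparation looks like when done by hand. It is not Data Refinery code; the sample rows, the column names, and the list of date formats are all assumptions made up for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw export: pipe-delimited, no header row, mixed date formats.
RAW = """Nike|2019-01-15|20% off running shoes
Adidas|03/02/2019|15% off sneakers
Puma|Mar 9, 2019|30% off trainers
"""

# Assumed column names; the raw data carries none.
FIELDNAMES = ["brand", "date", "description"]

# Assumed date formats, tried in order (month-first for the slash format).
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%b %d, %Y"]


def parse_date(text):
    """Return a date for the first matching format, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def load_rows(raw, delimiter="|"):
    """Parse headerless delimited text and normalize the date column."""
    reader = csv.DictReader(
        io.StringIO(raw), fieldnames=FIELDNAMES, delimiter=delimiter
    )
    rows = []
    for row in reader:
        row["date"] = parse_date(row["date"])
        rows.append(row)
    return rows


rows = load_rows(RAW)
for row in rows:
    print(row["brand"], row["date"], row["description"])
```

Every new delimiter, header layout, or date format means another code change here, which is the tedium a visual tool is meant to remove.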

Data Refinery lets you visually configure delimiter-separated data. Using its user interface, you can normalize dates and timestamps from a variety of formats and extract the parts of interest. You can also use text and pattern-based transformations to extract features from text data. You can enrich your data with relational transformations such as left, outer, full, and inner joins. In addition, you can use filtering joins to narrow or expand the scope of the data.
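For readers less familiar with the join vocabulary, the following plain-Python sketch shows what an inner join, a left join, and two filtering joins do to a small dataset. It assumes that "filtering joins" refers to the usual semi-join and anti-join pair; the helper functions and sample records are hypothetical and do not reflect how Data Refinery implements these operations.

```python
from collections import defaultdict

# Hypothetical sample data.
discounts = [
    {"brand": "Nike", "month": "Jan", "discount": 20},
    {"brand": "Adidas", "month": "Mar", "discount": 15},
    {"brand": "Puma", "month": "Mar", "discount": 30},
]
preferred = [
    {"brand": "Nike", "priority": 1},
    {"brand": "Puma", "priority": 2},
    {"brand": "Reebok", "priority": 3},
]


def _index(rows, key):
    """Group rows by the value of their key column."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def inner_join(left, right, key):
    """Keep only matching rows, with columns from both sides."""
    index = _index(right, key)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def left_join(left, right, key):
    """Keep every left row; unmatched rows get None for right-side columns."""
    index = _index(right, key)
    right_cols = {c for r in right for c in r if c != key}
    out = []
    for l in left:
        matches = index.get(l[key])
        if matches:
            out.extend({**l, **r} for r in matches)
        else:
            out.append({**l, **{c: None for c in right_cols}})
    return out


def semi_join(left, right, key):
    """Filtering join: keep left rows that have a match; add no columns."""
    keys = {r[key] for r in right}
    return [l for l in left if l[key] in keys]


def anti_join(left, right, key):
    """Filtering join: keep left rows that have no match."""
    keys = {r[key] for r in right}
    return [l for l in left if l[key] not in keys]


print(inner_join(discounts, preferred, "brand"))
print(left_join(discounts, preferred, "brand"))
print(semi_join(discounts, preferred, "brand"))
print(anti_join(discounts, preferred, "brand"))
```

The enriching joins add columns from the second table, while the filtering joins only decide which rows of the first table survive.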

In Part 3 of this series, you will learn how you can use Data Refinery to:

  1. Configure messy data to specify headers, delimiters, escape characters
  2. Normalize, transform and extract features from dates and text
  3. Personalize the analysis by combining data from an external dictionary
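As a rough illustration of the second step, the sketch below pulls a numeric discount out of free text with a regular expression and derives month and weekday features from a date. The record layout, the field names, and the "N% off" wording are assumptions for this example, not something taken from the video or from Data Refinery.

```python
import re
from datetime import date

# Assumed wording of the discount text, e.g. "20% off running shoes".
PERCENT_OFF = re.compile(r"(\d{1,3})\s*%\s*off", re.IGNORECASE)


def extract_features(record):
    """Derive numeric and calendar features from one raw record."""
    match = PERCENT_OFF.search(record["description"])
    return {
        "brand": record["brand"],
        # None when the text does not mention a percentage.
        "discount_pct": int(match.group(1)) if match else None,
        "month": record["date"].month,
        # ISO weekday: Monday is 1, Sunday is 7.
        "weekday": record["date"].isoweekday(),
    }


# Hypothetical record with an already-normalized date.
record = {
    "brand": "Nike",
    "date": date(2019, 1, 15),
    "description": "20% off running shoes",
}
features = extract_features(record)
print(features)
```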

In this video, we will be using Data Refinery to optimize the experience of shopping for shoes.

We will walk through a use case where the Data Scientist wants to find the best time to shop for her preferred brands of shoes. She has data about shoe discounts offered over time for all brands. She also has a list of preferred brands, which may change over time.
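The analysis behind this use case can be sketched in a few lines of Python. This is only one possible reading of "best time to shop": it assumes the goal is the month with the highest average discount per preferred brand, and the data and field names are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical prepared discount data (one row per observed offer).
discounts = [
    {"brand": "Nike", "month": 1, "discount_pct": 20},
    {"brand": "Nike", "month": 1, "discount_pct": 30},
    {"brand": "Nike", "month": 11, "discount_pct": 40},
    {"brand": "Puma", "month": 3, "discount_pct": 30},
    {"brand": "Adidas", "month": 3, "discount_pct": 50},
]

# The external dictionary of preferred brands; kept separate from the
# discount data so it can change without touching the analysis.
preferred_brands = {"Nike", "Puma"}


def best_month_by_brand(discounts, preferred_brands):
    """Return {brand: (month, average discount)} for preferred brands only."""
    by_brand_month = defaultdict(list)
    for row in discounts:
        if row["brand"] in preferred_brands:
            by_brand_month[(row["brand"], row["month"])].append(row["discount_pct"])

    best = {}
    for (brand, month), values in by_brand_month.items():
        avg = mean(values)
        if brand not in best or avg > best[brand][1]:
            best[brand] = (month, avg)
    return best


best = best_month_by_brand(discounts, preferred_brands)
for brand, (month, avg) in sorted(best.items()):
    print(brand, month, avg)
```

Because the preferred list is an input rather than part of the logic, updating it changes the result without any change to the function.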

IBM’s Data Refinery is available with Watson Studio and Watson Knowledge Catalog on public cloud and private cloud, and with Watson Studio Desktop.

Get started for free at:

IBM Watson
