Data Refining Simplified: Part 2: Working with Real-World Data to Automate Workflows (Video)

Sonali Surange-Dev
Sep 5, 2019

To learn how to visually build an analysis flow on test data, read Part 1.

IBM’s Data Refinery accelerates the end-to-end experience of refining data, from development to production. The process starts with analyzing test data, then ensures that the analysis can be automated on larger real-world data.

Data Refinery allows the analysis built on test data to be reused on real-world data. In the real world, the schema may change slightly (e.g., column names change or new columns are added), and the data may show more variation.

Data Refinery’s built-in intelligence highlights any impact on the analysis caused by differences between the original and new data sources. Users can edit or delete transformations to optimize the analysis, and they can insert new transformations anywhere in the flow to handle data variations. As the Data Scientist makes the flow ready for automation, Snapshot views provide visual, step-by-step feedback and validation of the analysis.

Visual impact analysis detection and fix
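To make the idea of impact analysis concrete, here is a minimal sketch in plain Python. It is not Data Refinery's API or implementation; it only illustrates the general idea of comparing the columns of the original and new data sources and flagging the steps that depend on columns that no longer exist. The `Step` structure, the step names, and the column names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One transformation in a flow (hypothetical structure, for illustration)."""
    name: str
    columns: tuple  # columns the step reads


def schema_diff(old_columns, new_columns):
    """Return (missing, added): columns dropped from and added in the new source."""
    old_set, new_set = set(old_columns), set(new_columns)
    missing = sorted(old_set - new_set)
    added = sorted(new_set - old_set)
    return missing, added


def impacted_steps(steps, new_columns):
    """Map each step name to the columns it needs that the new source lacks."""
    available = set(new_columns)
    impact = {}
    for step in steps:
        lost = [c for c in step.columns if c not in available]
        if lost:
            impact[step.name] = lost
    return impact


# Hypothetical column names; the article does not list the actual schema.
test_columns = ["provider", "price", "search_date"]
real_columns = ["provider_name", "price", "search_date", "currency"]

# Hypothetical flow built on the test data.
flow = [
    Step("filter_low_price", ("price",)),
    Step("group_by_provider", ("provider", "price")),
]

missing, added = schema_diff(test_columns, real_columns)
impact = impacted_steps(flow, real_columns)

print("Missing columns:", missing)
print("Added columns:", added)
print("Impacted steps:", impact)
```

In this sketch, a renamed column shows up as one missing column plus one added column, and only the steps that read the missing column are flagged. That is the kind of information a user would need before editing, deleting, or inserting transformations.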

Automating the analysis is a critical part of data science. Data Refinery allows the analysis to be automated on an hourly, weekly, or monthly basis. When handling large data, it is essential to be able to tailor the runtime to the workload, so Data Refinery lets users run the analysis on customized runtimes.

This video demonstrates how a Data Scientist can

  1. Reuse an existing analysis on a variety of data sets, with intelligence in the tooling that identifies the impact of the change and features to fix the analysis visually.
  2. Automate the analysis on an hourly basis.
  3. Use a personalized runtime to match the workloads that run during the automation.

Use case:

The Data Scientist has built an analysis flow to identify the best online providers of flight tickets, based on search result test data (Part 1). Her next task is to validate the analysis on real-world data. Finally, she needs to run the analysis on an hourly basis. She has chosen IBM’s Data Refinery tool to perform this task.

Video: Working with real-world data and automating workflows

IBM’s Data Refinery is available with Watson Studio and Watson Knowledge Catalog on public cloud and private cloud, and with Watson Studio Desktop.

Get started for free at: https://www.ibm.com/cloud/data-refinery
