Machine Learning: Where It All Comes Together

Steven Astorino
IBM Data Science in Practice
Mar 29, 2018

Too often in the past, data science has been a siloed activity centered on data exploration and insight, without truly crossing departmental lines. Those limitations have kept it from becoming a strategic, enterprise-wide initiative that supports multiple data science and machine learning projects.

Thankfully, things are changing. Technologies have matured and become cost-effective enough that an enterprise-class data science platform can support large-scale production requirements. In response, enterprises are directing data science activities across multiple lines of business, using a set of standardized approaches that embrace openness, extensibility, and adaptability. IBM stands at the center of these efforts, and our position has led us to some key recommendations:

1. Build a balanced team
- Emphasize programmatic and visual data science
- Enable business domain experts

2. Simplify, accelerate and operationalize
- Encourage a quick path from experimentation to deployment to increase business contribution
- Cross-train your team and focus on velocity

3. Connect data, algorithms and apps
- Offer interactive data access, prep and quality wherever data resides
- Bring tools into one environment

We’re now offering Data Science Experience version 1.2, which continues to address all three recommendations above. To accelerate deployment, the release offers a new Model Management and Deployment interface where DSX administrators can create project releases, deploy assets within those releases, and go live in a production environment.

To encourage team balance and a unified environment, DSX now directly incorporates our SPSS Modeler offering to enable visual productivity, our Decision Optimization offering to enable prescriptive analytics, and our Data Refinery tool for cleansing and shaping data. Together, they help cement the reputation of DSX as the platform of choice for data science.

Let’s consider each one in turn.

SPSS Modeler

In so many ways, SPSS Modeler is the great warhorse of data science. Its first incarnation dates from 1994 and since then it’s shouldered data mining and analytics work across industries from risk management to education to telecom. Its staying power comes from a set of capabilities that let domain experts work deeply with data — without requiring years of programming or math experience.

If you haven’t encountered SPSS Modeler before, consider just a few features:

  • Automated model creation selects which predictive modeling technique matches your business problem.
  • Automatic data preparation handles many of the tedious cleaning and integration tasks.
  • Intuitive GUI makes predictive analytical workflows easy to create, maintain, and manage.
  • A dedicated interface refined over decades makes data visualization a breeze.
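
To make the first bullet concrete, here is a minimal sketch of what automated model selection can look like in principle: fit several candidate techniques, score each on held-out data, and keep the one with the lowest error. This is not SPSS Modeler's implementation; the candidate models, the data, and every function name below are assumptions chosen for illustration.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Candidate: ordinary least squares with a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given data."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def select_model(candidates, train, holdout):
    """Fit every candidate on train; return (name, model, score) with the lowest holdout error."""
    scored = []
    for name, fit in candidates.items():
        model = fit(*train)
        scored.append((mse(model, *holdout), name, model))
    best = min(scored, key=lambda t: t[0])
    return best[1], best[2], best[0]

# Hypothetical data that follows a straight line.
xs = list(range(10))
ys = [2.0 * x + 1.0 for x in xs]
train = (xs[:7], ys[:7])
holdout = (xs[7:], ys[7:])

best_name, best_model, best_score = select_model(
    {"mean": fit_mean, "line": fit_line}, train, holdout
)
print(best_name, best_score)
```

A real tool would compare many more techniques and use more careful validation, but the loop has the same shape: the data decides which technique wins, not the user.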

Integrating SPSS Modeler into DSX means those same domain experts can now collaborate more directly with expert coders and data scientists. Experienced coders benefit too: SPSS Modeler offers the chance to do quick proof-of-concept testing and visualizations before diving deeply into models and algorithms, using embedded open-source languages like R and Python.

If you’re already a user of SPSS Modeler, you can import your existing Stream files — or use an example stream to get started. From the node palette, you can drag and drop operations, graphs, modeling, export, or output nodes. To add data as a node, drag a local CSV file or drag in data from a remote data set. Once a stream is ready to go, you can run an evaluation and see the results: model accuracy, predictor importance, and network diagram. You can even get advanced data visualizations by data node. For more detail, we’ve put together this video.

Decision Optimization

With this latest edition of DSX, we’ve also embedded Decision Optimization, IBM’s flagship offering for prescriptive analytics. We often think of data science and machine learning as techniques for describing or predicting activity in the world. Decision Optimization goes further by adding analytics for prescribing better decisions. That might mean better planning, better schedules, better team assignments, or better resource allocation. It’s about fully operationalizing data science and putting it into action inside your business.

Unlike traditional data analytics, Decision Optimization grows out of mathematical programming techniques that account for uncertainty. Rather than basing recommendations on a single prediction, it uses robust and stochastic optimization to devise thousands (or tens of thousands) of predictions and optimizes for each one. The result is a set of recommended choices that account in advance for dozens of factors at once, including conflicting business goals, business constraints, and the threat of sudden shifts. This blog post from Dinesh Nirmal takes a deeper look.
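
As a rough illustration of planning against many predictions instead of one, the sketch below uses a simple sample-average approach: it scores each candidate decision against thousands of simulated demand scenarios and keeps the one with the best average outcome. This is a toy, not the Decision Optimization API or its solver; the prices, the demand distribution, and all names are assumptions.

```python
import random

def profit(order_qty, demand, unit_cost=4.0, unit_price=10.0):
    """Profit when ordering order_qty units and facing the given demand (assumed prices)."""
    sold = min(order_qty, demand)
    return unit_price * sold - unit_cost * order_qty

def best_order(scenarios, candidates):
    """Pick the order quantity with the highest average profit across all scenarios."""
    def average_profit(qty):
        return sum(profit(qty, d) for d in scenarios) / len(scenarios)
    return max(candidates, key=average_profit)

# Hypothetical demand: roughly 100 units, with substantial uncertainty.
rng = random.Random(42)
scenarios = [max(0, round(rng.gauss(100, 30))) for _ in range(5000)]
candidates = range(0, 201, 5)

# Planning against one prediction versus planning against thousands.
single_forecast_choice = best_order([100], candidates)
scenario_choice = best_order(scenarios, candidates)
print(single_forecast_choice, scenario_choice)
```

With these assumed numbers, a lost sale costs more than an unsold unit, so the scenario-based choice lands somewhat above the single-forecast choice. That gap is the value of accounting for uncertainty up front.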

Data Refinery

Before you analyze data sets in DSX, Data Refinery lets you cleanse the data to address data that is incorrect, incomplete, improperly formatted, or duplicated. It also offers operations for “shaping” the data, meaning you can filter, sort, combine, or remove columns, and run custom operations — all in near real-time. The options are almost endless: perform mathematical calculations, convert column types, substitute strings, trim characters, remove empty rows, perform joins, and on and on.
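
For readers who think in code, here is a minimal plain-Python sketch of the same kinds of cleansing and shaping operations: trimming characters, removing empty rows, converting column types, substituting strings, dropping duplicates, filtering, and sorting. It only illustrates the concepts; Data Refinery does this through its interface, and the column names and sample rows below are made up.

```python
def refine(rows):
    """Cleanse and shape a list of row dicts; returns a new list."""
    cleaned = []
    seen = set()
    for row in rows:
        # Trim stray whitespace from every string value.
        row = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        # Remove empty rows (every value missing or blank).
        if all(v in ("", None) for v in row.values()):
            continue
        # Convert the column type; skip rows whose amount is improperly formatted.
        try:
            row["amount"] = float(row["amount"])
        except (KeyError, TypeError, ValueError):
            continue
        # Substitute strings to standardize a label.
        row["region"] = (row.get("region") or "").replace("N. America", "North America")
        # Drop duplicates, judged on the cleaned values.
        key = (row.get("customer"), row["region"], row["amount"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    # Filter out non-positive amounts, then sort largest first.
    kept = [r for r in cleaned if r["amount"] > 0]
    return sorted(kept, key=lambda r: r["amount"], reverse=True)

# Hypothetical raw data with typical quality problems.
raw = [
    {"customer": " Acme ", "region": "N. America", "amount": "120.50"},
    {"customer": "Acme", "region": "N. America", "amount": "120.50"},  # duplicate once trimmed
    {"customer": "", "region": "", "amount": ""},                      # empty row
    {"customer": "Globex", "region": "Europe", "amount": "n/a"},       # improperly formatted
    {"customer": "Initech", "region": "Europe", "amount": "300"},
    {"customer": "Umbrella", "region": "Asia", "amount": "-5"},        # filtered out
]

refined = refine(raw)
for r in refined:
    print(r)
```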

The intuitive interface lets data engineers fast-forward through the tedious data preparation tasks that so often pull them away from deeper work. The result is a customized data flow that you can save either as a separate JSON file or as an R script, so you can automate the refining of any new data that flows into the system. The tool even provides options for quickly validating or visualizing the data as part of the refining process.
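
The idea of saving a flow and replaying it on new data can be sketched as follows. The JSON layout here is invented purely for illustration and is not the format Data Refinery actually writes; the operation names and columns are likewise assumptions.

```python
import json

# A hypothetical saved flow: an ordered list of refinement steps.
FLOW_JSON = """
[
  {"op": "trim", "column": "name"},
  {"op": "drop_empty", "column": "name"},
  {"op": "to_number", "column": "score"},
  {"op": "filter_min", "column": "score", "value": 50},
  {"op": "sort", "column": "score"}
]
"""

def apply_step(rows, step):
    """Apply one saved step to a list of row dicts and return the new list."""
    op, col = step["op"], step["column"]
    if op == "trim":
        return [{**r, col: r[col].strip()} for r in rows]
    if op == "drop_empty":
        return [r for r in rows if r[col] != ""]
    if op == "to_number":
        return [{**r, col: float(r[col])} for r in rows]
    if op == "filter_min":
        return [r for r in rows if r[col] >= step["value"]]
    if op == "sort":
        return sorted(rows, key=lambda r: r[col])
    raise ValueError(f"unknown operation: {op}")

def run_flow(flow_json, rows):
    """Replay every step of a saved flow, in order, on a fresh batch of rows."""
    for step in json.loads(flow_json):
        rows = apply_step(rows, step)
    return rows

# A new batch arrives; the same saved flow refines it without manual work.
new_batch = [
    {"name": "  Ada ", "score": "91"},
    {"name": "   ", "score": "77"},
    {"name": "Linus", "score": "42"},
    {"name": "Grace", "score": "68"},
]
refined_batch = run_flow(FLOW_JSON, new_batch)
print(refined_batch)
```

Because the flow is plain data, it can be versioned, reviewed, and rerun on every new batch, which is what makes saving it worthwhile.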

Under One Roof

With the increased pace of experimentation, data science teams need a single environment where interaction and evaluation become continuous, and where programmatic and visual skills enable fast progress from exploration to production. In other words, collaborators from across skills and backgrounds need to be able to gather under one roof, each person using his or her tools of choice and each bringing a unique set of strengths to bear.

I invite you to learn more about this exciting release of DSX and experience the combined capabilities of SPSS, Decision Optimization, and Data Refinery.
