Accelerate your Data Science Life Cycle using IBM Watson Studio 2.0

May 7, 2019

By Armand Ruiz, Stevan Slusher, Adam Massachi, Greg Filla, Julianna R. Delua, Yin Chen and Vishnu Alavur Kannan

With IBM® Watson™ Studio, IBM has been ushering its customers into the era of AI. We in offering management have been working diligently to fortify and build upon the power of Watson Studio to further accelerate the multi-cloud AI journey that enterprises across the globe have embarked on.

Now, it is my distinct pleasure to introduce the next phase of Watson Studio: Watson Studio 2.0.

**Watson Studio Local 2.0 — Project Dashboard**

If you deploy Watson Studio in your enterprise today, you’re certainly aware of the capabilities it provides:

Develop algorithms and train your models where your data lives.
Increase team productivity by bringing your scientists (both visual and programmatic) together to drive your initiatives from discovery to production.
Accelerate your data science life cycle by delivering workflow-driven data science and enabling a Data Science Center of Excellence across your enterprise.

**Accelerate your Data Science Life Cycle** — From Ideation, Exploration, Discovery & Validation to Production

With Watson Studio V2.0, which brings talent, process and platforms together, we have advanced our capabilities to accelerate your data science life cycle:

Explore your data at scale: Simplify and expedite data preparation

Having worked with more than 85 customers across different domains to date, I have found that 90% of the market is in the data science and business analytics (DSBA) segment and has not yet turned its focus to advanced machine learning and deep learning techniques to realize value. The majority of the time is still spent on a challenging and lengthy data prep process.

Connectors: With Watson Studio V2.0, we have advanced our capabilities to perform data exploration at scale:

Adding 43 data connectors, such as Dropbox, Salesforce, Tableau and Looker.
Providing an Asset Browser experience to navigate through schemas, tables and objects.

Data Source Connections — Connect and run analytics where your data lives

Data Prep and Data Exploration:

With Data Refinery:

Scientists can collaboratively review, refine and augment data sets visually.
Scientists can apply more than 150 advanced operations on data (a few are highlighted below), with automation to run these reproducible Data Refinery flows.

**Joins in memory** using Spark runtimes

Perform **150+ Operations** using Data Refinery and drive data pipelines using Data Refinery Flows

Visualize data at massive scale in no time, without writing a line of code:

**Preview, Profile and Visualize your data** using Data Refinery @ SCALE

Running analytics at scale where your data lives:

With Execution Engine for Hadoop:

We have enhanced our integrations with Hadoop distributions (CDH and HDP) to run analytics where the data lives, leveraging your existing compute, with improved performance, security and scale while maintaining quality and stability.
Programmatically leverage PySpark within notebooks to train at massive scale on Spark nodes.

**Execution Engine for Hadoop** to seamlessly integrate with your Cloudera or Hortonworks clusters

Jobs and Scheduling:

- Built-in batch and evaluation job management for Python/R scripts, SPSS streams and Data Refinery Flows in a single UI with multiple scheduling options.

**Schedule your batch predictions & evaluation jobs** with environment variables & command line arguments
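
As a rough illustration of how a scheduled script might pick up its settings, here is a minimal sketch of a Python batch job that reads both command line arguments and environment variables. It is not Watson Studio's own job API: the function name `load_job_config`, the `--input` and `--output` flags, and the `MODEL_NAME` and `BATCH_SIZE` variables are all hypothetical names chosen for this example.

```python
import argparse
import os


def load_job_config(argv=None, environ=None):
    """Merge command line arguments and environment variables for one job run.

    Every flag and variable name here is a hypothetical example,
    not something defined by Watson Studio.
    """
    environ = os.environ if environ is None else environ

    parser = argparse.ArgumentParser(description="Hypothetical batch prediction job")
    parser.add_argument("--input", required=True, help="path to the data to score")
    parser.add_argument("--output", default="predictions.csv", help="where to write results")
    args = parser.parse_args(argv)

    return {
        # Values that change per run come from the command line.
        "input": args.input,
        "output": args.output,
        # Values that change per environment come from environment variables.
        "model_name": environ.get("MODEL_NAME", "default-model"),
        "batch_size": int(environ.get("BATCH_SIZE", "1000")),
    }


# Example run with explicit arguments and a stand-in environment.
config = load_job_config(["--input", "customers.csv"], {"MODEL_NAME": "churn-model"})
print(config)
```

Keeping per-run values in arguments and per-environment values in variables means the same script can be scheduled against test and production data without code changes.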

Enhanced focus on citizen scientists:

We have enhanced IBM SPSS® Modeler as part of Watson Studio, improving data exploration and visual modeling, including integrated data mining and SQL pushback.
Scientists can add several new nodes (Auto Data Prep and Auto Modeling, enhanced Data Visualization, Data Refinery integration and push to production), including 28 new features supporting data preparation and 44 capabilities (feature extraction and modeling techniques) to accelerate machine learning.

**Auto-Prep, Feature Selection, Extraction and Engineering** **applying ML techniques** via SPSS Visual Modeler

Enable governed collaboration:

With our governed collaboration, we’ve added a collaborative interface, similar to Slack, to our Jupyter Notebook integration. You can add notes within the tool itself as you have questions and as you discover new techniques you want to apply. This allows you to collaborate right where the discovery happens, while governance guards sensitive initiatives.

Openness with Enterprise Open Source:

To support programmatic scientists, when writing code:

- Scientists can import open source packages and libraries or bring their own.

- We make it easier for teams to use open source tools, while indemnifying the packages and libraries bundled with the platform and applying appropriate security engineering practices to ensure machine learning pipelines are reliable and robust.

- Our entire platform is built using Kubernetes for orchestration and Docker for containerization, instrumented following infrastructure-as-code principles, with enhanced DevSecOps and governance ingrained to support large enterprises.

Reproducibility and Versioning:

We’ve also enhanced version control integrations to make accessing and editing different versions of assets more seamless, with reproducibility considered from the ground up:

- Support for major Git platforms: GitHub, GitHub Enterprise, Bitbucket and Bitbucket Server.

- No vendor lock-in: customers can download every asset they build using IBM Watson Studio and Watson Machine Learning by exporting their assets to their laptops or to Git.

Export your assets with **NO Vendor Lock-in** to your laptops or to your enterprise Git platform

Publish & Review results during discovery via an instant feedback loop:

Scientists usually prefer to publish their results using charts and plots to articulate the value of their algorithms and models in simple business terms.

- Scientists can now share and publish their artifacts to business SMEs (via Jupyter and R Shiny) and receive instant feedback, so they can course correct as they iterate and measure and improve their algorithms against business KPIs and outcomes.

- Business users reviewing the charts and applications built by their scientists can provide instant feedback.

Publish artifacts using Jupyter Notebooks and R Shiny applications with SMEs & Review Results **Iteratively**

Administer our platform with ease:

IT Administrator dashboards and features have been enabled to ensure full visibility of the platform stack and its usage.

Scale on-prem or Auto-scale on IBM Cloud, AWS or Azure where your data lives

Security engineering practices, backed by IBM, to resolve vulnerabilities instantly.
Support for LDAP groups and the addition of project templates to standardize project artifacts.
Adding or removing nodes on demand, or auto-scaling in the cloud.
Scripts for custom JDBC data source connections, applying certificates and keys, and more.
User management, services, logs, alerts and more.
Management and deployment capabilities, with versioning embedded, to ensure artifacts from discovery can be deployed in production, and much more.

Try, buy and deploy easier than ever

With Watson Studio 2.0:

You no longer have to wait on procurement teams to acquire or allocate servers in your data center.
You can test 100% of your assets in the cloud before you decide to buy.
With our convergence efforts, what you see is what you get, whether you are running in our cloud, in your data center, or on AWS or Azure, wherever your data lives. No matter what stage of testing you reach in our cloud, you can easily download your assets to your physical servers once they’re procured.
Cost, quality and agility in practice: running in the cloud will save teams countless hours or days, accelerating your data science life cycle by removing significant delays at the outset of a project, all within IBM Watson Studio 2.0.

Greg Filla covers the capabilities of Watson Machine Learning 2.0 and how model management and deployment work when integrated with IBM Watson Studio.

Read Greg’s blog: https://medium.com/ibm-watson/technology-convergence-culminates-in-ibm-watson-machine-learning-v2-0-cc3187ef297f

Julianna Delua covers the overarching benefits of the combined technologies and how they benefit a modern enterprise.

Read Julianna’s blog: https://medium.com/ibm-watson/introducing-ibm-watson-studio-and-ibm-watson-machine-learning-2-0-4fbdb9a3b4dc

Or, for even more detail, you can read the Enterprise Strategy Group (ESG) Technical Validation: https://www.ibm.com/account/reg/us-en/signup?formid=urx-36667

Written by Vishnu Kannan