Infuse automation at scale with IBM Cloud Pak for Data 4.0

Published in Cloud Pak for Data, July 4, 2021

How predictive is your enterprise, really? And how easy is it for your data consumers, models and apps to access the right data? More often than not, the answer is a resounding “not very.” Between the proliferation of data types and sources and tightening regulations, data is often held captive in silos. Traditionally, strategies for overcoming this challenge relied on consolidating the physical data into a single location, structure and vendor. While this strategy seemed great in theory, anyone who has undertaken a migration of this magnitude can tell you it’s easier said than done.

Earlier this year at THINK, we unveiled our plans for the next generation of IBM Cloud Pak for Data, our answer to helping customers connect the right people to the right data at the right time. Today, I’m excited to share more details on how the latest version of the platform, version 4.0, will bring that vision to life through an intelligent data fabric.

The journey so far

Since the launch of IBM Cloud Pak for Data in 2018, our goal has always been to help customers unlock the value of their data and infuse AI throughout their business. Guided by our clients’ needs, we doubled down on delivering a first-of-its-kind containerized platform that gives clients the flexibility to deploy the unique mix of data and AI services they need, in the cloud environment of their choice.

Cloud Pak for Data journey: 10 releases in the past 3 years

IBM Cloud Pak for Data supports a vibrant ecosystem of proprietary, third-party and open-source services that we continue to expand with each release. With version 4.0, we take our efforts to the next level: new capabilities and intelligent automation help business leaders and users tackle the overwhelming data complexity they face and more easily scale the value of their data.

Weaving the threads of an intelligent data fabric

A data fabric is an architectural pattern that dynamically orchestrates disparate data sources across a hybrid and multi-cloud landscape to provide business-ready data in support of analytics, AI and applications. The modular and customizable nature of IBM Cloud Pak for Data offers the ideal environment to build a data fabric from best-in-class solutions, tailored to your unique needs. The tight integration of the microservices within the platform allows us to further streamline the management and usage of distributed data by infusing intelligent automation. With version 4.0, we’re applying this automation in three key areas:

Data access and usability: AutoSQL is a universal query engine that automates how you access, update and unify data across any source or type (clouds, warehouses, lakes, etc.) without the need for data movement or replication. With AutoSQL, you can query distributed data across disparate landscapes up to 8x faster than with a standard data warehouse.
Data ingestion and cataloging: AutoCatalog automates the discovery and classification of data to streamline the creation of a real-time catalog of data assets and their relationships across disparate data landscapes.
Data privacy and security: AutoPrivacy uses AI to automate the identification and monitoring of sensitive data across the organization, and the enforcement of policies on that data, to help minimize risk and ensure compliance.
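To make the idea of querying data where it lives a little more concrete, here is a loose analogy using SQLite’s ATTACH DATABASE, which lets a single SQL statement join tables held in separate databases without first copying one into the other. This is a minimal sketch, not how AutoSQL works internally: the two in-memory databases, the customers and orders tables, their rows, and the helper names build_sources and revenue_by_customer are all hypothetical.

```python
import sqlite3
from typing import List, Tuple


def build_sources() -> sqlite3.Connection:
    """Create two separate in-memory databases behind one connection.

    Hypothetical setup: 'main' stands in for a warehouse and 'lake' for a
    second source. No table is copied from one database into the other.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS lake")
    conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE lake.orders (customer_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO main.customers VALUES (?, ?)",
                     [(1, "Acme"), (2, "Globex")])
    conn.executemany("INSERT INTO lake.orders VALUES (?, ?)",
                     [(1, 120.0), (1, 80.0), (2, 50.0)])
    return conn


def revenue_by_customer(conn: sqlite3.Connection) -> List[Tuple[str, float]]:
    """Run one SQL statement that joins tables from both databases in place."""
    return conn.execute(
        """
        SELECT c.name, SUM(o.amount) AS total
        FROM main.customers AS c
        JOIN lake.orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY total DESC
        """
    ).fetchall()


conn = build_sources()
print(revenue_by_customer(conn))  # [('Acme', 200.0), ('Globex', 50.0)]
```

A production query engine would also have to deal with heterogeneous source systems, network latency and access control; the sketch only shows the single-query, no-copy idea.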

Additional enhancements woven into 4.0

Further augmenting the intelligent automation of our data fabric capabilities is another new service coming to IBM Cloud Pak for Data: IBM Match 360 with Watson. Match 360 provides a machine learning-based, easy-to-use experience for self-service entity resolution. Non-developers can now match and link data from across their organization, helping to improve overall data quality.
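Entity resolution means deciding which records from different systems describe the same real-world entity. The sketch below is a deliberately simple stand-in that uses string similarity from Python’s difflib rather than a trained model, so it is not Match 360’s machine learning approach; the normalize, similarity and link_records helpers, the suffix list, the 0.85 threshold and the sample records are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical list of legal-form suffixes to ignore when comparing names.
SUFFIXES = {"inc", "corp", "co", "ltd", "llc"}


def normalize(name: str) -> str:
    """Lowercase, strip punctuation and drop common company suffixes."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(tok for tok in cleaned.split() if tok not in SUFFIXES)


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, threshold=0.85):
    """Link each (id, name) in `left` to its best match in `right`.

    A pair is kept only if its similarity reaches `threshold`.
    Returns a list of (left_id, right_id, score) tuples.
    """
    links = []
    for left_id, left_name in left:
        best_id, best_score = None, 0.0
        for right_id, right_name in right:
            score = similarity(left_name, right_name)
            if score > best_score:
                best_id, best_score = right_id, score
        if best_id is not None and best_score >= threshold:
            links.append((left_id, best_id, round(best_score, 2)))
    return links


# Hypothetical records from two systems.
crm = [("c1", "Acme Corp."), ("c2", "Globex Inc")]
billing = [("b7", "ACME corp"), ("b9", "Initech LLC")]

print(link_records(crm, billing))  # [('c1', 'b7', 1.0)]
```

Here the two spellings of Acme normalize to the same string and are linked, while Globex has no close counterpart and stays unlinked. Real matching engines typically compare several attributes and weigh them, rather than relying on a single name field and a fixed threshold.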

IBM SPSS Modeler, IBM Decision Optimization and Hadoop Execution Engine services are also included as part of IBM Cloud Pak for Data 4.0. These capabilities complement the IBM Watson Studio services already in the base platform and enable users, such as business analysts and citizen data scientists, to participate in building AI solutions.

AutoAI has been enhanced to support relational data sources and to generate exportable Python code, enabling data scientists to review and update models generated through AutoAI. This is a significant differentiator compared to competitors’ AutoML capabilities, where the generated model is more of a black box.

AutoAI: further enhanced with Cloud Pak for Data 4.0

Complementary capabilities have also been released on IBM Cloud Pak for Data as a Service, including IBM DataStage and IBM Data Virtualization. Now available as a fully managed service, DataStage helps teams build modern data integration pipelines, while Data Virtualization helps share data across the organization in near real time, connecting governed data to your AI and ML tools.

Finally, IBM Cloud Pak for Data 4.0 includes several platform enhancements, the most notable of which is the addition of Red Hat OpenShift Operators. These help automate the provisioning, scaling, patching and upgrading of IBM Cloud Pak for Data. First-time installs are significantly simplified, decreasing the cost of implementation, while seamless upgrades cut the upgrade process from weeks to hours. Also beginning with 4.0, IBM Cloud Pak for Data is built on a common IBM Cloud Pak platform, enabling standardized Identity and Access Management and seamless navigation across all of the IBM Cloud Paks.
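The Operators mentioned above follow the Kubernetes operator pattern: a controller watches a declared desired state and repeatedly reconciles the running system toward it, which is what allows installs, scaling and upgrades to be automated. The sketch below shows only the comparison step of such a reconcile loop, in plain Python with no cluster involved; ServiceState, reconcile and the version numbers are hypothetical and do not represent the actual Cloud Pak for Data operators or their custom resources.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ServiceState:
    """Hypothetical, simplified view of a deployed service."""
    version: str
    replicas: int


def reconcile(desired: ServiceState, observed: Optional[ServiceState]) -> List[str]:
    """Compare desired and observed state and list the actions to close the gap.

    A real operator would run this comparison repeatedly (on events or a
    timer) and then call the cluster API; here we only return action names.
    """
    if observed is None:
        # Nothing deployed yet: a first-time install.
        return [f"install {desired.version} with {desired.replicas} replicas"]
    actions = []
    if observed.version != desired.version:
        actions.append(f"upgrade {observed.version} -> {desired.version}")
    if observed.replicas != desired.replicas:
        actions.append(f"scale {observed.replicas} -> {desired.replicas} replicas")
    return actions  # an empty list means the system already matches the spec


desired = ServiceState(version="4.0.1", replicas=3)
print(reconcile(desired, None))  # ['install 4.0.1 with 3 replicas']
print(reconcile(desired, ServiceState(version="4.0.0", replicas=2)))
# ['upgrade 4.0.0 -> 4.0.1', 'scale 2 -> 3 replicas']
print(reconcile(desired, desired))  # []
```

Because the function is driven purely by the difference between the two states, running it again after the actions succeed yields an empty list, which is the property that makes repeated reconciliation safe.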

Data is a huge competitive advantage for companies and, when combined with AI, has the power to drive business transformation. IBM Cloud Pak for Data enables just that, and does so 10x faster thanks to built-in automation.

Written by Hemanth Manda