Introducing Cloud Pak for Data 4.5

Sachin Prasad
Published in Cloud Pak for Data
7 min read · Jul 3, 2022
CP4D 4.5 release celebrations at IBM Austin!

What a way to start the 4th of July long weekend: we finally made it! All the hard work and dedication that went into planning CP4D’s brand-new version has paid off. I am more than excited to announce that the latest version of Cloud Pak for Data is out now, and it’s fully loaded with goodies to keep you awestruck for months. In this blog, I will give a quick tour of what to expect in this summer release of Cloud Pak for Data (CP4D).

Looking back, the first release of this product was roughly four years ago (around June 2018), and I have had the honor of being around this product since its initial days, when the focus was to win the platform race in a very niche but growing vertical: Data and AI platforms. Back then, it was more about features, more about business use cases, more about providing end-to-end value to a variety of personas. CP4D was the new kid on the block, IBM’s poster child and the sweetheart of tens of thousands. Enterprises liked it and embraced it with open arms, even though they knew it wasn’t quite ready for production yet.

Today, after four years, we are proud that CP4D has grown and matured by leaps and bounds, with 40+ services providing a plethora of features. As our product has matured, our focus has transitioned from building new features to doubling down on what makes us unique in the market: unparalleled robustness and resiliency, which make CP4D the best choice for enterprise production workloads.

Now, for the platform to play with the big boys, it has to flex its muscles and prove itself yet again, and this time it is about its core: those little things that we overlooked or ignored but knew were important. It’s time we start focusing on non-functional requirements (hardening the core, security, performance, and resiliency) so we can truly transition from an adorable kid to a responsible teen.

Highlights of Cloud Pak for Data 4.5 —

Disruption-free backup & disaster recovery

Disruption-free online backups and disaster recovery (DR) are highly sought-after features that customers have desired for some time now, and they will catapult us into a whole new realm.

This feature allows CP4D administrators to take frequent backups that are online and disruption-free without sacrificing productivity. For some context, our previous backup procedure required the cluster to be quiesced, which effectively meant shutting down the cluster and disrupting business. Depending on the data, backups could take 4–6 hours. Given how mission-critical CP4D is to our clients, that framework was too slow and disruptive for customers to take frequent backups. Thus, in case of a disaster or an unrecoverable failure, a customer might have to restore data from very old and stale backups. Disaster recovery posed similar challenges, and effective enterprise-grade backup and DR were deeply missed.

CP4D v4.5 disruption-free online backups rely on CSI snapshots, which do not require any downtime, thus enabling organizations to efficiently protect their data. This technology serves as the backbone of our backup strategy. Great as it is, it is not a complete application fail-safe on its own, so to bridge the gap, additional checkpointing backups ensure that the backups are resilient enough. The feature supports IBM Spectrum Scale, ODF, and Portworx based clusters.

The disaster recovery approach (available later) latches on to these snapshots and, with the help of IBM Spectrum Protect Plus, ensures that they are available on the recovery side for a DR failover if needed. The DR setup in this case is assumed to be active-passive, where the DR site stays in standby mode, waiting to be brought to life if needed.

Install and upgrade improvements

Let me ask this: how many times have you attempted to install CP4D and ended up spending days, if not weeks, going through a 20+ step procedure? With the advent of operators, we have certainly benefited from better security and control over our application, but at the same time, the technical skill these installs demand has increased by at least 10x. That’s good for the 20% of our customers who are tech-savvy and like control and transparency, but a nightmare for the remaining 80%.

To reduce the operator learning curve, an automation framework is being rolled out that hides various complexities of installation and upgrades and promises to bring various other tools under one umbrella. This framework also promises to streamline air-gapped installs, mirror registries, registry cleanup, storage setups, and more.

Yet another major improvement that truly deserves a mention is that all our upgrades (from 3.5) are now single-hop. Yes, you heard that right: single hop!

Previously, upgrading CP4D was painstaking because it required going through multiple hops, which meant backups, validations, pre-upgrade preparations, and finally the actual upgrade for every single hop. Thanks to our SRE team for managing the mammoth task of validating hundreds of combinations of upgrade paths to ensure that customers can upgrade without any hiccups.

Storage support: Spectrum Scale & AWS

IBM Cloud Pak for Data adds support for AWS (EFS & EBS) and IBM Spectrum Scale. AWS support validates our commitment to embrace cloud at every step of our effort to provide interoperability and consumability with the hyperscalers.

Based on IBM General Parallel File System (GPFS), IBM Spectrum Scale delivers scalable capacity and performance to handle demanding data analytics, content, and technical computing workloads. Some of our very large customers have relied on this storage for the past 10–15 years, so it’s imperative that CP4D extend to these enterprise setups and provide a return on that investment.

A full roundup of available storage in 4.5 (not an exhaustive list; refer to the documentation for the latest information)

Security

Security is top of mind for our customers. In addition to our monthly security-focused releases aimed at reducing our open source vulnerability debt, we introduced additional Identity and Access Management (IAM) capabilities, including:

  • Separation of duties: New permissions for creating and managing projects and deployment spaces based on role.
  • Multiple identity providers: The IAM Service enables Cloud Pak for Data to use multiple identity providers for authentication.
  • Improved security in shared clusters: No need to grant cluster-wide authority to the Cloud Pak for Data operators and the IBM Cloud Pak foundational services operators.
  • Attribute-based access control: Rule-based dynamic access controls that are evaluated against the user’s attributes flowing from the IAM system.
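
As a rough illustration of the attribute-based access control idea in the last bullet, the sketch below evaluates rules against a user’s attributes. It is not the Cloud Pak for Data implementation or API: the `Rule` class, the `roles_for` function, the attribute names (`department`, `location`), and the role names are all hypothetical, chosen only to show how access can be derived dynamically from attributes instead of being assigned user by user.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical ABAC rule: grant `role` when every condition matches."""
    role: str
    conditions: dict  # attribute name -> required value


def roles_for(user_attributes: dict, rules: list) -> set:
    """Return the roles granted by all rules whose conditions the user satisfies."""
    granted = set()
    for rule in rules:
        if all(user_attributes.get(k) == v for k, v in rule.conditions.items()):
            granted.add(rule.role)
    return granted


# Hypothetical rules and attributes, for illustration only.
RULES = [
    Rule(role="project-creator", conditions={"department": "data-science"}),
    Rule(role="space-admin",
         conditions={"department": "data-science", "location": "Austin"}),
]

analyst = {"department": "data-science", "location": "Pune"}
print(sorted(roles_for(analyst, RULES)))  # ['project-creator']
```

Because the roles are computed from attributes at evaluation time, a change to the user’s attributes in the identity system changes the outcome without anyone editing per-user permissions.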

Resource optimization via data science project quotas

When several users or teams share one Cloud Pak for Data instance, they share a finite amount of underlying resources such as CPU and memory. With limited capacity, unrestricted and unmonitored data science workloads can impact other users or even essential platform services and cluster stability.

Watson Studio projects provide a logical separation between workloads based on department, group of users, or type of data science initiative. CP4D 4.5 introduces the concept of resource limits and requests (an upper limit to cap usage, a lower limit to reserve capacity) to ensure projects operate as expected in an environment with limited capacity. This feature requires CPD Scheduler quota enforcement.
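
To make the request/limit idea concrete, here is a minimal sketch of how a per-project quota check could work. It is a conceptual model under stated assumptions, not the CPD Scheduler’s actual logic or API: the `ProjectQuota` class, its field names, and the `reservations_fit` and `can_admit` functions are hypothetical, and the numbers are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ProjectQuota:
    """Hypothetical per-project quota, expressed in CPU cores."""
    name: str
    cpu_request: float  # lower bound: capacity reserved for the project
    cpu_limit: float    # upper bound: the project may never use more
    cpu_in_use: float = 0.0


def reservations_fit(quotas: list, cluster_cpu: float) -> bool:
    """Reservations can only be honored if their sum fits in the cluster."""
    return sum(q.cpu_request for q in quotas) <= cluster_cpu


def can_admit(quota: ProjectQuota, workload_cpu: float) -> bool:
    """Admit a new workload only if the project stays at or under its cap."""
    return quota.cpu_in_use + workload_cpu <= quota.cpu_limit


# Illustrative numbers only.
fraud = ProjectQuota("fraud-detection", cpu_request=4, cpu_limit=8, cpu_in_use=6)
churn = ProjectQuota("churn-model", cpu_request=2, cpu_limit=4)

print(reservations_fit([fraud, churn], cluster_cpu=16))  # True: 4 + 2 <= 16
print(can_admit(fraud, workload_cpu=2))  # True: 6 + 2 reaches the cap exactly
print(can_admit(fraud, workload_cpu=3))  # False: 6 + 3 would exceed the cap of 8
```

The lower bound protects a project from being starved by its neighbors, while the upper bound protects the neighbors and the platform services from a runaway project.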

New and enhanced services

Many service-level enhancements were introduced into Cloud Pak for Data 4.5, ensuring better interoperability and integration across our machine learning execution engines, decision optimization, analytics portfolio, hybrid data management, master data management, zSystem support, and open source components. See the What’s New documentation for a full list of service enhancements and new services. We also made significant innovations and improvements in data fabric, business analytics, and data management missions.

My congratulations and best wishes to our product managers, engineering, and SRE teams, who worked relentlessly to plan, build, and test these features, ensuring a top-notch, timely delivery with excellent quality control. I am super proud to oversee one of the best releases of CP4D, and I hope our customers will appreciate this thoughtful content, well put together for their business.

Know more —

About Cloud Pak for Data
What’s New in Cloud Pak for Data

Follow for more!!


Sachin’s day job includes helping customers build smart apps infused with AI to solve complex problems in a more sustainable way.