The Journey to Digital: Part 1, Table Stakes

Seth E Dobrin, PhD
IBM Data Science in Practice
4 min read · Feb 15, 2018

In the initial post in this series, I gave a picture of the entire digital journey. With this post, let’s dive into part 1 of that journey: Table stakes.

Effectively transforming a company requires a commitment to do things differently, and it requires that your partners and vendors do things differently too. To my mind, transforming a company these days means transforming it digitally, and any successful digital transformation requires three things. I think of these as the minimum requirements for getting a seat at the table, the table stakes, if you will:

  1. Machine Learning in Everything
  2. Everything Lives Everywhere
  3. Open Source at the Core

Machine Learning in Everything

If machine learning is the future of business, you need to start infusing it into your systems and platforms now. Start with the systems you use to automate mundane, repetitive work; in other words, begin by deriving more productivity from scarce resources. In particular, look for ways to apply machine learning to unified governance, hybrid data management, and data science.

Data management requires a new approach, too:
- A SQL engine and optimizer that learns and adapts to user workloads at runtime, using neural networks and other advanced open-source ML algorithms, so that query execution stays consistent or gets faster over time (see the sketch after this list)
- A next-generation cognitive data-warehousing appliance with data science and machine learning tools built in, for exploration and analytics right where the data resides
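
To make the first point concrete, here is a minimal, illustrative sketch of the idea behind a workload-learning optimizer, not any vendor's actual engine: train a small neural-network regressor on features of previously executed queries, then use its runtime predictions to choose among candidate plans. The feature set, the choose_plan helper, and the synthetic workload are all hypothetical.

```python
# Illustrative sketch: learn query cost from workload history, then use the
# learned model to pick between candidate plans. All names, features, and
# data here are hypothetical, not any specific engine's internals.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Features observed per executed query: rows scanned, join count,
# predicate count, index used (0/1).
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 4)) * [1e6, 5, 10, 1]
# Synthetic "observed runtime": grows with rows and joins, shrinks with an index.
y = 0.002 * X[:, 0] * (1 + X[:, 1]) * (1 - 0.6 * np.round(X[:, 3])) \
    + rng.normal(0, 50, 500)

cost_model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0),
)
cost_model.fit(X, y)

def choose_plan(candidate_plans):
    """Pick the candidate plan with the lowest predicted runtime."""
    features = np.array([p["features"] for p in candidate_plans])
    predicted = cost_model.predict(features)
    return candidate_plans[int(np.argmin(predicted))], predicted

plans = [
    {"name": "hash_join_full_scan", "features": [8e5, 2, 3, 0]},
    {"name": "nested_loop_index",   "features": [8e5, 2, 3, 1]},
]
best, runtimes = choose_plan(plans)
print(best["name"], runtimes)
```

As the workload history grows, the cost model is simply refit, which is how "learns and adapts at runtime" cashes out in practice.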

For unified governance, applying machine learning can create impact in several ways, but especially with metadata. Look for ways both to automate metadata discovery and, above all, to create metadata where none currently exists.
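
As a concrete illustration of automated metadata discovery, here is a minimal sketch that profiles a sample of column values and assigns a semantic tag where no metadata exists. The regex rules are simple stand-ins for a trained classifier, and the column names and tags are hypothetical.

```python
# Illustrative sketch: generate metadata for undocumented columns by
# profiling a sample of their values. The tag rules are placeholders for a
# trained classifier; all names here are hypothetical.
import re

RULES = [
    ("email",      re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")),
    ("date_iso",   re.compile(r"^\d{4}-\d{2}-\d{2}$")),
    ("numeric_id", re.compile(r"^\d+$")),
    ("phone",      re.compile(r"^\+?[\d\-\s()]{7,}$")),
]

def discover_column_tag(values, threshold=0.9):
    """Tag a column if at least `threshold` of sampled values match one rule."""
    sample = [v for v in values if v not in (None, "")][:1000]
    if not sample:
        return "unknown"
    for tag, pattern in RULES:
        matched = sum(1 for v in sample if pattern.match(str(v)))
        if matched / len(sample) >= threshold:
            return tag
    return "free_text"

# Example: build a catalog entry for a table with no existing metadata.
table_sample = {
    "c1": ["alice@example.com", "bob@example.com"],
    "c2": ["2018-02-15", "2018-02-16"],
    "c3": ["order shipped", "awaiting payment"],
}
catalog_entry = {col: discover_column_tag(vals) for col, vals in table_sample.items()}
print(catalog_entry)  # {'c1': 'email', 'c2': 'date_iso', 'c3': 'free_text'}
```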

All the while, plan for machine learning model maintenance, a challenge that often sneaks up on enterprise data science programs. In particular, challenge yourself to apply machine learning to monitor the performance of the models themselves and then automatically retrain models that fall out of sync. Doing so can dramatically reduce the mundane workload on your scarce and hard-to-retain data science talent.
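
A minimal sketch of that monitor-and-retrain loop, assuming you have recent labeled data to score against; the model, threshold, and synthetic data are placeholders for whatever your pipeline actually uses.

```python
# Illustrative sketch: monitor a deployed model's accuracy on recent labeled
# data and retrain automatically when it drifts below a threshold.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def retrain_if_drifted(model, X_recent, y_recent, X_train, y_train, threshold=0.85):
    """Return (model, retrained_flag). Retrain on fresh data if accuracy dips."""
    current_accuracy = accuracy_score(y_recent, model.predict(X_recent))
    if current_accuracy >= threshold:
        return model, False
    refreshed = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return refreshed, True

# Synthetic data standing in for a production feedback loop.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X[:1000], y[:1000])
model, retrained = retrain_if_drifted(model, X[1000:1200], y[1000:1200], X[:1200], y[:1200])
print("retrained:", retrained)
```

Run on a schedule or triggered by each batch of new labels, this is the kind of check that keeps data scientists out of routine model babysitting.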

Everything Lives Everywhere

The future is not one cloud. It’s multicloud. Any system or platform needs to be deployable across all your environments, from your mainframe to your private cloud to your multiple public clouds. Your ability to develop, deploy and operate seamlessly across all these environments can easily mean the difference between winning and losing.

With the advent of cloud, managing data in a single data center (let alone a single instance of a database) no longer differentiates most companies. Instead, you're likely to face the need to manage data across multiple environments and even multiple parts of the world. To succeed, enterprises need the ability to run the same databases regardless of which cloud they're using at the time, particularly because each cloud environment otherwise demands its own unique skills and experience. For companies to keep up with the growth and expansion of cloud, they'll need a single environment that spans the various clouds.
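
One way to picture that "single environment": application code builds the same connection no matter which cloud it lands in, with only an environment variable changing. This is a minimal sketch; the environment names, hostnames, and the PostgreSQL-style DSN are hypothetical.

```python
# Illustrative sketch: one connection abstraction that resolves to the same
# database engine in whichever environment the workload runs.
import os

DB_ENDPOINTS = {
    "on_prem":   "db.internal.example.com",   # hypothetical hosts
    "ibm_cloud": "db.ibm-cloud.example.com",
    "aws":       "db.aws.example.com",
}

def connection_dsn(database="analytics", user="app"):
    """Build the same DSN regardless of which cloud this code is running in."""
    environment = os.environ.get("DEPLOY_ENV", "on_prem")
    host = DB_ENDPOINTS[environment]
    return f"postgresql://{user}@{host}:5432/{database}"

print(connection_dsn())  # identical code path in every environment
```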

Governance is crucial here as well. Historically, organizations have applied governance one database, or even one schema, at a time when the data was localized. Now that data is everywhere, your governance needs to span environments and function in near real time via a set of composable services that can act across multiple clouds and across both business-critical and legacy assets. Governance can be more constant and lighter weight when the same governance constructs and systems span environments.
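
A minimal sketch of what composable governance services can look like in practice: one policy, written once, enforced over assets registered from every environment. The DataAsset structure, the PII tags, and the mask action are hypothetical placeholders, not a specific product's API.

```python
# Illustrative sketch: a single governance policy applied across multiple
# environments through a common, composable interface.
from dataclasses import dataclass

@dataclass
class DataAsset:
    environment: str   # e.g. "on_prem", "aws", "ibm_cloud"
    name: str
    columns: dict      # column name -> discovered semantic tag

def mask_pii_policy(asset):
    """Return the governance actions this policy requires for one asset."""
    pii_tags = {"email", "phone", "ssn"}
    return [
        {"asset": asset.name, "environment": asset.environment,
         "column": col, "action": "mask"}
        for col, tag in asset.columns.items() if tag in pii_tags
    ]

def enforce(policy, assets):
    """Compose the same policy over every environment's assets."""
    return [action for asset in assets for action in policy(asset)]

assets = [
    DataAsset("on_prem", "warehouse.customers", {"email": "email", "region": "free_text"}),
    DataAsset("aws",     "lake.signups",        {"phone": "phone", "plan": "free_text"}),
]
for action in enforce(mask_pii_policy, assets):
    print(action)
```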

While the future is multicloud, for the time being not all data will be everywhere all the time. And moving data away from its center of gravity can be costly and can even stop a project in its tracks. To anticipate this issue, your data science platform should be able to train models in one data environment and then deploy and manage them in any other. As part of that deployment, be sure to include monitoring and retraining of the models.
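
For example, here is a minimal sketch of train-here, deploy-there: serialize the trained model in one environment and load it for scoring in another. The file path and pickle format are placeholders; in practice a model registry or a portable format such as PMML or ONNX would typically carry the model between clouds, with the monitoring hooks from the previous section added at deployment time.

```python
# Illustrative sketch: train in one environment, ship the serialized model,
# and score in another. The path and format are hypothetical placeholders.
import pickle
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# --- training environment ---
X, y = make_regression(n_samples=500, n_features=5, noise=0.1, random_state=0)
model = Ridge().fit(X, y)
with open("model_v1.pkl", "wb") as f:
    pickle.dump(model, f)

# --- serving environment (possibly a different cloud) ---
with open("model_v1.pkl", "rb") as f:
    deployed = pickle.load(f)
print(deployed.predict(X[:5]))
```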

Open Source at the Core

The value of open source is not its “free-ness”. The value is the innovation and flexibility it brings. Having open source at the core of your platforms enables portability and ensures that your teams and your enterprise have access to the most advanced technology available. Open source also allows enterprises to connect platforms without worrying that they’re infringing on another company’s intellectual property.

A good example of this interoperability between platforms revolves around metadata. Every data store has its own proprietary metadata system, and each of them wants to be the master. Historically, organizations have resolved the issue with expensive third-party connectors. That paradigm is changing with the maturation of Apache Atlas and its associated projects. Essentially, Apache Atlas lets you create a metadata virtualization layer that connects the various proprietary metadata stores. The ability to truly create an enterprise data catalog has finally arrived, and it is the direct result of a mature open-source project.
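
Here is a minimal sketch against the Apache Atlas v2 REST API: register one entity in that virtualization layer and then search for it. The host, credentials, typeName, and attributes are placeholders, and the exact typedefs and required attributes depend on your Atlas installation, so treat this as an outline rather than a drop-in script.

```python
# Illustrative sketch: register and search a metadata entity via the
# Apache Atlas v2 REST API. Host, credentials, and attributes are placeholders.
import requests

ATLAS = "http://atlas.example.com:21000/api/atlas/v2"   # hypothetical host
AUTH = ("admin", "admin")                               # placeholder credentials

# Create (or update) a table entity so it becomes discoverable in the catalog.
entity = {
    "entity": {
        "typeName": "hive_table",                        # illustrative type
        "attributes": {
            "qualifiedName": "sales_db.orders@production",
            "name": "orders",
            "description": "Orders table registered from the governance pipeline",
        },
    },
    "referredEntities": {},
}
resp = requests.post(f"{ATLAS}/entity", json=entity, auth=AUTH)
resp.raise_for_status()

# Basic search across everything the virtualization layer knows about.
search = requests.get(
    f"{ATLAS}/search/basic",
    params={"typeName": "hive_table", "query": "orders"},
    auth=AUTH,
)
for hit in search.json().get("entities", []):
    print(hit["typeName"], hit.get("attributes", {}).get("qualifiedName"))
```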

Demand What Truly Matters

Transformation requires automation, flexibility and portability. By demanding these things from vendors, enterprises can increase the chance that they themselves will be the disruptive force in their industry, and not the ones left behind.
