The Journey to Digital: Part 2, Data Transformation

Published in IBM Data Science in Practice, Mar 15, 2018 (6 min read)

Companies have a choice today. They can be the disrupted or the disruptor. I laid out the case for this in Raiders of Every Industry: The Journey to Digital and Journey To Digital: Part 1, Table Stakes.

In those introductory posts, I note that becoming digital is typically a three-phase journey: (1) Data Transformation, (2) Data Science Transformation, and (3) Digital Transformation. In this post, I focus on the first phase: Data Transformation.

Data Transformation

Data Transformation might not feel particularly transformative, but it’s the foundation for a successful digital transformation journey.

This stage is about defining the core assets that create value for the enterprise, and it’s about identifying, discovering and governing the right data without necessarily expecting — or forcing — upheaval. At this stage, data consumers might ask for data to support their preconceived notions. That’s actually fine. More often than not, aligning with existing expectations is a necessary step as you build consensus for data science to transform the organization.

Again, for now the goal is not upheaval. Companies should build their foundational information architecture before they worry about data access or data insights. Our frame for this is the AI Ladder.

Strategy is important here. Focus on understanding what data is available and how it can help existing stakeholders act more efficiently and with greater confidence, with an eye toward defining your company’s core data assets.

As I mentioned in Six Steps Up: Zero to Data Science, these core data assets typically revolve around three categories common to all organizations: Customer, Product, and Talent. You can lump everything else into what I call Company — and to be trendy we can tack a ‘360’ on the end of each category. The first step in building these assets is to map them conceptually and within your data infrastructure: Which specific data contributes to each of the categories? Where does that data live today?
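One way to make this mapping concrete is a small inventory that records, for each 360 asset, which systems contribute data and where that data lives today. The sketch below is illustrative only: the system names, locations, and the `DataSource` structure are hypothetical placeholders, not a recommended catalog schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    """One system that contributes data to a 360 asset (all values hypothetical)."""
    name: str       # short label for the system
    location: str   # where the data lives today
    asset: str      # Customer360, Product360, Talent360, or Company360


# Hypothetical inventory; replace with what discovery actually finds.
SOURCES = [
    DataSource("crm", "on-prem relational database", "Customer360"),
    DataSource("erp_orders", "on-prem ERP", "Customer360"),
    DataSource("sku_master", "cloud object store", "Product360"),
    DataSource("hr_system", "SaaS HR platform", "Talent360"),
    DataSource("general_ledger", "on-prem ERP", "Company360"),
]


def sources_by_asset(sources):
    """Group source names under the asset they feed."""
    grouped = defaultdict(list)
    for source in sources:
        grouped[source.asset].append(source.name)
    return dict(grouped)


asset_map = sources_by_asset(SOURCES)
for asset, names in sorted(asset_map.items()):
    print(f"{asset}: {', '.join(names)}")
```

Even a list this simple answers the two questions above for each category and makes gaps visible, for example an asset with no identified source.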

How you proceed from there depends on the current state of your data, where your data resides today, your cloud strategy, and the talent you have available. People tend to think the answer is to dump all data into some Hadoop solution, but don’t be hasty. It’s important to consider re-architecting and moving the data into the data store that’s the most performant for each use case. That might be a relational store, a key value store, a columnar store, a graph store, or an object store.

I prefer choosing the right battles over fighting every one. One battle not worth fighting is the one over semantics. Each line of business might have its own terminology, and trying to change it is typically a losing effort. Additionally, forcing a single vocabulary across the enterprise can impede the acceptance and adoption of your data assets. Instead, try defining the enterprise semantics and mapping the semantics of the individual lines of business to your enterprise definitions, essentially creating a Rosetta Stone for your enterprise. This approach has an added benefit: you can deliver your data assets back to end users in a language they understand. That can reduce friction and speed adoption.
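As a minimal sketch of the Rosetta Stone idea, the mapping can be a lookup from each enterprise term to the term each line of business uses, applied in both directions. The field names and the two lines of business (`sales`, `support`) are invented for illustration.

```python
# Enterprise term -> each line of business's local term (all names hypothetical).
ROSETTA_STONE = {
    "customer_id": {"sales": "account_no", "support": "client_ref"},
    "annual_revenue": {"sales": "yearly_bookings", "support": "contract_value"},
}


def to_enterprise(record, line_of_business):
    """Rename a line-of-business record's fields to enterprise terms."""
    local_to_enterprise = {
        local_terms[line_of_business]: enterprise_term
        for enterprise_term, local_terms in ROSETTA_STONE.items()
        if line_of_business in local_terms
    }
    return {local_to_enterprise.get(key, key): value for key, value in record.items()}


def to_line_of_business(record, line_of_business):
    """Rename an enterprise record's fields to a line of business's own terms."""
    return {
        ROSETTA_STONE.get(key, {}).get(line_of_business, key): value
        for key, value in record.items()
    }


sales_record = {"account_no": "A-100", "yearly_bookings": 250000}
enterprise_record = to_enterprise(sales_record, "sales")
support_view = to_line_of_business(enterprise_record, "support")
print(enterprise_record)
print(support_view)
```

Nothing here asks sales or support to change their vocabulary. The enterprise definitions sit in the middle, and each group sees its own terms on the way in and on the way out.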

Let’s explore the specific data assets:

Customer360 data probably lives in a combination of an ERP system, a CRM, and various business warehouses throughout your enterprise. It also lives on individual laptops and desktops throughout the organization. For many enterprises, the first step can be the hardest: getting a single view of your customer. It might seem hard to believe, but many companies don’t have this single view today. Many have 10 or more versions of every customer, each with different name spellings, different account numbers, etc. You’ll know your Customer360 is mature when you can determine how many customers you actually have, what they buy, how much they spend, how active they are, where they’re located, what type of business they are, etc.
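To illustrate what collapsing multiple versions of a customer can involve, here is a deliberately simple sketch that normalizes names and groups near-identical ones using Python’s standard `difflib`. The suffix list, the 0.85 threshold, and the sample records are assumptions chosen for the example. Production entity resolution typically also uses addresses, tax IDs, and human review.

```python
import re
from difflib import SequenceMatcher

# Assumed legal-suffix tokens to ignore when comparing names.
SUFFIXES = {"co", "corp", "corporation", "inc", "llc", "ltd"}


def normalize(name):
    """Lowercase, strip punctuation, and drop legal suffixes."""
    tokens = re.sub(r"[^a-z0-9]+", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def cluster_customers(records, threshold=0.85):
    """Greedily group records whose normalized names are near-identical."""
    clusters = []  # list of (representative_key, [records])
    for record in records:
        key = normalize(record["name"])
        for rep_key, members in clusters:
            if SequenceMatcher(None, rep_key, key).ratio() >= threshold:
                members.append(record)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append((key, [record]))
    return [members for _, members in clusters]


# Made-up records showing different spellings and account numbers.
records = [
    {"name": "Acme Corp.", "account": "1001"},
    {"name": "ACME Corporation", "account": "A-77"},
    {"name": "Acmee, Inc", "account": "9-001"},
    {"name": "Globex LLC", "account": "2002"},
]
clusters = cluster_customers(records)
print(f"{len(records)} records -> {len(clusters)} customers")
```

Counting the clusters is a first approximation of how many customers you actually have, which is one of the maturity checks described above.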

Product360 probably lives in similarly acronym-ed places as your Customer360 data. And similar to customer data, many companies don’t have a singular view of their products. In many industries, as you develop products and create SKUs, getting a single view of products becomes a challenge, including knowing how much revenue and profit you generate from each product. When Product360 is mature, you’ll be able to align each product easily across the enterprise, determine revenue and profitability for each product, and connect the entire lifecycle of each product from research to end-of-life.

Talent360 is increasingly a critical data asset in these times of high demand for talent. Many companies have a good understanding of who their employees are — but a limited view of their value to the organization, perhaps because of organizational constructs or because the various systems that track employees are disconnected. That fracturing inhibits the ability to assess, retain, train and recruit quickly. Understanding your talent and being able to combine the information in this asset with other assets allows you to understand who needs your attention. It lets you identify top performers so you can show them special appreciation and provide them appropriate development opportunities. You also want to be able to identify your low performers so you can intervene early to get them back on track. When Talent360 is mature you’ll be able to identify all aspects of your talent including retention risks, compensation compression, diversity issues, and focus areas for recruitment.

Company360 is a catch-all for any data that doesn’t fit into one of the other buckets. Typically, the most important category here is finance data. Many companies struggle every quarter to close their books on time. Mature Company360 means senior executives have near real-time access to accurate financial performance data, as well as a view of regulatory and legal exposures, and various other items depending on your industry.

Other, optional data assets will depend on your company or industry, for example geospatial data (Location360) or connected device data (IoT360). Try to keep the number of 360 assets to a minimum, no more than five or six. These assets become a rallying cry for your transformation, something tangible that the entire company needs to comprehend. Too many data assets are distracting.

Delivery of each asset takes two forms: a centralized set of RESTful APIs, and a data hub that leverages Kafka topics. Both should apply the enterprise Rosetta Stone to provide the data back with the appropriate semantics for each line of business. You can serve each line of business’s view as a separate API or as separate methods within a single API. While building out the assets, don’t discount the creative spreadsheets that aggregate highly transformed data from various sources. Instead, let them give you insight about what might be missing from your current data, and then focus on systematizing the spreadsheets as REST endpoints or Kafka topics.
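The following framework-free sketch shows the shape of that delivery: one function stands in for a REST `GET` handler, and another builds the message a data hub would publish, with both applying the same per-line-of-business semantics. It uses only the standard library and does not call any real web framework or Kafka client. The URL layout, the `lob` query parameter, the topic naming, and all field names are assumptions made for illustration.

```python
import json
from urllib.parse import parse_qs, urlparse

# Hypothetical enterprise records and per-line-of-business field names.
CUSTOMERS = {"C-1": {"customer_id": "C-1", "annual_revenue": 250000}}
SEMANTICS = {
    "sales": {"customer_id": "account_no", "annual_revenue": "yearly_bookings"},
    "support": {"customer_id": "client_ref", "annual_revenue": "contract_value"},
}


def localize(record, lob):
    """Rename enterprise fields to the terms a line of business uses."""
    names = SEMANTICS.get(lob, {})  # unknown lob falls back to enterprise terms
    return {names.get(field, field): value for field, value in record.items()}


def handle_get(url):
    """Stand-in for GET /customer360/<id>?lob=<line of business>."""
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    if len(parts) != 2 or parts[0] != "customer360" or parts[1] not in CUSTOMERS:
        return 404, json.dumps({"error": "not found"})
    lob = parse_qs(parsed.query).get("lob", ["enterprise"])[0]
    return 200, json.dumps(localize(CUSTOMERS[parts[1]], lob))


def to_hub_message(record, lob):
    """Build (topic, key, value) for a message hub; the topic naming is assumed."""
    topic = f"customer360.{lob}"
    key = record["customer_id"].encode("utf-8")
    value = json.dumps(localize(record, lob)).encode("utf-8")
    return topic, key, value


status, body = handle_get("/customer360/C-1?lob=sales")
topic, key, value = to_hub_message(CUSTOMERS["C-1"], "support")
print(status, body)
print(topic, key, value)
```

Here the line of business is a parameter on a single endpoint. Exposing one endpoint or one topic per line of business would be an equally valid reading of the text above.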

Building out each data asset can center on specific use cases. In this context, a use case means that specific decisions are driving specific and tangible value. In turn, the value of each decision derives from cost savings, cost avoidance, or net new value. Thinking in these terms has a few advantages. First, by aligning the work to a set of use cases you can shift the conversation from one about pure cost to one about return on investment tied to specific decisions. Second, you can focus on outcomes rather than the underlying technologies. Third, each use case can provide a path to understanding minimum thresholds for data governance, data quality, and completeness. And finally, the technology that instantiates the data should maximize the performance of each use case. How you define and support the use cases depends on many factors, including the current state of the data, but don’t assume you need to consolidate the data within a single environment. In fact, the better option is often to leave the data where it is and build from there.
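As a worked example of framing use cases as return on investment, the sketch below sums the three value sources named above and ranks use cases by ROI. The use case names and every figure are made up, and the formula (value minus investment, divided by investment) is one common definition assumed here, not one the post prescribes.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    """A decision-driven use case; all figures below are made-up examples."""
    name: str
    cost_savings: float      # current spend that goes away
    cost_avoidance: float    # future spend that never happens
    net_new_value: float     # new revenue or margin
    investment: float        # cost to build and run the supporting data work

    @property
    def total_value(self):
        return self.cost_savings + self.cost_avoidance + self.net_new_value

    @property
    def roi(self):
        """Return on investment as (value - investment) / investment."""
        return (self.total_value - self.investment) / self.investment


use_cases = [
    UseCase("invoice matching", 120_000, 0, 0, 80_000),
    UseCase("churn outreach", 0, 50_000, 200_000, 100_000),
]
ranked = sorted(use_cases, key=lambda u: u.roi, reverse=True)
for u in ranked:
    print(f"{u.name}: value={u.total_value:,.0f} ROI={u.roi:.0%}")
```

With these invented numbers, churn outreach returns 150% and invoice matching 50%, which turns the discussion from what the data work costs into which decisions are worth funding first.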

In parallel to all this work, you’ll set up your data science team or give your existing data science team more secure, self-service access to the company’s data as well as a wider mandate to gather and explore that data. We’ll start there in Part 3: Data Science Transformation.

Written by Seth E Dobrin, PhD