A Transformation Journey for Enabling the Data-driven Enterprise

Fahad Najeeb
Data @ Latitude
7 min read · Mar 17, 2020
image source: https://www.analyticsinsight.net/

Where does data transformation sit within today’s transformation programs? Why does data get left out during a Digital Transformation? What should you do if your organisation has missed previous waves of modernising its data ecosystem, governance and management? Read on.

This article focuses more on Data Strategy, its transformation and its challenges than on Digital Transformation alone.

Today’s huge data volumes and legacy infrastructure do not match. Timely and accurate business decisions depend on trusted data and a robust architecture that is built to scale.

Why have a good Data Strategy?

A Data Strategy articulates the value you will generate by harnessing your historical and present data.

  • It will uplift your data architecture and help you modernise your operational and analytical processes, refining old ones along the way.
  • If you pursue Digital Transformation alone, without a Data Strategy, technological challenges will hamper the value you can generate from data. New apps will bring in huge volumes of data while your traditional data warehouse struggles with the scale. Coping with unstructured and semi-structured data, and running analytics on it, will be a challenge, and you will keep piling up tech debt because you uplifted your upstream systems while your downstream capabilities stayed the same.
  • It will help define ROI-based use cases to drive value generation from data.
  • Traditional warehouses usually handle both human interaction with data (analytical use cases) and machine interaction with data (operational use cases). These two types do not belong in a warehouse together; only the former does.
  • It enables timely and better decisions by harnessing data at speed and volume.
  • It lets you modernise your data infrastructure (cloud) and save on opex.
  • It shifts you from vendor-dependent asset management to in-house capabilities, which is pivotal for a transformation and saves development time.

If you are an organisation with legacy systems, warehouses and operational processes that are aligned with the non-digital side of your organisation, and you have a workforce that excels at using those systems and processes, then there is a high chance that you have missed out on modernising your landscape of core systems and data applications, or perhaps you are on the journey to do so.

Digital companies, neobanks and startups, on the contrary, are a completely different story: they have no legacy to solve for, and they are nimble, agile and able to move fast and leverage modern engineering practices and architecture.

Two-pronged Strategy

Creating or transforming into a data-driven enterprise requires a two-pronged approach focusing on:

1. Culture Shift and Building a Data-first narrative

Get buy-in from an executive who will champion the cause. 80% of the journey is your culture and the narrative you build.

The existing governance mechanism and structure can potentially be leveraged; however, it will need reform, though perhaps not from the get-go.

Articulate your return on data with the new, transformed data assets. This will also require creating and adopting a reference architecture.

Another challenge I have seen is getting a data strategy (on paper) delivered by a consulting company and then failing to act on it. I will keep reiterating that building in-house expertise pays off, so get your champions in, i.e. functions such as the CDO, Heads of departments and Architecture. It’s OK if you can’t get everyone together at the same time, but start somewhere.

With your own champions, you can choose to execute your already delivered strategy or, even better, take this journey iteratively with close stakeholder interaction.

2. Engineering & Architecture

Establish a team with engineering skills (Data, Cloud, DevOps) if not already present.

Solve for the 4 Vs of data (volume, velocity, variety, veracity), and decide on your data orchestration framework, CI/CD, cloud and security controls.

Be cloud-first and real-time as opposed to batched, with everything automated (CI/CD) and MLA (Monitoring, Logging, Alerts) in place.

Think about data classification, data security, the who and why of data access, and patterns for data movement between data center and cloud.
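To make the classification and who/why questions concrete, here is a minimal Python sketch. The classification levels, dataset names and purposes are all hypothetical and only illustrate one way such a check could be expressed in code; a real platform would typically delegate this to its access management tooling.

```python
from dataclasses import dataclass
from enum import IntEnum


class Classification(IntEnum):
    """Hypothetical sensitivity levels, ordered from least to most sensitive."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4  # e.g. PII or PCI data


@dataclass(frozen=True)
class AccessRequest:
    user: str
    clearance: Classification  # highest level the user may read (the "who")
    purpose: str               # stated reason for access (the "why")


# Hypothetical catalogue: dataset name -> (classification, allowed purposes).
DATASETS = {
    "web_clickstream": (Classification.INTERNAL, {"analytics", "marketing"}),
    "customer_cards": (Classification.RESTRICTED, {"fraud_detection"}),
}


def is_allowed(request: AccessRequest, dataset: str) -> bool:
    """Grant access only if both clearance and purpose match the dataset."""
    level, purposes = DATASETS[dataset]
    return request.clearance >= level and request.purpose in purposes


analyst = AccessRequest("ana", Classification.INTERNAL, "analytics")
fraud_team = AccessRequest("fred", Classification.RESTRICTED, "fraud_detection")
```

With these assumptions, `is_allowed(analyst, "web_clickstream")` passes while `is_allowed(analyst, "customer_cards")` is refused, and a high clearance alone is not enough: `fraud_team` is refused on `web_clickstream` because the stated purpose does not match.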

If you are in the data center and the cloud at the same time, choosing patterns for data movement between the two landscapes is interesting and comes with its own challenges, some of which are listed below:

  • Data center operations are usually outsourced, meaning new builds are time consuming.
  • As a result, any opportunity a new use case offers to build data infrastructure in the cloud could be jeopardised.
  • Try to establish a generic pattern that can be leveraged for both ingress and egress data movement.
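One way to picture a generic movement pattern is a single transfer routine that does not care about direction. The sketch below is an illustration only: `Source`, `Sink`, `InMemoryStore` and `move` are hypothetical names, and the in-memory store stands in for whatever endpoints you actually have (a data center table, a cloud bucket and so on).

```python
from typing import Iterable, Iterator, Protocol


class Source(Protocol):
    def read(self) -> Iterator[dict]: ...


class Sink(Protocol):
    def write(self, records: Iterable[dict]) -> int: ...


class InMemoryStore:
    """Stand-in endpoint that can act as both a source and a sink."""

    def __init__(self, records=None):
        self.records = list(records or [])

    def read(self) -> Iterator[dict]:
        return iter(self.records)

    def write(self, records: Iterable[dict]) -> int:
        before = len(self.records)
        self.records.extend(records)
        return len(self.records) - before  # number of records written


def move(source: Source, sink: Sink) -> int:
    """Direction-agnostic transfer: ingress and egress differ only in wiring."""
    return sink.write(source.read())


data_center = InMemoryStore([{"id": 1}, {"id": 2}])
cloud = InMemoryStore()
downstream = InMemoryStore()

ingress_count = move(data_center, cloud)  # data center -> cloud
egress_count = move(cloud, downstream)    # cloud -> another target
```

The design choice here is that new endpoints only need to implement `read` or `write`; the movement logic, and anything you later attach to it such as logging or alerting, is written once.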

Execution of Data Strategy

Business Aspect of Execution

You can take one of two approaches: a use-case-based approach or a big-bang approach.

Both approaches have challenges as well as benefits.

Factors to consider:

1. How many legacy systems need to be migrated? Size the problem. It can take years to bring everything from the source systems of record into the new data landscape and make it available for consumption.

2. How much of your operational systems* is built in ETLs and/or in your warehouse? It is usually bad practice to build business logic for operational use cases in ETL.

3. Do you already have one foot in the cloud and the other in the data center? This presents its own challenges in marrying data from both worlds for business users, and will continue to be a pain until the transformation finishes.

4. Were there any failures from a previous big-bang approach? What were the findings?

5. Are you doing it yourself or through a vendor? (It is best done in iterations.)

*Operational system = a system that automates a manual job. It usually has an automated system at each end and a communication pipe or channel in the middle, e.g. sending a system extract to a third party, or calculating a credit score and sending it to another system.

Use Case Based Approach

Pros

  • You can leverage a use case that is of utmost importance to your organisation; it could come from a well-funded program or a new product launch. Get on board and articulate the ROI of data. Many functions will be after that data, e.g. Marketing for their campaigns, Risk, Finance, Regulatory and Analytics (the list might differ for your industry).
  • Faster go-to-market.
  • You learn while delivering your first use case and iterate on that learning for subsequent use cases.
  • You get to solidify your new data platform capability.
  • You have room to experiment, pivot and course-correct as required.
  • You can put interim patterns in place to quickly refer to the historical legacy data sitting in your old warehouse. The key is balance, i.e. does your pattern let old warehouse data flow to the new data platform, or vice versa? Each direction has its own challenges.*

*Challenges = if you make new landscape data available in the old warehouse, there will be less motivation to move out of the old warehouse; however, it could be the quickest option to offer the business. If you make your old warehouse data available in the new landscape through a quick one-off load followed by ongoing replication, it could alleviate your pain points around data availability in the new world.
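As a rough illustration of the second option (a one-off load followed by ongoing replication), the sketch below copies rows from an old warehouse to a new platform using a watermark on a change timestamp. It makes several assumptions: both sides are simulated with in-memory SQLite databases, and the `sales` table, its `updated_at` column and the `replicate` function are hypothetical.

```python
import sqlite3


def replicate(old: sqlite3.Connection, new: sqlite3.Connection, watermark: int) -> int:
    """Copy rows changed after `watermark`; return the new watermark."""
    rows = old.execute(
        "SELECT id, amount, updated_at FROM sales "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # Upsert so that changed rows overwrite their earlier copies.
    new.executemany(
        "INSERT OR REPLACE INTO sales (id, amount, updated_at) VALUES (?, ?, ?)",
        rows,
    )
    new.commit()
    return max((row[2] for row in rows), default=watermark)


ddl = "CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL, updated_at INTEGER)"
old_wh = sqlite3.connect(":memory:")  # stand-in for the old warehouse
new_dp = sqlite3.connect(":memory:")  # stand-in for the new data platform
old_wh.execute(ddl)
new_dp.execute(ddl)

old_wh.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)", [(1, 10.0, 100), (2, 20.0, 101)]
)
old_wh.commit()
watermark = replicate(old_wh, new_dp, watermark=0)  # one-off initial load

old_wh.execute("UPDATE sales SET amount = 25.0, updated_at = 102 WHERE id = 2")
old_wh.execute("INSERT INTO sales VALUES (3, 30.0, 103)")
old_wh.commit()
watermark = replicate(old_wh, new_dp, watermark)    # ongoing replication

replicated = new_dp.execute("SELECT * FROM sales ORDER BY id").fetchall()
```

This simple form does not propagate deletes and assumes every change bumps `updated_at`, so treat it as a starting point rather than a complete replication design.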

Cons

  • You will not have your legacy and historical data available on day 1 in your new data platform. Depending on the size of your historical data and organisation, bringing it across will be a huge task, even worthy of a program of its own.
  • Users will need access to the historical data, so productionising a new use case alone will in most cases not be of significant value. The business needs to marry historical/legacy data* with new world data*.

*historical/legacy data = data that lives in a warehouse in your on-prem data center and is still in use by your current source systems of record; new world data = data in your new data landscape, which could be your data lake or cloud warehouse.

Big Bang Approach

Pros

  • At the end of your program you will have all your old warehouse data in the new data platform.

Cons

  • A lot of time is spent upfront with no value to show for it.
  • You will ingest a huge amount of data; however, the subsequent layers of transforming, curating and serving will take time and cause delays.
  • If your old warehouse is big and complex, unpicking it will take years.
  • Ingestion, curation and modelling have to be done one after the other and matured before a single source can be brought into the new data platform.

Technical Aspect of Execution

Follow the DAMA principles (listed below for reference) and map them back to your execution pieces. Going into each of them in detail is out of scope for this article; I might cover them in separate articles. I don’t believe all 10 need to be perfect upfront.

In my view, an organisation’s priorities will dictate which of the following areas get more attention than others at the start.

1. Data Governance: Try leveraging existing stewards, data owners, data analysts and the governance framework. Once users are on your new data landscape, you will need to focus on this more and solidify governance. Ideally, move towards code-based governance.

2. Data Architecture Management: Create a reference architecture and start moving towards it in small steps and iterations.

3. Data Development: Focus on your data engineering, i.e. how it moves data and whether the team can scale. This should be a strong function.

4. Database Operations Management: Once you deliver the first use case, this area will start maturing along with the operations and support of the new landscape.

5. Data Security Management: Keep this alive from the start, including how you want to handle PII and PCI information, how you want to classify data and how you control access to it. Again, automate the provisioning of users, with an audit trail.

6. Reference & Master Data Management: What is your golden customer record? Ideally, sort out your data murkiness and redundancy issues in a separate solution and ingest clean data into the new data lake.

7. Data Warehousing & BI Management: Build a data lake, a warehouse on top of it if needed, and your BI layer on top of either of the two.

8. Document & Content Management: (as needed)

9. Meta-data Management: Invest time, effort and money in a good data catalogue. Each organisation will have different needs around it, so go for the offering that best suits you.

10. Data Quality Management: Build code-based quality metrics; they are important for establishing trust in your data.
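As an example of what code-based quality metrics could look like, here is a small Python sketch. The two metrics (completeness and uniqueness), the threshold values and the sample `customers` records are assumptions chosen for illustration; your own metrics would follow from what your users need to trust.

```python
def completeness(rows, column):
    """Share of rows where `column` is present and not None."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)


def uniqueness(rows, column):
    """Share of non-null values in `column` that are distinct."""
    values = [row[column] for row in rows if row.get(column) is not None]
    if not values:
        return 0.0
    return len(set(values)) / len(values)


def failed_checks(rows, thresholds):
    """Return the (metric, column) pairs whose score is below its threshold."""
    metrics = {"completeness": completeness, "uniqueness": uniqueness}
    return [
        (name, column)
        for (name, column), minimum in thresholds.items()
        if metrics[name](rows, column) < minimum
    ]


# Hypothetical sample: one missing email and one duplicated customer_id.
customers = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": None},
    {"customer_id": 2, "email": "b@example.com"},
    {"customer_id": 4, "email": "c@example.com"},
]
thresholds = {
    ("completeness", "email"): 0.9,
    ("uniqueness", "customer_id"): 1.0,
}
failures = failed_checks(customers, thresholds)
```

Because the thresholds live in code, they can be versioned, reviewed and run in CI/CD alongside the pipelines, which is what makes the resulting scores repeatable enough to build trust on.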

Ending Remarks

  • Transforming just the upstream systems is only half the puzzle; do an end-to-end transformation.
  • Derive value from data by working with stakeholders and the business.
  • Let engineering and architecture support Data Strategy execution in iterations.
  • There are two approaches to Data Strategy execution: use-case-based and big-bang.
