Future-proofing your data
Imagine this! You’ve built a single data platform that powers your entire business. All workloads from your legacy data platform have been migrated, and the old platform has been turned off. You breathe a sigh of relief knowing that 100% of the team’s focus is dedicated to the new world, the new platform. It is really fast to execute use cases today across analytics and AI, but how do you ensure the data platform remains fast for the next 12 months, two years, three years and beyond? How do you ensure all your efforts are not wasted and that complacency doesn’t set in?
The data strategy I set two years ago at Zip was to build fast and future-proof data. The journey to data enlightenment wasn’t straightforward. Zip’s original data platform, built during the startup phase, was slow, cumbersome, and riddled with duplication. It was exhausting and painful to work with.
The transformation required building a modern data stack from the ground up, migrating hundreds of pipelines and tables, and ultimately switching off the old platform, something many companies struggle to achieve. The results were nothing short of miraculous, with data operations becoming fast, easy, and efficient. The team continues to blow me away with how quickly they can move on analytics use cases. Welcome to data nirvana!
Looking towards the future, Zip is not resting on its laurels. The focus has shifted towards future-proofing our data ecosystem to ensure speed, efficiency, and governance remain in harmony. We spent a lot of time discussing as a team:
- How do we keep things as fast as they are today for years to come?
- If we were to have a completely new data and analytics team every three months, could they be operational within a week?
A number of key themes emerged that we felt were critical to future-proofing our data. These were:
- Users need the ability to find, understand and trust data;
- Return on investment from data needs to be known; and
- Data changes must be delivered in a way where speed and data governance are in harmony.
There are many different ways to define data products, from a collection of data sets to fully defined and mature products. At Zip we define a data product as…
A product whose primary goal is to drive business value using data
We define them in this way because it anchors individual products directly to value and gives us a framework for delivering that value successfully. We think about a data product as a collection of data, plus the services we use to organise it, combined into something that creates value for internal stakeholders or directly for our customers and merchants.
For example, we personalise the shop feed in the Zip app so each customer has a unique set of merchant recommendations. The data, the services we use to transform and store it, and the AI services that run the recommendation engine all combine to form the data product we connect into the shop feed.
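To make the idea concrete, here is a minimal sketch of how a data product like this could be described as metadata. It is illustrative only: the `DataProduct` fields and the shop feed example values are hypothetical stand-ins, not Zip’s actual implementation.

```python
from dataclasses import dataclass

@dataclass
class DataProduct:
    """Illustrative metadata for a data product: the data, the services
    that organise it, and the value it is meant to drive."""
    name: str
    owner: str                 # accountable team
    consumers: list[str]       # who the product creates value for
    input_datasets: list[str]  # upstream tables and feeds
    services: list[str]        # transformation, storage, and AI services
    value_statement: str       # the business value the product drives

# Hypothetical example based on the shop feed personalisation above.
shop_feed = DataProduct(
    name="shop_feed_recommendations",
    owner="personalisation-team",
    consumers=["Zip app customers"],
    input_datasets=["customer_transactions", "merchant_catalogue"],
    services=["feature_pipeline", "recommendation_engine"],
    value_statement="A unique set of merchant recommendations per customer",
)
```

Anchoring each product to a value statement is what connects the definition back to the “drive business value” goal.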
Introducing the data product that rules all other data products
Coming back to the themes we felt were critical for measuring progress on future-proofing our data: we needed a way to determine whether any change made to the ecosystem of data and data products, by any one of many data and analytics team members, improved or degraded the health of that ecosystem, based on a set of key information guided by the themes above.
I have to give credit to Moss Pauly, our Senior Manager — Data Products, who came up with an elegant and simple solution to:
- Capture information from every change to the ecosystem
- Store the output and enhance with data product information through automation and a UI for manual entries
- Use the data product that rules over all other data products to drive behaviours across the data and analytics community, uplifting the health of the ecosystem and future-proofing our data (a rough sketch of the idea follows below).
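The post is deliberately light on implementation detail, so here is only a conceptual sketch: every change to the ecosystem emits a record of health-relevant facts aligned to the three themes, and the ruling data product aggregates those records into a health signal. Every name and field below is a hypothetical stand-in, not Zip’s schema.

```python
from dataclasses import dataclass

@dataclass
class ChangeRecord:
    """One record captured per change to the data ecosystem (hypothetical)."""
    change_id: str
    data_product: str
    author: str
    has_documentation: bool  # can users find and understand the data?
    has_quality_tests: bool  # can users trust the data?
    roi_recorded: bool       # is the return on investment known?

def health_score(records: list[ChangeRecord]) -> float:
    """Share of health criteria met across all captured changes, 0.0 to 1.0.
    A real model would weight the criteria; this simply averages them."""
    if not records:
        return 1.0
    checks = [
        check
        for r in records
        for check in (r.has_documentation, r.has_quality_tests, r.roi_recorded)
    ]
    return sum(checks) / len(checks)
```

Whether the score rises or falls after a change is the behavioural lever: it makes the health impact of every change visible to the whole data and analytics community.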
Credit also goes to Leroy Kahn, our Data Management Lead, who brought a wealth of knowledge to the process on the things that were important to measure.
As Zip continues to innovate, our journey from data chaos to nirvana is just the beginning. It is critical that we remain as fast as we are today for years to come. We plan to leverage generative AI to speed up documentation of all data changes, to ensure we have data quality tests across all critical pipelines, and to support authors and approvers of code with an AI oracle that lifts the bar on coding standards and practices, so that we continue to future-proof data at Zip.
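As a hint of what a data quality test on a critical pipeline can look like, here is a minimal, framework-agnostic sketch; the table and column names are invented for illustration, and a real deployment would likely use a dedicated testing tool rather than hand-rolled checks.

```python
import pandas as pd

def run_quality_checks(df: pd.DataFrame) -> list[str]:
    """Basic checks on a hypothetical recommendations table; returns failures."""
    failures = []
    if df.empty:
        failures.append("table is empty")
    if df["customer_id"].isna().any():
        failures.append("customer_id contains nulls")
    if df["customer_id"].duplicated().any():
        failures.append("customer_id is not unique")
    if not df["score"].between(0, 1).all():
        failures.append("score outside [0, 1]")
    return failures

# Usage: block the pipeline run on any failure.
sample = pd.DataFrame({"customer_id": ["c1", "c2"], "score": [0.8, 0.3]})
failures = run_quality_checks(sample)
assert not failures, failures
```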
To close, I want to share several key learnings that may seem straightforward, but proved more significant than expected now that they are in the rear-view mirror.
- You can’t move forward if you are weighed down by the platform of the past. Migrate and decommission the legacy. Get it done!
- Optimisation has equal standing with building new things. As you build new things, take the time to optimise; the longer-term benefits are there. We amaze ourselves with what we can do, but it is only possible through the optimisation work we delivered along the way.
- Set the culture and standards early on. Overcommunicate and spend the time to take team members on the journey. Course-correct behaviours often, with a clear explanation of why future-proofing data is just as important as building things quickly, if not more so.