Making sense of all this data

Leroy Kahn · Published in Zip Technology · 10 min read · Aug 8, 2023

Welcome to the fourth blog post from the Data & Analytics team at Zip.

The second post detailed the journey we took to modernise our data stack, which democratised data development, increased our speed of delivery and ultimately enabled us to scale. From the observations and surveys we conducted, we determined that the next biggest obstacle to unlocking value from our data stack, for our team and for data consumers, was that finding, sharing, understanding and confidently using data was difficult and slow. To address this, as part of our data strategy to deliver fast and future-proof data, we set a goal of improving the efficiency of data consumption by 20%. Achieving that goal depended on implementing metadata management and data observability/quality/reliability capabilities across people, process and technology.

In this post, Leroy Kahn, Zip’s Data Management Lead, shares his perspectives on building information management capabilities from the ground up.

Tal Bergman — Director, Data and Analytics

After 20 years in various data & analytics roles at large, established corporations in the utility, financial services and FMCG industries, I was excited to join a successful scale-up fintech with an innovative, dynamic, fast-paced and purpose-driven work culture, and so far I have not been disappointed. Just over a year since I started at Zip as the Data Management Lead, it feels like a great time to step back, reflect and share some key learnings from our journey in levelling up Data Management.

Within my first two weeks I was set the task of assessing the value we could derive from our incumbent data observability tool, so that we could decide whether to renew its contract before the end of our financial year. Within the first six weeks we had conducted market research, shortlisted vendors and decided on a proof of value for Atlan as our metadata management (business glossary and data catalogue) tool for maintaining business and technical data assets.

Since then, the pace has not slowed. We have managed to implement and drive adoption of tooling as well as progress foundational aspects of data management, including uplifting data literacy across the organisation.

One of the key mindset shifts as part of this change was understanding the need to treat data not only as an asset from which we should look to maximise our return on investment, but as a strategic one, alongside our people, brand, product, finances and partnerships, in achieving a competitive advantage.

Managing data as a strategic asset

In order to effectively manage data like other strategic assets, we needed to know the following (made concrete in the sketch after this list):

  1. Why do we need this data from a business perspective, including consideration of legal obligations and commercial requirements?
  2. Who owns and controls the data and data assets and is therefore accountable?
  3. How critical and sensitive is the data, so we can prioritise the efforts of our finite resources and invest enough in its quality and security to stay within our risk appetite levels?
  4. How well are the technical data assets being developed and maintained to meet business needs and requirements?
  5. Where is it? Where did it come from? Where is it going? How is it changing?
  6. How much value or benefit are we deriving from our data assets, and how much do they cost to maintain?
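To make these six questions concrete, the sketch below shows the kind of record a catalogue entry needs to carry to answer them. This is a minimal illustration in Python; all field names and the classification scheme are hypothetical, not Zip’s actual model or Atlan’s schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Sensitivity(Enum):
    """Hypothetical classification scale; real schemes vary by organisation."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


@dataclass
class DataAssetRecord:
    """Illustrative catalogue entry covering the six questions above."""
    name: str
    business_purpose: str       # 1. why we hold this data (legal/commercial needs)
    business_owner: str         # 2. who is accountable for the data
    technical_owner: str        # 2. who maintains the technical data asset
    criticality: int            # 3. priority for finite resources (e.g. 1 = highest)
    sensitivity: Sensitivity    # 3. drives security controls and investment
    meets_business_needs: bool  # 4. is the asset maintained to requirements
    upstream_sources: list[str] = field(default_factory=list)      # 5. where it came from
    downstream_consumers: list[str] = field(default_factory=list)  # 5. where it is going
    annual_value_aud: float = 0.0  # 6. value or benefit derived from it
    annual_cost_aud: float = 0.0   # 6. cost to maintain it
```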

Lessons learned from launching metadata management capability

It’s no surprise that you’ll need some technical capabilities to manage all of this data, along with the information (contextualised data) about its business and technical aspects. For our metadata management tool, we conducted a short and sharp market review, combining desk research and discussions with peers, against key criteria.

As set out by Moss Pauly, we had developed an approach and assessment criteria for our data & analytics reference architecture. These were used, along with others detailed below, to establish our key criteria for this assessment:

  1. Cost at scale — after the glow of first-period discounts fades and your platform and data landscape continue to grow, be sure to get a clear, formal idea of what pricing will look like, along with the costs of any further features, optional “add-ons”, increased support, hosting, etc.
  2. User Experience — when first-time users say a tool is “easy and fun to use”, you know your ongoing change management journey of adoption and engagement is going to be a lot easier.
  3. Collaboration — it still amazes me how some vendors in this space do not consider this a “must have” feature. Without collaboration between data owners, data consumers, data custodians and everyone in between you will simply not achieve the required levels of engagement and subsequent ROI for data assets or data products. (Slack is integral to Zip’s culture, communication and collaboration so tight integration with this was a “Must Have” / deal breaker feature.)
  4. Partner vs Vendor — we knew how integral this software was going to be to our data management, so we were really only interested in organisations that were invested in our success, even if it meant making some sacrifices on their part in the short term.
  5. Strong references from peer organisations that are culturally aligned — ideally gathered both formally and informally, to confirm that you are trying to solve the same problems with similar approaches, and to learn the key lessons from their journey and experience with the product.
  6. Ability to influence (having a say) — as a small-to-mid sized enterprise (SME), we wanted to know that our voice would count amongst similar organisations and not be drowned out by, or ignored in favour of, large corporates or more traditional organisations with different business problems, approaches and needs to ours.

As you can see from these, although a product’s technical capability is a critical success factor, the qualitative and people aspects should be given just as much weight, if not more, in software decisions, as it is people and culture that will enable you to overcome any hurdles, technical or otherwise.

We did consider open source options, but in general we estimated that their total cost of ownership would actually be higher, and the product roadmap for some was not clear at the time, which we considered a high risk.

We decided on Atlan as, even though it is a relative newcomer, it had already received very favourable reviews from the likes of Forrester and the modern data stack community. We were particularly keen on Atlan’s active metadata approach, which was touted as the glue between existing tools: rather than disturbing existing user workflows, it integrates into them, including tight integration with Slack.

Proof of Concept (POC)

We conducted a two-week technical trial of Atlan to ensure that there were no surprises when integrating it with our Modern Data Stack along with Slack and JIRA.

We were up and running within a day and were able to scan dbt, Snowflake and Tableau as well as integrate with Slack and JIRA. We were particularly impressed with its clean, modern UI and the UX within Snowflake, Tableau and Slack as well as being able to search for terms from any web page.

Proof of Value (POV)

After a successful POC, we decided on a proof of value to show the benefit, and in fact return on investment, of having this tool for a few use cases, namely:

  1. Business users finding out whether reports or metrics already existed before raising requests for new ones.
  2. Enabling adequate data security controls through the capture of sensitivity classifications at the detailed data element level.
  3. Enabling fast and reliable impact and root cause analysis based on data lineage for planned changes and unplanned incidents, saving time and effort as well as reducing operational risk (sketched in code below).
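To illustrate that third use case: once lineage is captured as a set of edges between assets, impact analysis is essentially a downstream graph traversal, and root cause analysis is the same walk in reverse. A minimal sketch in Python, with entirely made-up asset names:

```python
from collections import defaultdict

# Hypothetical lineage edges: (upstream asset, downstream asset).
LINEAGE = [
    ("raw.payments", "staging.stg_payments"),
    ("staging.stg_payments", "marts.fct_transactions"),
    ("marts.fct_transactions", "tableau.revenue_dashboard"),
    ("marts.fct_transactions", "marts.customer_ltv"),
]


def downstream_impact(asset: str) -> set[str]:
    """Return every asset reachable downstream of `asset` (impact analysis).

    Reversing the edges before the walk gives root cause analysis instead.
    """
    graph = defaultdict(list)
    for upstream, downstream in LINEAGE:
        graph[upstream].append(downstream)

    impacted, stack = set(), [asset]
    while stack:
        for child in graph[stack.pop()]:
            if child not in impacted:
                impacted.add(child)
                stack.append(child)
    return impacted


# Planning a change to a staging model: which assets (and owners) to notify?
print(downstream_impact("staging.stg_payments"))
# e.g. {'marts.fct_transactions', 'tableau.revenue_dashboard', 'marts.customer_ltv'}
```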

As part of the business case, we estimated that we would be able to reach an adoption level of 100 Monthly Active Users but more importantly, show (in $ terms) annual efficiency gains which would more than cover the costs of the tool.

Our key learning here was that even though you may have some great ideas for use cases, there is nothing like road-testing the solution. With a few early adopters, you can truly understand the business problems and pain points in day-to-day activities, and where the real or perceived business value can be found.

Gamification

As part of our adoption strategy we:

  1. Had an initial launch in one of our all-ins starting with a fun video on how not being on the same page regarding business terms, but assuming you are, can have dire consequences. Shout out to our internal communications team for an amazing idea and execution.
  2. Followed this up directly with an Easter egg hunt to find 5 Zip slang words or “Zipisms”.
  3. Reinforced this with weekly prizes over a 4-week period for top users (search and find) and top contributors (adding terms, owners, links, Slack threads) both within our Data & Analytics teams and outside in the rest of business and engineering & technology teams.
  4. Had daily reviews of Slack messages to identify queries that could have been directed to Atlan and responded with a fun meme to remind users of this.

Outcome

We reached 100 Monthly Active Users in the second week after launch.

But this is only half of the story, as it’s really the types of user activity that provide an indication of the business value generated by using the tool.

We allocated a nominal dollar value to each activity type available in Atlan reporting (e.g. searching for a business definition via Slack, viewing lineage for sources of truth, root cause analysis and impact assessment). Each of these activities saves time for data users and, in some cases, also saves members of the data team and other data producers from having to respond to ad-hoc queries and requests for information (metadata).
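As an illustration of the mechanics, the sketch below turns activity counts into an annualised dollar figure. The activity names, time savings, counts and hourly rate are all made-up numbers for illustration, not our actual values or Atlan’s reporting schema:

```python
# Assumed blended hourly rate and minutes saved per activity (all hypothetical).
HOURLY_RATE_AUD = 100
MINUTES_SAVED = {
    "glossary_search_via_slack": 10,     # vs. asking the data team and waiting
    "lineage_view_source_of_truth": 20,
    "lineage_root_cause_analysis": 60,
    "lineage_impact_assessment": 45,
}

# Monthly activity counts as they might appear in usage reporting.
monthly_activity = {
    "glossary_search_via_slack": 400,
    "lineage_view_source_of_truth": 120,
    "lineage_root_cause_analysis": 15,
    "lineage_impact_assessment": 25,
}

monthly_value = sum(
    count * MINUTES_SAVED[activity] / 60 * HOURLY_RATE_AUD
    for activity, count in monthly_activity.items()
)
print(f"Estimated efficiency gain: ${monthly_value:,.0f}/month, "
      f"${12 * monthly_value:,.0f}/year")
```

On assumptions like these, a comparison against the tool’s annual cost falls straight out, which is the shape of the business case we put forward.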

Over the 30-day evaluation period we had solid usage and high levels of engagement, with positive feedback, which led to the endorsement of a one-year partnership with Atlan.

We have since seen adoption and user engagement drop off to a certain extent. This is typical of any change, especially when implementing new tools. We are aiming to address this by:

  1. Identifying and nominating champions to help embed the tool across our data community.
  2. Driving continued uplift of data product documentation through incentives and clear targets (KPIs) for data owners.
    (We are especially looking forward to how Atlan’s new AI feature will relieve some of the documentation burden for data business owners and data asset technical owners as well as improve the self-service discovery experience for data consumers.)
  3. Identifying further use cases to unlock business value.
  4. Ongoing reinforcement of how applying data management practices and using Atlan is beneficial to all Zipsters.
  5. Providing mechanisms that guide data consumers on how much confidence they should place in the data they use, and data producers on where to focus their efforts to improve data to meet business needs.

Ensuring data consumers know when data is reliable to consume

After a few months of using the incumbent data observability tool, we felt that parts of the user experience, and the lack of ability to fine-tune test configurations, would impede a successful rollout across Zip.

From market research, formal and informal discussions with peers and other Modern Data Stack vendors, we landed on a shortlist of Monte Carlo and Soda as potential alternatives, with Soda’s Core offering as our open source option.

Proof of Concept

We decided on a technical trial of both Monte Carlo and Soda to gain first-hand experience of their offerings. Both had quick setups, strong feature sets and good user experiences, but despite the generous discounts on offer, and assurances that these would remain available to us in the long term, we were still concerned about costs at scale and the cost of additional features in the future.

In the end, we conducted a POC with Elementary’s open source dbt package to gain first-hand experience of its setup, feature set and potential costs. We have since decided to progress with Elementary, rolling it out to a broader group of users across the team.
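For anyone wanting to try the same thing, Elementary installs as a dbt package with a companion CLI. A minimal sketch of the standard setup follows; the version below is a placeholder, so pin to the current release on hub.getdbt.com:

```yaml
# packages.yml — add the Elementary dbt package
packages:
  - package: elementary-data/elementary
    version: 0.x.x  # placeholder — use the latest published version

# dbt_project.yml — keep Elementary's models in their own schema
models:
  elementary:
    +schema: "elementary"
```

After `dbt deps` and `dbt run --select elementary` have built Elementary’s models, installing the CLI (e.g. `pip install 'elementary-data[snowflake]'` for Snowflake) lets you generate the observability report with `edr report`.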

Key takeaways

  1. Don’t feel pressured into signing contracts — there will always be financial and time pressures, internally from within your organisation to manage budgets and deadlines, and externally from vendors who can “only secure this amazing discount if signed by a certain date”.
  2. Don’t be afraid to fail fast, often — this is our third attempt at finding and deciding on a technical capability, which may seem like a waste of time, but we have all had first-hand experience of how a pressured or imposed decision can lead to long-term pain. If anything, our trials of Monte Carlo and Soda vindicated our decision not to stick with the incumbent tool, and strengthened the value case for what this capability will bring to the organisation.

What’s next?

As part of our data strategy for FY24 we are looking at:

  • Developing further lightweight Information Management Framework components (policies, procedures, standards, processes, controls) just in time to support Zip’s strategic priorities, all of which depend on data.
  • Clarifying data business ownership and data asset technical ownership to ensure owners are clear on their accountabilities which include driving documentation, understanding and appropriate use of data.
  • Uplifting our conceptual / enterprise data modelling capabilities and making them part of the design and implementation of data products, so that there is a common business and technical understanding.
  • A coordinated, cross-functional effort focusing on the continued protection of our customers’ personal information (privacy) and efficient management of related data products.
  • Tighter integration between Atlan, Elementary and our development tools like dbt to enable “shifting left” on data security, quality and life cycle management (sketched below).
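To give a flavour of what “shifting left” can look like in dbt, the sketch below shows a hypothetical schema.yml in which quality tests, an Elementary anomaly monitor and a sensitivity classification live alongside the model definition, so they are version-controlled and peer-reviewed before anything ships. The model, column and meta fields are illustrative, not our actual configuration:

```yaml
# models/marts/schema.yml — hypothetical model where quality and security
# metadata travel with the code instead of being bolted on afterwards
models:
  - name: fct_transactions
    description: "One row per settled transaction"
    meta:
      sensitivity: confidential   # classification a catalogue tool can ingest
      owner: "@payments-data-team"
    tests:
      - elementary.volume_anomalies   # Elementary monitor on row volume
    columns:
      - name: transaction_id
        tests:
          - not_null
          - unique
      - name: amount_aud
        tests:
          - not_null
```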

I am genuinely excited about the challenges and opportunities that we’ll be facing over the next year as we continue to build out future-proof data. I am particularly looking forward to supporting key business outcomes and driving value for internal and external stakeholders with the maturing information management capabilities that we are putting in place.


Leroy Kahn
Zip Technology

Data and Information Specialist with over 20 years’ experience across multiple industries and organisation types