Rise of the Medallion Mesh

Franco Patano
7 min read · Jun 19, 2024


The data forward path to the future of data and AI

The opinions expressed in this blog are solely mine, crafted with a blend of wit and wisdom that is uniquely my own. Any complaints, disagreements, or raised eyebrows should be directed at me. My employer, spouse, and children bear no responsibility for my musings, ramblings, or rare bouts of brilliance. They have their own lives to live, and dramas to eat popcorn to.

Why?

The complexity of data and AI management systems has grown over the years. The vast ecosystem of tools that has proliferated, each with its own vernacular, capabilities, and integrations, has produced some interesting architectural patterns. The combination of pre-existing conditions, psychological tendencies, and biases has led organizations down weird and strange paths. This is compounded by the fact that most organizations' IT spend goes unchecked and cloud costs run rampant, which often leads to inefficient blind cost-cutting and unfunded cloud budgets to fix things, further exacerbating the issue.

What is the Medallion Mesh?

Medallion Mesh is not finalized, nor is it strict. There is no one-size-fits-all miracle pill. It's a flexible approach to modernizing data management systems for the AI era, synthesizing what has worked in the approaches that came before while adapting them where it makes sense. It draws influence from Medallion Architecture, Data Mesh, DevOps, MLOps, FinOps, data modeling, and machine learning and AI model preprocessing. It can be thought of as a "grand unified theory" for modern data and AI systems. The following are the core concepts of the Medallion Mesh.

Open

Medallion Mesh is open. It leverages open source where it makes sense. Open table formats. Open to new thinking. Open to change if we need it. The architecture should treat openness as a way to decrease the risk of lock-in, but you should always weigh value against risk when adopting vendor-specific features. It's also open to interpretation and customization, provided you measure your trade-offs wisely.

Transparent

Medallion Mesh is transparent. The goal should be to expose all documentation and lineage for every transformation, documented and available for consideration in a unified system. Transparency is key to ensuring quality data. Transparency is not limited to data; it also applies to costs, value, and results. Teams should know how much their work costs the business, and how valuable that work is to the business.

Accessible

Medallion Mesh is accessible to all users. Users come in all different shapes, colors, and thinking styles. Whether it's a data scientist, data engineer, machine learning engineer, architect, orchestrator, data analyst, visualization expert, or operations, all users should be provided interfaces that suit them. A core part of this is providing natural language interfaces, so we don't torture our users with extensive training just to use the software.

Flexible

Medallion Mesh is hybrid and multi-cloud. All claims of the death of on-premises data centers should be taken with a grain of salt. While the future is not on-premises, a lot of data remains locked up there. Migrations deserve careful consideration: we should take a value-led approach, evaluating the utilization and results of each dataset and creating a prioritized plan to move into the cloud incrementally.
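That value-led migration plan can be sketched as a simple prioritization: score each on-premises dataset by how heavily it is used and how valuable its results are to the business, then migrate in descending order. The dataset names, fields, and scoring formula below are illustrative assumptions, not a prescribed method.

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    name: str
    monthly_queries: int   # utilization: how often the asset is actually used
    business_value: float  # results: stakeholder-assessed value, 1-5 scale

def migration_priority(ds: Dataset) -> float:
    # Weight utilization by assessed value so busy, high-value
    # datasets move to the cloud first. (Hypothetical formula.)
    return ds.monthly_queries * ds.business_value

# Hypothetical inventory of on-premises datasets.
datasets = [
    Dataset("legacy_orders", 1200, 4.5),
    Dataset("hr_archive", 15, 2.0),
    Dataset("clickstream", 900, 5.0),
]

# Highest-priority datasets migrate first; low-traffic archives wait.
plan = sorted(datasets, key=migration_priority, reverse=True)
for ds in plan:
    print(ds.name, migration_priority(ds))
```

The point is not the formula itself but that the ordering is driven by measured utilization and assessed results rather than by whichever team shouts loudest.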

Unified

Medallion Mesh is unified and breaks down silos. Data that is locked away in a silo serves only one purpose. To maximize the return on a data asset when data goes viral, we need to iterate towards unification. This is not a decree to unify everything at once without thought; rather, iterate towards unification by prioritizing utilized assets and return on value.

Sensible

Medallion Mesh has the grace to be sensible. Trade-offs will always present themselves, and sensible judgment should be applied when faced with a difficult choice. This extends to data models. Dimensional models are great for simplicity. Assets for machine learning and AI, like feature tables for classic ML, vector stores for RAG, and streaming datasets for LLM fine-tuning, are typically One Big Table. When the complexity of a large multinational conglomerate, with its variety of data sources, shapes, and sizes, comes into play, Data Vault is the complex adaptive system that brings order to the chaos.

Iterative

Medallion Mesh iterates towards the perfect platform. Some say it doesn't exist. They may be right. Just as the cloud constantly works towards eventual consistency, we iterate towards that perfect platform. In business and engineering, we often struggle with the "definition of done", partly because it is open to interpretation, and partly because people need to feel like they accomplished something between long bouts in front of a screen tapping away at the keyboard. Continuous improvement, through smart, measured investments in the data platform guided by data, blazes the path to perfection.

We need to come to terms with the fact that the platform architecture is never done. We can take iterative steps to modernize, in a practical way, by evaluating risk, cost, value, utilization, and results. First, we need to take inventory of our assets: what are all the systems used today to deliver data, analytics, business intelligence, and AI? Catalog all of them in a unified catalog where we can track pipelines through the AI and BI systems to consumption. When data and AI solutions catch fire and go viral, we need scalable systems that handle the stress gracefully. The last thing we need is the server catching fire or costs skyrocketing right when peak virality strikes.
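One way to picture that unified inventory is a small catalog that records, for each system, what it reads from; lineage from BI consumption back to the original sources then falls out of a simple graph walk. The system names and the `reads_from` field here are hypothetical stand-ins for whatever your catalog tooling actually captures.

```python
# A minimal inventory: each system records its upstream dependencies,
# so pipelines can be tracked from source through to consumption.
catalog = {
    "crm_db":         {"type": "source",   "reads_from": []},
    "etl_pipeline":   {"type": "pipeline", "reads_from": ["crm_db"]},
    "sales_mart":     {"type": "table",    "reads_from": ["etl_pipeline"]},
    "exec_dashboard": {"type": "BI",       "reads_from": ["sales_mart"]},
}

def upstream_lineage(system: str) -> list[str]:
    """Walk reads_from edges back to the original sources."""
    lineage = []
    stack = list(catalog[system]["reads_from"])
    while stack:
        node = stack.pop()
        lineage.append(node)
        stack.extend(catalog[node]["reads_from"])
    return lineage

print(upstream_lineage("exec_dashboard"))
# prints ['sales_mart', 'etl_pipeline', 'crm_db']
```

A real catalog would carry far more metadata (owners, costs, utilization), but even this skeleton makes the "track pipelines to consumption" idea concrete.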

Risk

Risk presents itself in many forms. There are data quality risks in ingesting and processing data. Translating transformation logic and business rules from business partners to requirements to computer code presents multiple hops where miscommunication can creep in. Security risks abound; we can't move to passwordless fast enough. Failures, upgrades, and downtime — oh my. Tracking, understanding, and evaluating risks is paramount to your ability to execute well.

Cost

It's not a race to the bottom, nor is it as naive as simply picking the option at the top, middle, or bottom of the price list. Costs should be analyzed relative to the other dimensions. Sometimes money can solve your problem at low risk and high reward. It should be obvious to avoid high-cost, high-risk, low-value situations. Consider the total cost: not just setup and expected operational costs, but how costs scale with utilization and integration. And let's not forget the cost of training and support, not just in dollars but in lost productivity as well.

Value

What is the value of frictionless synergy? Value can be hard to pin down, but we know it when we see it. Things just seem to work the way we want them to. Data is accurate, consistent, and reliable. Processes are streamlined, efficient, and automated. Innovation drives strategic value through faster iteration and scaling successful outcomes. Everything is well governed, understood, compliant, and has integrity. The goal is to measure and maximize value relative to the other dimensions.

Utilization

We aren't all blessed with budgets that let us build it and trust they will come. We should measure not only the utilization of the resources we consume, and how well saturated those resources are, but also users' consumption of and satisfaction with the system itself. Combined with lineage, end-to-end utilization of the system should be transparent so that any cost black holes or insight bottlenecks are properly addressed.

Results

Results could be as simple as top-line growth, bottom-line optimization, and customer/user satisfaction. Yet when our precious data workers and business partners are freed from the shackles of interpreting the system and maintaining all the patchwork and spreadsheet solutions, the rewards seem to have a compound effect. Through wise investment in innovative AI, we can scale successful outcomes to maximize growth and sharpen competitive advantage.

Data Forward

Just as your data and AI should not sit in silos, these dimensions should not be analyzed in isolation. When evaluating options, it is best to document the analysis and present a summary to key stakeholders. When it's clear and transparent how the options were weighed across these key dimensions, everyone gains confidence in the road ahead. "In God we trust, all others bring data" — W. Edwards Deming. Use AI to blaze the path, data forward.
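As a sketch of what that documented analysis might look like, here is a minimal scoring matrix: each option is rated 1-5 on the five dimensions (with risk and cost inverted, so higher always means better), and the totals give stakeholders a transparent starting point for discussion. The options, scores, and equal weighting are made up for illustration; a real assessment would weight dimensions to fit the business.

```python
# The five evaluation dimensions from this framework.
DIMENSIONS = ["risk", "cost", "value", "utilization", "results"]

# Hypothetical options, each scored 1-5 per dimension.
# Risk and cost are inverted: 5 means low risk / low cost.
options = {
    "migrate_to_lakehouse": {"risk": 3, "cost": 3, "value": 5,
                             "utilization": 4, "results": 5},
    "stay_on_premises":     {"risk": 4, "cost": 4, "value": 2,
                             "utilization": 2, "results": 2},
}

def total_score(scores: dict) -> int:
    # Equal weights here; adjust per-dimension weights to taste.
    return sum(scores[d] for d in DIMENSIONS)

# Rank options for the stakeholder summary, best first.
ranking = sorted(options, key=lambda name: total_score(options[name]),
                 reverse=True)
for name in ranking:
    print(f"{name}: {total_score(options[name])}/25")
```

The numbers matter less than the artifact: a written record of how each dimension was considered, which anyone can challenge or revisit later.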

Coming Next

This blog outlines the framework for a data forward way to modernize in the era of AI. The next blog will lay out the high-level concepts for implementing Medallion Mesh on Databricks. It does allow integration with other fabrics and data clouds, but you should always use the framework of risk, cost, value, utilization, and results to evaluate your options, document them, and use them as a compass to guide your way.

Here is a link to the talk I gave at Data and AI Summit: https://youtu.be/K7OFKdjwPxE


Franco Patano

I spend my time learning, practicing, and having fun with data in the cloud.