Data Mesh — Technical Implementation — API

Paul Cavacas
4 min read · Apr 18, 2023


This is the second part in the series following along with building out the technical foundation for a Data Mesh. See the first part, which describes the series and the definition of Data Products, below.

Part 1 — Data Mesh — A Technical Implementation — Data Product | by Paul Cavacas | Apr, 2023 | Medium

Part 3 — Data Mesh — Technical Implementation — API | by Paul Cavacas | Apr, 2023 | Medium

This part of the series creates the underpinnings for the API services that will control the whole Mesh. The layout of the services is the important part, since it allows anyone else to plug in different vendors or providers as they see fit. The API being developed here is really a central controller for all of the services.

Design

The API will be developed using FastAPI for Python, but again you can take the principles and create the API in whatever language you are most comfortable with. The implementation of the API can be broken down into two main concepts: Planes and Platforms.
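To make that layout concrete, here is a minimal sketch of how such a FastAPI application might be wired together. The router names and prefixes are illustrative assumptions; the article only specifies that FastAPI is the framework.

```python
# main.py -- a minimal sketch of the FastAPI entry point.
# The router names and prefixes are illustrative assumptions.
from fastapi import FastAPI, APIRouter

app = FastAPI(title="Data Mesh API")

# One router per plane; each router delegates to its plane, which in turn
# calls into one or more platforms.
experience_router = APIRouter(prefix="/mesh", tags=["experience"])
data_product_router = APIRouter(prefix="/data-products", tags=["data-product"])
infrastructure_router = APIRouter(prefix="/infrastructure", tags=["infrastructure"])

app.include_router(experience_router)
app.include_router(data_product_router)
app.include_router(infrastructure_router)
```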

Planes

The main blocks of code that handle the work are divided into three Planes, which map to the concepts described by Zhamak Dehghani: the Experience Plane, the Data Product Plane, and the Infrastructure Plane. Each API endpoint calls into one of these planes, which orchestrates the work involved to complete the request, typically by calling further into the platforms, which are described further on in this article.

Mesh Experience Plane

The experience plane provides high-level functionality around the Mesh: things like searching, retrieving domain lists, and other general concepts typically used directly by end users of the Mesh.

Data Product Plane

The data product plane contains many of the “features” that we will be implementing in the Mesh; it returns information about, or performs operations against, individual Data Products.

Infrastructure Plane

The infrastructure plane provides all of the lower-level interactions, such as dealing with Blob Storage, CI/CD pipelines, and other components along those lines.
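As a rough illustration, the sketch below models each of the three planes as a thin orchestrator that delegates to the platforms it needs. The class and method names, and the specific platforms each plane holds, are assumptions made for the example rather than a fixed design.

```python
# planes.py -- an illustrative sketch of the three planes as thin orchestrators.
# The platform attributes (catalog, query, SLO, storage) are assumptions.

class MeshExperiencePlane:
    """High-level, consumer-facing operations across the whole Mesh."""

    def __init__(self, catalog_platform):
        self.catalog_platform = catalog_platform

    def search(self, term: str):
        # Delegate the search to whatever catalog/search platform is configured.
        return self.catalog_platform.search(term)


class DataProductPlane:
    """Operations against individual Data Products."""

    def __init__(self, query_platform, slo_platform):
        self.query_platform = query_platform
        self.slo_platform = slo_platform

    def query(self, data_product_id: str, statement: str):
        return self.query_platform.query(data_product_id, statement)

    def slo_check(self, data_product_id: str):
        return self.slo_platform.check(data_product_id)


class InfrastructurePlane:
    """Lower-level interactions such as blob storage and CI/CD pipelines."""

    def __init__(self, storage_platform):
        self.storage_platform = storage_platform

    def list_files(self, path: str):
        return self.storage_platform.list_files(path)
```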

Platforms

The concept of platforms is how the Mesh is built to work against any service that is needed, including multiple different services for the same feature.

For example, we will create something called a Query Platform. This provides a simple, generic interface that can be called; the method inspects the corresponding Data Product, connects to whatever the underlying Output Port is, and queries the data product, whether that is something like Snowflake or SQL Server, or even something more nontraditional like a Power BI endpoint. These platforms are the key to simplifying the interaction with multiple different types of outputs while providing a standard interface.
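A minimal sketch of that dispatch, assuming the registered Data Product definition exposes an output port with a `type` field, might look like this (the registry shape and connector calls are placeholders, not a prescribed implementation):

```python
# query_platform.py -- a sketch of the Query Platform idea described above.
# The registry shape and output-port types are assumptions for illustration.

class QueryPlatform:
    def __init__(self, registry):
        # registry is assumed to return a Data Product definition,
        # including its output port, by id.
        self.registry = registry

    def query(self, data_product_id: str, statement: str):
        product = self.registry.get(data_product_id)
        port = product["output_port"]

        # Dispatch on the output port type declared by the Data Product.
        if port["type"] == "snowflake":
            return self._query_snowflake(port, statement)
        if port["type"] == "sqlserver":
            return self._query_sqlserver(port, statement)
        if port["type"] == "powerbi":
            return self._query_powerbi(port, statement)
        raise ValueError(f"Unsupported output port type: {port['type']}")

    def _query_snowflake(self, port, statement):
        ...  # connect with a Snowflake client and run the statement

    def _query_sqlserver(self, port, statement):
        ...  # connect with a SQL Server client and run the statement

    def _query_powerbi(self, port, statement):
        ...  # call the Power BI REST API to execute the query
```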

We will be creating numerous platforms, one for each of the different features that we want to implement, such as:

  1. Query Platform
  2. Usage Platform
  3. SLO Platform
  4. Data Quality Platform
  5. etc.
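One lightweight way to keep these platforms interchangeable is to give them a common interface that the planes can program against. The sketch below uses typing.Protocol with a single `execute` entry point and a simple registry; both are illustrative choices rather than anything prescribed here.

```python
# platforms.py -- a possible common shape for all platforms.
# The `execute` method name and the registry are assumptions for illustration.
from typing import Any, Dict, Protocol


class Platform(Protocol):
    def execute(self, data_product_id: str, **kwargs: Any) -> Any:
        ...


# A simple lookup the planes could use to find the platform for a feature,
# e.g. "query", "usage", "slo", "data-quality".
platforms: Dict[str, Platform] = {}


def register_platform(name: str, platform: Platform) -> None:
    platforms[name] = platform
```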

API Scheme

Experience APIs
These are top-level Mesh functions that provide consumer-facing functionality. Here is a sample of the APIs that will be provided at this level; a FastAPI sketch of these endpoints follows the list.

  • Search — Allows searching for data products in the Mesh.
  • Domains — Returns a list of each of the Domains registered in the Mesh.
  • Known Entities — This is something that we haven’t discussed yet, but it will provide a basis for creating linkable entities. For example, Customer and Product will be Known Entities; when Data Products use these known entities, it provides an extra level of interoperability between them.
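Here is how those experience-level endpoints might look as a FastAPI router. The paths, parameters, and placeholder return values are assumptions; a real implementation would delegate to the Mesh Experience Plane.

```python
# experience_api.py -- an illustrative router for the experience-level endpoints.
# Paths and return shapes are assumptions made for this sketch.
from fastapi import APIRouter

router = APIRouter(prefix="/mesh", tags=["experience"])


@router.get("/search")
def search(term: str):
    """Search for Data Products across the Mesh."""
    # A real implementation would delegate to the Mesh Experience Plane.
    return {"term": term, "results": []}


@router.get("/domains")
def domains():
    """Return every Domain registered in the Mesh."""
    return {"domains": []}


@router.get("/known-entities")
def known_entities():
    """Return Known Entities (e.g. Customer, Product) used for interoperability."""
    return {"known_entities": []}
```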

Data Product APIs
These are the APIs that interact with individual Data Products. Here is a sample of the APIs that will be provided at this level; a sketch of a few of these endpoints follows the list.

  • Get — Returns the list of all Data Products registered in the Mesh.
  • Post — Registers a Data Product onto the Mesh.
  • Get Data Product — Returns detailed information about a single Data Product.
  • Test — Runs various AI and Business Rule tests defined in the Data Product.
  • Contract Check — Runs a check to ensure that the data being provided by the Data Product matches its defined inputs and outputs.
  • Query/Sample — Queries a Data Product to return its data. Sample returns a random sample of the data.
  • SLO Check — Runs various checks against the Data Product to ensure that it is meeting the defined SLOs.
  • Lineage — Returns lineage information between Data Products.
  • Usage/Logs/Trace/Cost — A set of APIs that return the corresponding information from the Data Product. The information that is available and returned depends on the type of Data Product and what is backing it.
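Below is a sketch of a few of these Data Product endpoints as a FastAPI router. The paths, request model, and placeholder return values are illustrative assumptions; a real implementation would delegate to the Data Product Plane and its platforms.

```python
# data_product_api.py -- an illustrative router for a subset of the endpoints above.
# The registration model and return shapes are assumptions made for this sketch.
from fastapi import APIRouter
from pydantic import BaseModel

router = APIRouter(prefix="/data-products", tags=["data-product"])


class DataProductRegistration(BaseModel):
    # Minimal illustrative registration payload.
    name: str
    domain: str
    output_port: dict


@router.get("/")
def list_data_products():
    """Return every Data Product registered in the Mesh."""
    # A real implementation would delegate to the Data Product Plane.
    return []


@router.post("/")
def register_data_product(registration: DataProductRegistration):
    """Register a Data Product onto the Mesh."""
    return {"registered": registration.name}


@router.get("/{product_id}")
def get_data_product(product_id: str):
    """Return detailed information about a single Data Product."""
    return {"id": product_id}


@router.get("/{product_id}/sample")
def sample(product_id: str, rows: int = 10):
    """Return a random sample of the Data Product's data via the Query Platform."""
    return {"id": product_id, "rows": rows, "data": []}
```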

Summary

The first articles in the series lay the groundwork for what we are building and give an initial glimpse into how we are designing and building it. Please let me know if you find this interesting and which areas you want to go into next. As I continue to build out the platform, I will post more and more of the details about each of the parts.
