A tailored Data Mesh for reMarkable
By Karan Kathuria (Head of AI Products) at BearingPoint and Martí Colominas (Head of Data & Insight) and Frøydis Høistad (Data Governance Lead) at reMarkable
Data Mesh is a buzzword in the world of data management. At first glance, the core idea is undeniably intriguing. On paper, Data Mesh promises to solve well-known choke points created by an organization's growing demand for data and analytics. But does it answer everyone's big data challenges? Probably not.
Few organizations are ready for a prompt transition into the purest version of the Data Mesh paradigm. However, we strongly believe that, depending on the characteristics of your organization, there are many ideas within this paradigm that it could benefit from adopting. At reMarkable we've taken the core Data Mesh concept and developed a tailored, hybrid version of it.
Data Mesh in a nutshell
At its core, we regard Data Mesh as the idea of distributing the responsibility for developing and serving data as a product within an organization. Enabling this is far from easy, but by leveraging domain expertise it can be done in a much more agile manner than with a centralized responsibility. The key benefit of Data Mesh is arguably a scalable and flexible answer to the increasing demand for data and insight.
At reMarkable we acknowledge that this transition can be challenging, but we do not regard the idea of decentralization as binary. We have evaluated the pros and cons related to both fully centralized and fully decentralized orientations in search of the right balance between the two.
The tailored Data Mesh
A typical data and analytics platform consists of many parts: the stack and infrastructure, the data content, the insights, the people, and the governance surrounding it, to name a few. Each of these parts could be placed somewhere on the axis between centralized and decentralized, making the centralized/decentralized question multi-dimensional.
The process of placing these parts on the axis required us to assess some organizational attributes. The following attributes were among those considered:
- Ambitions related to the scale of data and analytics, both in terms of efficiency and size of the team
- The current level of established common conventions and practices
- Organizational culture, especially the level of autonomy within different business domains
- The level of data literacy
- The extent to which input data sources could be exclusively mapped to one business domain
- The relative importance of domain expertise in materializing insight use cases
In reMarkable’s case, the key drivers for a decentralized setup were:
- A strong culture of autonomy among and within the business domains. Each business domain sets its own priorities and has historically solved its insight requirements as it has seen fit, with limited company-level guidance with regard to technical conventions, practices, and templates
- High level of data literacy. reMarkable has a young and highly analytically oriented workforce. A large part of the workforce has a technical background, and their ability to articulate insight requirements suggests a high level of data literacy
- Level of ambition and future scale of data and insight. A shift in the business model suggested that the complexity and scale of insight requirements would increase rapidly. Enabling a data and insight platform that would support this future scaling without creating new choke points was a key argument
- Data domain and business domain mapping. When mapping out how each of the data domains (and related sources of data) within the organization related to the business domains, we saw a surprisingly good match with few overlaps. This supported the idea of decentralizing data ownership, making each domain responsible for its own data
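The mapping exercise above can be sketched as a simple ownership check. This is a minimal, hypothetical illustration (the domain names below are invented, not reMarkable's actual domains): list each business domain's claim on a data domain, then flag any data domain claimed by more than one business domain as an overlap.

```python
from collections import defaultdict

# Hypothetical ownership claims as (data domain, business domain) pairs.
# Names are purely illustrative.
claims = [
    ("device_telemetry", "Product"),
    ("subscriptions", "Commercial"),
    ("web_orders", "Commercial"),
    ("support_tickets", "Customer Care"),
    ("web_orders", "Product"),  # two business domains claim this data domain
]

def find_overlaps(claims):
    """Return data domains claimed by more than one business domain."""
    owners = defaultdict(set)
    for data_domain, business_domain in claims:
        owners[data_domain].add(business_domain)
    return {d: sorted(o) for d, o in owners.items() if len(o) > 1}

print(find_overlaps(claims))  # {'web_orders': ['Commercial', 'Product']}
```

Few or no overlaps, as in reMarkable's case, make it straightforward to hand each business domain exclusive ownership of its data; many overlaps would instead argue for shared or centralized ownership of the contested domains.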
Conversely, the following key drivers argued for a more traditional centralized setup:
- The number of siloed analytics initiatives, together with the lack of technical conventions, naming conventions, templates, data interoperability, and established best practices for how to build data pipelines and where to place business logic, were all key arguments for starting out with a more centralized setup
- A limited number of data engineers required us to be pragmatic about efficiently sharing a single pool of data engineers, leading us to centralize the company-wide prioritization of their efforts related to ingesting, refining, and managing data
To balance these two forces, we arrived at a compromise: we fully centralized the platform and technical infrastructure, and partly decentralized parts such as data governance, data ingestion, and the pool of data engineers. Here we found it highly useful to distinguish between source-aligned data, largely meaning raw data as it arrives from its source, and consumer-aligned data, meaning data shaped the way the business consumes it.
We've also ensured that most data analysts can work end to end, becoming more analytics engineers than pure data analysts. By doing so, we could keep the engineering-intensive work of ingesting, cleaning, and ensuring data interoperability more centralized, and the domain-intensive work of creating consumer-aligned data products more decentralized. The thinking is that this supports a high level of autonomy and scalability.
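The split between the two layers can be sketched as follows. This is a minimal sketch under simplifying assumptions (dict records standing in for real tables; all function and field names are invented for illustration): a centralized step produces a source-aligned table by cleaning raw records without any business logic, and a domain-owned step then applies business logic to produce a consumer-aligned data product.

```python
def ingest_source_aligned(raw_events):
    """Centralized engineering: clean raw events into a source-aligned
    table (drop malformed records, normalize fields) with no domain
    business logic applied."""
    cleaned = []
    for e in raw_events:
        if e.get("user_id") is None:  # drop malformed records
            continue
        cleaned.append({
            "user_id": e["user_id"],
            "event": e["event"].strip().lower(),
            "ts": e["ts"],
        })
    return cleaned

def build_consumer_aligned(source_table):
    """Domain-owned step: apply business logic (here, counting sync
    events per user) to produce a consumer-aligned data product."""
    counts = {}
    for row in source_table:
        if row["event"] == "sync":
            counts[row["user_id"]] = counts.get(row["user_id"], 0) + 1
    return counts

raw = [
    {"user_id": 1, "event": " Sync ", "ts": "2022-01-01"},
    {"user_id": None, "event": "sync", "ts": "2022-01-01"},
    {"user_id": 1, "event": "SYNC", "ts": "2022-01-02"},
    {"user_id": 2, "event": "open", "ts": "2022-01-02"},
]
product = build_consumer_aligned(ingest_source_aligned(raw))
print(product)  # {1: 2}
```

In the hybrid setup described above, a central pool of data engineers would own the first function, while an analytics engineer embedded in a business domain would own the second, so domain logic stays with the domain that understands it.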
Overall, we ended up with a compromise placing us somewhere between a Data Mesh and a traditional centralized approach. We have done so with a clear intention of moving towards decentralization as we mature in terms of common conventions and practices and increase our scale.
Leverage relevant ideas from the Data Mesh concept
BearingPoint supported reMarkable in establishing their target state for Data & Insight, and was also central in establishing the organization and realizing the platform. In our view, organizations can and should benefit from leveraging relevant ideas from the Data Mesh concept without having to adopt it completely. Feel free to contact us for an informal chat on how this can improve your current state of data and analytics.