Semantics and Data Product Enablement — A Practitioner’s Secret
Bare-bone Purpose & Key Components of the Semantic Layer, The Data Product Divergence and Impact, and Driving Business Value at Scale
Author: Frances O’Rafferty
This piece is a community contribution from Frances, a recognised leader in the Data, Analytics, and AI space. With over 16 years of experience, Frances has delivered impactful data solutions across various sectors, with a focus on data management in financial services. She has also contributed to multiple projects in data transformations, cloud migrations, and data products, using a range of technologies and tools. We highly appreciate her contribution and readiness to share her knowledge with MD101.
We actively collaborate with data experts to bring the best resources to a 6000+ strong community of data practitioners. If you have something to say on Modern Data practices & innovations, feel free to reach out! Note: All submissions are vetted for quality & relevance. We keep it information-first and do not support any promotions, paid or otherwise!
The semantic layer plays a crucial role in enabling the development of data products that can be discovered, understood, and trusted, without which their value is unlikely to be realised. The semantic layer sits between the physical data layer and data consumption and translates the physical data into the language of your business.
My journey in the world of data began working with Business Objects BI tools. At the time, self-service reporting was a major focus. Data resided in the Physical Layer (Data Warehouse), while the Semantic Layer (Business Objects Universe) played a pivotal role in defining dimensions, measures, and data relationships, enabling self-service reporting in the Consumption Layer (Business Objects BI tools).
Fast-forward almost 15 years, and the landscape has evolved significantly. Data now originates from a mix of cloud and on-premises platforms and is consumed through various means, such as BI tools, Data Products, and AI Chatbots.
This evolution has brought about new challenges and opportunities in leveraging the semantic layer to drive consistency across the enterprise when talking about data and delivering business knowledge to AI solutions.
What is a Semantic Layer?
The semantic layer is a virtual layer that sits between the Physical Layer (data sources) and the consumption tooling. It provides a common business representation of the data, abstracting the complexities of the underlying data sources.
The Semantic Layer translates the physical data into the language of your business.
Back in the world of Business Objects, the semantic layer served as a foundational component where JOINs between tables were created and where measures and dimensions were defined and organised into subject-area folders.
This allowed users to seamlessly create reports by dragging and dropping measures and dimensions from a list of logically grouped attributes. The SQL was then generated and executed on demand to produce the report.
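To make the drag-and-drop idea concrete, here is a minimal sketch of how business-facing names might be mapped to physical columns and turned into SQL on demand. It is an illustration only, not how Business Objects works internally; the table names, columns, and the `build_query` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    name: str    # business-facing name shown to the report author
    column: str  # physical column (hypothetical)


@dataclass(frozen=True)
class Measure:
    name: str        # business-facing name
    expression: str  # SQL aggregate over physical columns (hypothetical)


# Hypothetical semantic definitions, maintained once and reused by every report.
DIMENSIONS = {"Product Category": Dimension("Product Category", "p.category")}
MEASURES = {"Total Sales": Measure("Total Sales", "SUM(s.amount)")}
FROM_CLAUSE = "sales s JOIN products p ON s.product_id = p.id"


def build_query(dimension_names, measure_names):
    """Generate SQL from the business names a user has selected."""
    dims = [DIMENSIONS[n] for n in dimension_names]
    meas = [MEASURES[n] for n in measure_names]
    select_parts = [f'{d.column} AS "{d.name}"' for d in dims]
    select_parts += [f'{m.expression} AS "{m.name}"' for m in meas]
    sql = f"SELECT {', '.join(select_parts)} FROM {FROM_CLAUSE}"
    if dims:
        # Measures are aggregated per combination of selected dimensions.
        sql += " GROUP BY " + ", ".join(d.column for d in dims)
    return sql


print(build_query(["Product Category"], ["Total Sales"]))
```

The report author only ever sees "Product Category" and "Total Sales"; the JOIN and the aggregation stay inside the semantic definitions.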
However, the downside is that all this knowledge is locked into the Business Objects stack and not shared outside the vendor’s tool.
This is still a problem for many tools today where we define the semantics within the consumption mechanism. By doing this, we are losing the opportunity to drive consistency in the way we describe data across the enterprise.
When we talk about the semantic layer today, we are considering Metadata, Business Glossary, Data Models (Logical, Ontology & Knowledge graphs), Access Control and Taxonomies.
The semantic layer is pivotal in bridging the divide between intricate data sources and user-friendly consumption tools. The semantic layer’s evolution should be a dynamic two-way conversation that adapts to changes in both the physical data and the consumption tools.
A robust semantic layer relies on sound data management practices to ensure that it comprehensively reflects the industry’s ongoing evolution rather than remaining static. Maintaining relevance is essential for effectiveness in an ever-changing technological landscape.
📝 Editor’s Note & Related Reads
Building a robust semantic layer implies an evolution-driven two-way street between the semantic layer and the product layer. Each enables the other in an evolutionary setup, and the two are more parallel in nature than consecutive layers.
The semantic layer helps data products access centralised and reusable context, while data products feed back into the semantic layer (metadata model) with enriched, use-case-specific context. Some related reads on the same:
➡️ How to avoid Semantic Mistrust
➡️ The Reverse Path: How Data Products also fortify semantics/context
➡️ Adopting Isolated Semantic Tooling versus Adopting an Interoperable Semantics Layer integrated with Data Products and Existing Data Stack
Components of a Semantic Layer
The semantic layer, akin to other layers within the data ecosystem, comprises diverse components. Its overarching purpose is to enrich data with meaning, foster trust, and facilitate understanding (understanding of the same elements can be diverse, depending on the user).
The semantic layer empowers diverse data users to interpret and present data in a manner that resonates with their unique perspectives whilst simultaneously facilitating cross-domain comprehension to foster enterprise-level understanding.
This section will give a high-level overview of the components of the Semantic Layer: Taxonomy, Metadata, Data Models, Business Glossary and Access Controls.
Taxonomy: Classifying Data for Consistency and Clarity
Within most organisations, taxonomies play a crucial role in classifying data into hierarchical structures, encompassing categories and subcategories. For instance, a company in the business of selling juice may categorise its products into fruit juice, vegetable juice, and smoothies, with further subcategories such as apple, tomato, or green smoothies.
A clear taxonomy drives consistency in data categorisation, ensuring that comparisons are made on a consistent basis, such as comparing apples to apples rather than apples to pears.
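As a rough sketch, the juice taxonomy above could be held as a nested structure, with a helper that checks whether two items sit in the same category before they are compared. The root label "Juice products" and both function names are assumptions made for illustration.

```python
# Hypothetical taxonomy built from the juice example; each key is a category
# and each value holds its subcategories.
TAXONOMY = {
    "Juice products": {
        "Fruit juice": {"Apple": {}},
        "Vegetable juice": {"Tomato": {}},
        "Smoothies": {"Green smoothie": {}},
    }
}


def find_path(term, tree=TAXONOMY, trail=()):
    """Return the category path from the root down to `term`, or None."""
    for name, children in tree.items():
        current = trail + (name,)
        if name == term:
            return current
        found = find_path(term, children, current)
        if found is not None:
            return found
    return None


def same_category(a, b):
    """True when both terms sit directly under the same parent category."""
    path_a, path_b = find_path(a), find_path(b)
    if path_a is None or path_b is None:
        return False
    return path_a[:-1] == path_b[:-1]


print(find_path("Tomato"))
print(same_category("Apple", "Tomato"))
```

Here "Apple" and "Tomato" are not directly comparable because they sit under different parents, which is the apples-to-pears check expressed in code.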
📝 Editor’s Note
If you’re curious, here’s an excerpt from a research paper highlighting the distinction between taxonomy and ontology: “Ontology can also be compared to taxonomy. While the former includes cardinality and restrictions, the latter is limited to “is a” kind of relationship. In other words, it organizes controlled vocabulary terms into a tree-like structure, being the controlled vocabulary the list of authorized keywords used to describe individuals of a taxonomy or ontology.”
Metadata: Unveiling the Data’s Story
In parallel, many companies are actively capturing metadata, which serves as the data about their data. This encompasses definitions, data sources, lineage, relationships to other data, data quality metrics, and versioning.
When integrated with the consumption layer, metadata empowers users to understand and trust their data, akin to the nutritional information and ingredients listed on the packaging of a juice product.
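One way to picture the "nutrition label" is as a small record carrying the kinds of metadata listed above. The sketch below is an assumption about shape only; the field names, the sample dataset, and its values are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    name: str
    definition: str
    source: str
    lineage: list = field(default_factory=list)   # upstream steps, in order
    quality: dict = field(default_factory=dict)   # metric name -> score in 0..1
    version: str = "1.0"

    def label(self) -> str:
        """Render the metadata as a short, human-readable label."""
        lines = [
            f"{self.name} (v{self.version})",
            f"  Definition: {self.definition}",
            f"  Source: {self.source}",
            f"  Lineage: {' -> '.join(self.lineage)}",
        ]
        lines += [f"  {metric}: {score:.0%}" for metric, score in self.quality.items()]
        return "\n".join(lines)


# Hypothetical dataset and values, for illustration only.
orders = DatasetMetadata(
    name="daily_orders",
    definition="One row per customer order per day",
    source="orders_db",
    lineage=["orders_db", "staging", "daily_orders"],
    quality={"completeness": 0.98},
    version="2.1",
)
print(orders.label())
```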
Global Metadata Model
The system should typically collect and interweave metadata contextually from the integration plane, lineage and historical logs, user data, application logs, and more.
The Metadata Model is key to a great data discovery experience. Metadata is a Big Data Problem, ergo, the Metadata experience depends on:
1️⃣ How the overwhelming data is modelled
2️⃣ How well the big data solution is designed
Metadata is a facet of every data product. Every Data Product:
- Establishes ownership of metadata real estate (in the metadata model) through semantic tags
- Generates new metadata as a by-product of regular operations across Design, Develop, Deploy, and Evolve stages of the Product Cycle.
📝 Editor’s Note: An Implementation angle you could explore.
A robust end-to-end metadata model is feasible through:
- Central control plane of a data platform that has visibility across all touchpoints in the data ecosystem
- Distributed data product planes that frequently update the metadata model with globally comprehensible semantics
Reference: datadeveloperplatform.org — Control Plane
Business Glossary: Establishing Shared Business Meaning
The prevalence of Business Glossaries is on the rise, serving as a tool to establish shared business meaning across data and metadata. Comprising terms and definitions, these glossaries play a crucial role in ensuring that terms such as “active customer” are clearly defined and understood across different departments within an organisation.
An orders department may see an active customer as any customer who has paid for an order that has not yet been dispatched, whereas the customer care team may see an active customer as someone who has placed an order in the last three months.
As an example of how the two work together: in the metadata we defined apple juice as having a serving size of 10 fl oz, while in the Business Glossary we define the term “serving size” as the “Recommended amount to consume in a single sitting”.
Irrespective of disparate domains and distributed teams, all logic, definitions, ontologies, and taxonomies defined across multiple layers and verticals plug into a common business glossary that is embedded with the ability to accommodate synonymous jargon and diverging meanings.
So now, when a marketing analyst refers to, say, the sales team’s data, they don’t have to struggle with understanding new jargon, tallying columns, or iterating with the sales associates/analysts to understand what the data means.
Here is a simple example of another two-way enrichment street, where the semantic layer benefits from the consumption layer. Say the marketing analysts learn that what the sales team refers to as “annual_contract_value” is what the marketing team identifies as “MQL_value”. They need only add a synonym tag, and every other marketing associate who searches for the term “MQL value” will be able to skip the jargon obstruction.
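The synonym tag can be sketched as a small glossary that resolves any registered synonym to one canonical term. This is a minimal illustration under stated assumptions: the `BusinessGlossary` class, its methods, and the definition text are hypothetical rather than taken from any particular glossary tool.

```python
class BusinessGlossary:
    """Minimal glossary: canonical terms, definitions, and synonym tags."""

    def __init__(self):
        self._definitions = {}  # canonical term -> definition
        self._synonyms = {}     # normalised term or synonym -> canonical term

    @staticmethod
    def _normalise(term):
        # Treat "MQL_value", "mql value" and "MQL Value" as the same search key.
        return term.lower().replace("_", " ").strip()

    def define(self, term, definition):
        key = self._normalise(term)
        self._definitions[key] = definition
        self._synonyms[key] = key

    def add_synonym(self, synonym, term):
        # Raises KeyError if the target term has not been defined yet.
        canonical = self._synonyms[self._normalise(term)]
        self._synonyms[self._normalise(synonym)] = canonical

    def lookup(self, term):
        canonical = self._synonyms.get(self._normalise(term))
        if canonical is None:
            return None
        return canonical, self._definitions[canonical]


glossary = BusinessGlossary()
# The definition text below is a placeholder, not an agreed business definition.
glossary.define("annual_contract_value", "Yearly value of a customer contract")
glossary.add_synonym("MQL_value", "annual_contract_value")
print(glossary.lookup("MQL value"))
```

Because the synonym is stored once in the glossary rather than in each person's head, the next search for "MQL value" lands on the sales team's term and its definition.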
Data Models: Unveiling the Structure & Embedding Knowledge
Most companies have a logical data model which provides a detailed representation and understanding of how the data is structured without the specific physical implementation details. A logical model sets the foundations for a semantic layer with attributes, entities and relationships.
In my Business Objects days, it was common to also have a dimensional model, which would define measures in fact tables and dimensions (descriptive attributes for grouping the measures) in dimension tables.
This model type is typically more performant and intuitive than a relational model for running analytics. More recently, ontology and knowledge graph models have been discussed as enablers in the semantic layer to embed knowledge in addition to understanding. An ontology captures the meaning of relationships between data entities via concepts and attributes.
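The dimensional model mentioned above can be sketched with one fact table and one dimension table in an in-memory SQLite database. The table names, columns, and sample rows are hypothetical; the point is only that the measure lives in the fact table and the grouping attribute in the dimension table.

```python
import sqlite3


def units_by_category():
    """Build a tiny star schema and aggregate a measure by a dimension."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension table: descriptive attributes used for grouping.
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            category   TEXT
        );
        -- Fact table: the measure (units) plus a key into the dimension.
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            units      INTEGER
        );
        INSERT INTO dim_product VALUES (1, 'Fruit juice'), (2, 'Vegetable juice');
        INSERT INTO fact_sales VALUES (1, 10), (1, 5), (2, 7);
        """
    )
    rows = conn.execute(
        """
        SELECT d.category, SUM(f.units) AS total_units
        FROM fact_sales f
        JOIN dim_product d ON f.product_id = d.product_id
        GROUP BY d.category
        ORDER BY d.category
        """
    ).fetchall()
    conn.close()
    return rows


print(units_by_category())
```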
📝 Related Reads
How to Build Data Products — Design: Part 1/4
Metrics-Focused Data Strategy with Model-First Data Products | Issue #48
Access Controls: Ensuring Consistency and Compliance
Finally, the semantic layer establishes rules for access and control that remain consistent across all consumption patterns of the data. Managing these rules within the semantic layer enables organisations to ensure consistent application of data security, privacy, and compliance measures across all data consumption tools and platforms.
This unified approach streamlines access management, mitigates security protocol inconsistencies, enhances transparency, and enables effective tracking and auditing of data usage. Ultimately, this cohesive access and control framework fosters a secure, compliant data environment and promotes seamless interoperability across diverse data consumption channels.
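A minimal sketch of the idea, assuming a simple role-based policy: the rule is defined once, and every consumption channel calls the same function, so the answer cannot differ between a BI tool and a chatbot. The roles, column names, and the `read` helper are hypothetical.

```python
# Hypothetical policy: which roles may see which attributes.
POLICIES = {
    "customer_email": {"customer_care"},
    "order_total": {"customer_care", "finance", "marketing"},
}

AUDIT_LOG = []  # (role, channel, columns returned) for later review


def read(record, role, channel):
    """Return only the attributes this role may see, and audit the access.

    `channel` (for example a BI tool or a chatbot) is recorded for auditing
    but never influences the decision.
    """
    allowed = {
        column: value
        for column, value in record.items()
        if role in POLICIES.get(column, set())  # unknown columns are denied
    }
    AUDIT_LOG.append((role, channel, sorted(allowed)))
    return allowed


record = {"customer_email": "a@example.com", "order_total": 42}
print(read(record, "marketing", "bi_tool"))
print(read(record, "marketing", "ai_chatbot"))
```

Denying columns that have no policy is a design choice in this sketch; a real deployment would need to decide its own default.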
In essence, the components of the semantic layer collectively contribute to a cohesive and adaptable framework that not only ensures data consistency and clarity but also fosters a secure, compliant, and interoperable data environment.
📝 Related Reads
An example of integrated semantics and governance by Charlotte Ledoux
The Data Product Layer
In the modern data landscape, the Data Product Layer represents the combination of code, data, and metadata, resulting in reusable, consumable Data Products that drive business value.
The model within the Data Product Layer encompasses entities, metrics, measures, and dimensions derived from the semantic layer model, borrowing entities and context and extending them based on the use case’s requirements.
It is important to note that the Data Product Layer caters to a specific slice of data, adding richer context for consumption, while the Semantic Layer serves as a broader bridge to the entire physical data layer.
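The borrow-and-extend relationship can be sketched as follows. This is an illustration under assumptions: the shared model, its entities and measures, and the `build_data_product` helper are all hypothetical.

```python
# Hypothetical enterprise-wide semantic model.
SEMANTIC_MODEL = {
    "entities": {
        "customer": ["customer_id", "segment"],
        "order": ["order_id", "customer_id", "order_date", "amount"],
        "product": ["product_id", "category"],
    },
    "measures": {"total_order_value": "SUM(order.amount)"},
}


def build_data_product(name, entities, measures, extra_measures=None):
    """Borrow a slice of the semantic model and extend it for one use case."""
    unknown = [e for e in entities if e not in SEMANTIC_MODEL["entities"]]
    unknown += [m for m in measures if m not in SEMANTIC_MODEL["measures"]]
    if unknown:
        raise KeyError(f"Not defined in the semantic layer: {unknown}")
    model = {
        "name": name,
        # Borrowed as-is, so definitions stay consistent across products.
        "entities": {e: list(SEMANTIC_MODEL["entities"][e]) for e in entities},
        "measures": {m: SEMANTIC_MODEL["measures"][m] for m in measures},
    }
    # Use-case-specific context that exists only in this data product.
    model["measures"].update(extra_measures or {})
    return model


customer_spend = build_data_product(
    "customer_spend",
    entities=["customer", "order"],
    measures=["total_order_value"],
    extra_measures={"orders_per_customer": "COUNT(order.order_id) / COUNT(DISTINCT customer.customer_id)"},
)
print(sorted(customer_spend["measures"]))
```

The data product carries only the customer and order slice, while the shared measure keeps the same definition it has everywhere else.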
📝 Related Reads
Learn more on how to bring Data Product Prototype to life here.
Powering reliable LLMs with the super combo of Data Product & Semantic Layer.
Importance of Semantic Layer in Data Products
Data products embody the essential characteristics of being trustworthy, understood, interoperable, discoverable, secure and valuable. By leveraging the semantic layer, organisations can effectively address each of these characteristics, thereby enhancing their data products’ overall quality and utility.
Trustworthy
The semantic layer fosters trust in data products by establishing consistency and lineage across diverse data sources. The semantic layer ensures that data products maintain integrity and reliability through standardised metadata and data models, thereby instilling confidence in their accuracy.
Understandable
With the semantic layer, data products, including AI, become more easily understood by humans and computers. By defining clear data models, the semantic layer enables a comprehensive understanding of data entities, attributes, and their relationships, empowering users to derive meaning and return relevant insights.
Rich metadata provides supporting information about the data in the product to help identify where it is the same or different from data provided in other data products. Business glossaries add additional context, such as consumer-friendly synonyms or descriptions, to the data product model, providing a richer context for the user and bringing the context into the discovery of the data.
Interoperable
The semantic layer promotes interoperability among different data products and systems by using taxonomies and reference data. By standardising data classification and organisation and establishing common data structures in the data models, the semantic layer facilitates seamless integration within complex data ecosystems.
Discoverable
By utilising the semantic layer’s capabilities to comprehend relationships between products, organisations can enhance the discoverability of their data products and enable recommendations of similar products or relevant information. Additionally, employing tagging within the Data Marketplace using taxonomies facilitates easier search functionality, further enhancing the discoverability and accessibility of data products.
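As a rough sketch of tag-based discovery, the snippet below searches a catalogue by taxonomy tag and recommends similar products by counting shared tags. The catalogue entries, tags, and function names are hypothetical, and shared-tag counting is just one simple similarity measure among many.

```python
# Hypothetical marketplace catalogue: data product name -> taxonomy tags.
CATALOGUE = {
    "Juice sales by region": {"juice", "sales", "region"},
    "Smoothie sales forecast": {"smoothies", "sales", "forecast"},
    "Supplier quality scores": {"supplier", "quality"},
}


def search(tag):
    """Return the data products carrying a given taxonomy tag."""
    return sorted(name for name, tags in CATALOGUE.items() if tag in tags)


def similar(name, minimum_shared=1):
    """Recommend other products, ranked by how many tags they share."""
    base = CATALOGUE[name]
    scored = [
        (len(base & tags), other)
        for other, tags in CATALOGUE.items()
        if other != name
    ]
    # Most shared tags first; ties broken alphabetically for stable output.
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [other for shared, other in ranked if shared >= minimum_shared]


print(search("sales"))
print(similar("Juice sales by region"))
```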
Secure
Utilising access controls from the semantic layer simplifies data access for end-users by providing a unified view of the data, regardless of the underlying data sources, while ensuring that sensitive information is appropriately protected and compliant with regulatory requirements.
Valuable
Ultimately, the semantic layer enhances the value of data products by enabling them to deliver actionable insights and meaningful outcomes. By ensuring that data is well-understood, trustworthy, and interoperable, the semantic layer empowers organisations to derive maximum value from their data assets, driving informed decision-making and strategic initiatives. A Data Product can’t provide value if it isn’t used and trusted.
In addition to maturing each of the Data Product characteristics, using the semantic layer brings two further benefits: Enhanced Data Governance, by centralising business rules and definitions in the semantic layer, and Agility in Development, by decoupling the front-end tools from the complexities of the underlying data sources.
In summary, by embracing the semantic layer, organisations can elevate the value and impact of their data products by creating a unique product for the use case, which is also consistent across the organisation.
Conclusion
In conclusion, the semantic layer is a foundational element in enabling the development of reliable and user-centric data products. Its role in simplifying data consumption and ensuring consistency makes it essential in the data product ecosystem.
The semantic layer plays a pivotal role in enabling the development of data products that are discoverable, understood, and trusted, without which their value is unlikely to be realised. Positioned between the physical data and the consumption of data, the semantic layer serves as a translator, transforming the complexities of underlying data sources into the language of the business.
Ultimately, by embracing the semantic layer, organisations can elevate the quality and impact of their data products, tailoring unique solutions for specific use cases while maintaining consistency across the organisation. The ability to leverage the semantic layer to deliver business meaning and knowledge will serve as a differentiator in navigating the rapidly evolving landscape of AI solutions.
Originally published in Modern Data 101 Newsletter.
From The MD101 Team
Bonus for Sticking With Us to the End! 🧡
Here’s your own copy of the Actionable Data Product Playbook. With over 500 downloads so far and quality feedback, we are thrilled with the response to this 6-week guide we’ve built with industry experts and practitioners. Stay tuned on moderndata101.com for more actionable resources from us!