Semantic Layer: Future of Self-Serve Analytics

Seckin Dinc
8 min read · Aug 2, 2023

Photo by Andrés Dallimonti on Unsplash

Today, the word “semantic” has become an integral part of various academic and technical domains, enriching our understanding of communication, cognition, and the intricacies of human language. From its humble beginnings in ancient Greece to its widespread usage in contemporary times, “semantic” continues to be a significant term that reminds us of the timeless pursuit to comprehend the nature of meaning in both human communication and artificial intelligence.

What is Semantic?

The word “semantic” traces its roots back to ancient Greece. The term originates in the Greek word “semantikos”, which denotes something “significant” or “meaningful”. The Greeks, renowned for their philosophical and linguistic inquiries, were keenly interested in understanding the intricacies of language and the essence of meaning.

The Greek word “semantikos” emerged from the verb “semaino”, which means “to signify” or “to indicate.” In this context, the Greeks explored the relationships between words and the concepts they represented, delving into the nature of communication and expression.

Fast-forward to the late 19th century, and we find the term “semantic” making its debut in the English language. Initially employed primarily in the fields of linguistics and philosophy, “semantic” was used to refer to the study of meaning in language. Scholars and linguists endeavoured to decipher the mechanisms underlying the symbolic representations of ideas through words, phrases, and sentences.

As language and the study of meaning evolved over time, so did the use of the term “semantic.” With the advent of modern computational technology and artificial intelligence, “semantic” expanded beyond its linguistic origins. It found application in computer science and related disciplines, signifying the study of meaning and representation of information, especially in the context of digital data and natural language processing.

Metrics in the Multiverse of Madness

Photo by Darwin Vegher on Unsplash

Organisations lack a central repository of aligned knowledge. From B2B to B2C, e-commerce to retail, companies of every kind still spend more time on basic and ad-hoc reporting than on the AI initiatives they should be investing in!

Employees need to make decisions every day based on data they consume through different mediums, such as company dashboards, screenshots of charts in Slack messages, and customer satisfaction scores in mobile app stores. But how can they trust the data they consume? Do they even know the definitions and calculations behind it? Are those definitions written down somewhere and aligned between teams? Can anyone access that information with any tool?

Imagine that you are part of an analytics team. You support the sales team, and your data analyst buddy supports the product team. By chance, both of you are tasked with building customer retention dashboards in the coming weeks. For sales, customer retention starts with a successful purchase and counts only revenue-generating activities in a 4-week cycle; for product, it starts with sign-up and counts any activity in a 2-week cycle. Who is right and who is wrong? To go even further, what happens if you build your metrics in the data warehouse and your buddy only uses BI tools? How are you and your respective teams going to meet, align, and create a single source of truth?
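To make the conflict concrete, here is a minimal Python sketch of the two retention definitions applied to the same data. The event log, the event names (`sign_up`, `login`, `purchase`), and the `retention` helper are all made up for illustration; real definitions would be more nuanced.

```python
from datetime import date, timedelta

# Hypothetical event log: (customer_id, event_type, event_date).
EVENTS = [
    ("c1", "sign_up", date(2023, 1, 1)),
    ("c1", "login", date(2023, 1, 10)),
    ("c1", "purchase", date(2023, 1, 20)),
    ("c1", "purchase", date(2023, 2, 10)),
    ("c2", "sign_up", date(2023, 1, 1)),
    ("c2", "login", date(2023, 1, 20)),
    ("c3", "sign_up", date(2023, 1, 2)),
    ("c3", "purchase", date(2023, 1, 3)),
]


def retention(events, start_event, qualifying_events, window_days):
    """Share of customers with a qualifying event strictly after their
    first start_event and within window_days of it."""
    # First occurrence of the start event per customer.
    starts = {}
    for customer, event_type, day in events:
        if event_type == start_event:
            if customer not in starts or day < starts[customer]:
                starts[customer] = day

    retained = set()
    for customer, event_type, day in events:
        start = starts.get(customer)
        if start is None or event_type not in qualifying_events:
            continue
        if start < day <= start + timedelta(days=window_days):
            retained.add(customer)

    return len(retained) / len(starts) if starts else 0.0


# Sales: starts with the first purchase, only purchases count, 4-week cycle.
sales_retention = retention(EVENTS, "purchase", {"purchase"}, 28)

# Product: starts with sign-up, any activity counts, 2-week cycle.
product_retention = retention(EVENTS, "sign_up", {"login", "purchase"}, 14)

print(f"Sales retention:   {sales_retention:.0%}")
print(f"Product retention: {product_retention:.0%}")
```

Both numbers are labelled “customer retention”, both are computed correctly, and they still disagree. Neither team is wrong; the definitions are simply not aligned anywhere.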

Let’s see which initiatives are available in the market to address this problem!

What is a Metrics Layer?

A metrics layer is a set of predefined metrics and key performance indicators (KPIs) that are essential for tracking and measuring specific business goals or objectives. It acts as a layer of abstraction that simplifies complex calculations and provides users with standardized, easily accessible performance metrics.

A metrics layer is often used in BI tools and dashboards to display critical business insights at a glance. Users can interact with these predefined metrics to monitor business performance without needing to build complex queries or calculations.

Minerva, Airbnb’s Metrics Layer

The problem I described is common to companies all over the world. Some, like Airbnb, have paid a great deal of attention to this problem space. In 2021, they introduced Minerva as their metrics layer.

dbt Labs Metrics Migrating to MetricFlow

Recently, dbt Labs decided to replace dbt Metrics with MetricFlow. MetricFlow is a semantic layer that makes it easy to organize metric definitions. It takes those definitions and generates legible and reusable SQL. This makes it easy to get consistent metric output broken down by attributes (dimensions) of interest.
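The general idea of turning a metric definition into SQL can be illustrated with a toy Python sketch. To be clear, this is not MetricFlow’s API or the SQL it produces; the `Metric` class, the `compile_sql` function, and the `orders` table are hypothetical names used only to show the principle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A hypothetical, minimal metric definition."""
    name: str
    table: str
    expression: str  # SQL aggregate expression, e.g. "SUM(amount)"


def compile_sql(metric, dimensions=()):
    """Build a SELECT statement for the metric, grouped by the given dimensions."""
    dimensions = list(dimensions)
    select_parts = dimensions + [f"{metric.expression} AS {metric.name}"]
    sql = f"SELECT {', '.join(select_parts)} FROM {metric.table}"
    if dimensions:
        sql += f" GROUP BY {', '.join(dimensions)}"
    return sql


# The metric is defined once...
revenue = Metric(name="revenue", table="orders", expression="SUM(amount)")

# ...and reused for every breakdown, so the calculation never drifts.
print(compile_sql(revenue))
print(compile_sql(revenue, ["country"]))
print(compile_sql(revenue, ["country", "order_month"]))
```

Because every query is generated from the same definition, a change to how revenue is calculated has to be made in only one place.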

So dbt is switching from metrics to a semantic layer. What is a semantic layer, and why was such a change needed?

What is a Semantic Layer?

A semantic layer, also known as a business semantic layer or data semantic layer, is an abstraction layer that sits between end-users and the underlying data sources or databases in a Business Intelligence (BI), data analytics, data science or AI system. Its primary purpose is to simplify data access and analysis by providing a business-friendly view of the data, hiding the complexities of the underlying data structure and technical details.

In essence, the semantic layer acts as a bridge between the technical data storage and the business users, enabling a more intuitive and user-friendly experience when querying and analyzing data. It achieves this by mapping the business terms and concepts that users are familiar with to the corresponding data elements in the data sources.
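As a rough illustration of that mapping, the Python sketch below lets a user ask for data in business terms while the physical table and column names stay hidden. The table `fct_ord_v2`, its columns, the `SEMANTIC_MODEL` dictionary, and the `fetch` helper are invented for this example; an in-memory SQLite database stands in for the data source.

```python
import sqlite3

# Hypothetical mapping from business terms to physical data elements.
SEMANTIC_MODEL = {
    "table": "fct_ord_v2",
    "fields": {
        "Customer": "cust_id",
        "Country": "ctry_cd",
        "Order Amount (USD)": "ord_amt_usd",
    },
}


def fetch(conn, business_terms):
    """Return rows keyed by business terms instead of physical column names."""
    fields = SEMANTIC_MODEL["fields"]
    unknown = [term for term in business_terms if term not in fields]
    if unknown:
        raise KeyError(f"Unknown business terms: {unknown}")
    # Column names come only from the trusted mapping, never from user input.
    columns = ", ".join(fields[term] for term in business_terms)
    cursor = conn.execute(
        f"SELECT {columns} FROM {SEMANTIC_MODEL['table']} ORDER BY rowid"
    )
    return [dict(zip(business_terms, row)) for row in cursor.fetchall()]


# In-memory stand-in for the warehouse, with cryptic physical names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fct_ord_v2 (cust_id TEXT, ctry_cd TEXT, ord_amt_usd REAL)"
)
conn.executemany(
    "INSERT INTO fct_ord_v2 VALUES (?, ?, ?)",
    [("c1", "DE", 120.0), ("c2", "TR", 80.0)],
)

rows = fetch(conn, ["Customer", "Order Amount (USD)"])
for row in rows:
    print(row)
```

The business user never needs to know that “Order Amount (USD)” lives in a column called `ord_amt_usd`; if the column is renamed, only the mapping changes.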

4 Types of Semantic Layer Approaches

Semantic layer in Business Intelligence (BI) or analytics tools: This type of semantic layer defines business concepts and the relationships between them in the BI or analytics tools. It is mostly driven by the data analysts and reporting experts in the organization.

  • Advantages:
    - Ideal for small organizations with small data teams (less than 1% of head count compared to the rest of the company)
    - Enables quick report and dashboard development cycles
    - Mostly embraced by B2C companies, as they don’t need to communicate metrics and calculations to clients
  • Disadvantages:
    - Dependency on data analysts and reporting experts, as only they can access and read the semantics specific to the BI or analytics tools
    - If the organization uses multiple tools, the semantics are not shared between them, which nurtures siloed decision making
    - Product, Sales and Marketing experts don’t have direct access to the semantics, so conflicts happen most of the time

Semantic layer in a data warehouse: This type of semantic layer defines business concepts and the relationships between them in the DWH and data marts. It is mostly driven by the data engineers and analytics engineers in the organization.

  • Advantages:
    - Central location for the source of truth in the organization. Any BI or analytics tool can access the DWH and share the same semantics
    - Logical and physical model optimisations are handled by engineers following best practices suitable for production environments
    - Limitations of BI tools are no longer present
  • Disadvantages:
    - Direct access to the DWH by business users is almost impossible
    - Requires strong data engineers and analytics engineers to build, improve and maintain the structure
    - Suitable mainly for mid-size organizations or organizations whose data team is 5–10% of company head count

Semantic layer within data pipelines: This type of semantic layer defines business concepts and the relationships between them in data pipelines. It is mostly driven by the data engineers in the organization.

  • Advantages:
    - Optimised for efficiency and scalability
    - Metrics and dimensions are few and do not change often
    - Semantics are known and agreed upon by every business unit
  • Disadvantages:
    - No direct access for business users and less technical data practitioners
    - In a growing organisation or dynamic business, impossible to maintain, as it depends on a single profession: data engineers
    - Hard to visualise the dependencies and lineage between data assets

Universal semantic layer: Focuses on company-wide usage. There shouldn’t be a dependency on a single data profession, and it should be accessible to technical and non-technical users on demand through different tools and services.

dbt Semantic Layer

The dbt Semantic Layer allows data teams to centrally define essential business metrics like revenue, customer, and churn in the modeling layer (your dbt project) for consistent self-service within downstream data tools like BI and metadata management solutions. The dbt Semantic Layer provides the flexibility to define metrics on top of your existing models and then query those metrics and models in your analysis tools of choice.

Image courtesy https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-semantic-layer

Cube

Cube is the semantic layer for building data applications. It helps data engineers and application developers access data from modern data stores, organize it into consistent definitions, and deliver it to every application.

Image courtesy https://cube.dev/

Conclusion

We are in an era in which everyone wants access to the correct information as soon as possible through different tools such as BI tools, the DWH, Excel, and sales tools. Maybe 5 years ago it was OK to make the sales team wait for a sales dashboard request. But today, when the whole world is at your fingertips (thanks to ChatGPT), no one will welcome the delay.

What we have learned from ChatGPT is that self-serve capabilities and quick access to the correct information without a middle person are the key to winning the market. If we want to nurture a self-serve analytics culture in our organizations, we need to invest in our semantic layer.

Investing in the semantic layer is important not only for business people but also for data teams, as most business requests are ad-hoc and most ad-hoc requests are quite easy to solve with self-serve analytics. Imagining a world beyond ad-hoc requests can open doors to new horizons such as personalisation, experimentation, and the AI revolution!

Seckin Dinc

Building successful data teams to develop great data products