The future of BI is Headless

Published in GoodData Developers, Mar 19, 2021

I work for GoodData and have focused on multi-tenant analytics for more than a decade. By multi-tenant analytics, I mean a use case where analytics is shared with external companies: customers, channel partners, suppliers, or other business partners.

Our analytics widgets and self-service analytics tools are usually embedded in applications or web portals. However, there are always tenants who have already adopted a specific BI tool or platform and want to integrate it with our analytics. Initially, we served these customers by exporting data from our platform so that they could import it into their BI tool. Besides the obvious latency that this (often manual) process adds, we ran into many consistency issues. Our customers had to recreate many measures, reports, and dashboards in their tool, and the numbers that their users got from our embedded analytics and from the 3rd party tool were often inconsistent. Our customers also complained about the duplicated effort of defining the same objects on two different platforms.

I see the same problem in corporate BI, where different organizations use different BI tools on top of centralized data managed by their IT departments.

Both situations inspired us to design the “headless BI” architecture described in this article.

High-level architecture of a BI tool

I’ll start by describing the high-level architecture of today’s BI tools because I believe they share an inherent tight-coupling problem that makes larger BI solutions inflexible and inconsistent.

The architecture of most BI tools (e.g., Power BI, Tableau, Qlik, and Sisense) has evolved from an original desktop application. From a high-level perspective, the architecture of all these tools and platforms looks somewhat like this:

These BI tools often offer a “real-time-optimized” architecture that removes the analytical storage layer to reduce ETL latency. Power BI DirectQuery, Tableau direct connections, and Sisense LiveConnect are examples of this “real-time” architecture optimization.

Real-time-optimized high-level architecture of a BI tool

Unfortunately, in most cases, the analytical capabilities are severely limited in the “real-time-optimized” mode. For example, see Power BI’s DAX limitations in DirectQuery mode.

The problem: tightly coupled semantic model

The semantic model component adds a lot of value in terms of analytical capabilities, ease of use, and consistency. The semantic model provides shared definitions of:

Data model (datasets, tables, columns, facts, dimensions, relationships, etc.)
Measures that calculate the business KPIs, are sliced and diced in reports, serve as inputs for machine learning, and parametrize automated business processes.
Governance rules, including users, groups, roles, and data and application permissions.
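To make the three kinds of shared definitions more concrete, here is a minimal sketch of a semantic model as a plain data structure. All class, field, and measure names are hypothetical and chosen only for illustration; they do not describe any particular product's model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    """Data model entry: a logical dataset mapped to a physical table."""
    name: str
    table: str
    facts: tuple[str, ...] = ()        # numeric columns that can be aggregated
    attributes: tuple[str, ...] = ()   # columns used for slicing and dicing


@dataclass(frozen=True)
class Measure:
    """Measure entry: a named aggregation defined once for all consumers."""
    name: str
    expression: str   # illustrative syntax, e.g. "SUM(amount)"
    dataset: str      # name of the dataset the measure is computed on


@dataclass
class SemanticModel:
    datasets: dict[str, Dataset] = field(default_factory=dict)
    measures: dict[str, Measure] = field(default_factory=dict)
    # Governance: role name -> names of the measures that role may query.
    permissions: dict[str, set[str]] = field(default_factory=dict)

    def add_dataset(self, dataset: Dataset) -> None:
        self.datasets[dataset.name] = dataset

    def add_measure(self, measure: Measure) -> None:
        # A measure must refer to a dataset that the model already knows.
        if measure.dataset not in self.datasets:
            raise ValueError(f"unknown dataset: {measure.dataset}")
        self.measures[measure.name] = measure

    def grant(self, role: str, measure_name: str) -> None:
        if measure_name not in self.measures:
            raise ValueError(f"unknown measure: {measure_name}")
        self.permissions.setdefault(role, set()).add(measure_name)

    def can_query(self, role: str, measure_name: str) -> bool:
        return measure_name in self.permissions.get(role, set())
```

Keeping the data model, the measures, and the governance rules in one object is what lets every consumer rely on the same definitions.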

The problem is that most BI tools tightly couple the semantic model with the data visualization tools and components (dashboards, reports, etc.). This leads to the situation described above: many different definitions of measures and data structures, and duplicated effort in maintaining multiple semantic models (e.g., change management, users, access permissions, and roles).

Multiple incompatible and diverging semantic models

At the business level, this problem causes inconsistency of the KPIs and reports that business users consume.

Headless BI: The semantic model as a shared service

The main idea behind headless BI (aka Data as a Service, or DaaS) is to remove the tight coupling between the BI components and expose the semantic model as a shared service via APIs and standard interfaces (e.g., JDBC, DB-API, DataFrame, Dataset).

The analytical data model, measure definitions, users, groups, and access permissions are now defined once and shared across all applications and business processes.
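The "defined once, shared everywhere" idea can be sketched in a few lines. The example below is a toy stand-in, not a real service: it assumes an in-memory SQLite database as the backend, and the `SemanticService` class, the `orders` table, the measure names, and the two client functions are all hypothetical. The point is only that both consumers ask for a measure by name and therefore cannot disagree on how it is calculated.

```python
import sqlite3

# Measure definitions live in exactly one place (hypothetical examples).
MEASURES = {
    "revenue": "SUM(amount)",
    "order_count": "COUNT(*)",
}
ALLOWED_ATTRIBUTES = {"region"}


class SemanticService:
    """Toy stand-in for a shared semantic model service."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def query(self, measure: str, slice_by: str) -> dict:
        # Only known names are interpolated, so callers cannot inject SQL.
        if measure not in MEASURES:
            raise KeyError(f"unknown measure: {measure}")
        if slice_by not in ALLOWED_ATTRIBUTES:
            raise KeyError(f"unknown attribute: {slice_by}")
        sql = (
            f"SELECT {slice_by}, {MEASURES[measure]} FROM orders "
            f"GROUP BY {slice_by} ORDER BY {slice_by}"
        )
        return dict(self.conn.execute(sql).fetchall())


def build_demo_connection() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("EU", 100.0), ("EU", 50.0), ("US", 70.0)],
    )
    return conn


def dashboard_client(service: SemanticService) -> dict:
    # An embedded dashboard asks for the measure by name.
    return service.query("revenue", "region")


def notebook_client(service: SemanticService) -> dict:
    # A 3rd party tool or notebook asks the same service for the same measure.
    return service.query("revenue", "region")
```

If the definition of `revenue` changes, it changes in `MEASURES` only, and both clients pick up the new calculation on their next query.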

Larger BI solutions should introduce the semantic model as a first-class service in their architectures to improve consistency and agility.

Is the semantic model really necessary? Isn’t SQL enough?

You may think that sharing access to your analytical database is enough. I wrote a separate article that compares the semantic model with SQL.

Headless BI requirements

Based on our experience, the following requirements are important for successfully implementing and operating headless BI:

Strong semantic model capabilities, such as a powerful and user-friendly query language, real-time and data streaming capabilities, and support for multiple database backends (e.g., cloud and on-premise data warehouses, data lakes, etc.).
Openness towards 3rd party BI, AI, and machine learning tools and application frameworks. Headless BI requires broad support for standards-based data protocols, APIs, and SDKs. On top of that, specific configurations for BI tools, platforms, and notebooks should be regularly tested, maintained, and improved.
Self-service extensibility. The semantic model must be extensible by end users (not only by IT or other technical owners). This applies to introducing new data sources, connecting them to existing models, defining new measures (ideally without requiring SQL skills), etc. The extensibility should be available via an easy-to-use UI for end-user self-service and via declarative APIs for automation.
Deployment flexibility. Headless BI must support many different deployment options, such as a fully hosted service and cloud-native deployments to many different environments (e.g., public clouds like Amazon AWS, Microsoft Azure, and Google GCP, or a local datacenter).
Enterprise capabilities. Governance, security, performance, scalability, high availability, etc., are important because headless BI becomes a mission-critical component for enterprise applications and tools.
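The declarative APIs mentioned under self-service extensibility can be illustrated with a small sketch. With a declarative interface, the caller submits the desired state of the model and the service works out which changes are needed. The function names and the flat "measure name to expression" layout below are assumptions made for this example only.

```python
def plan_changes(current: dict, desired: dict) -> list[tuple[str, str]]:
    """Return (action, measure_name) pairs that turn `current` into `desired`."""
    changes = []
    for name in sorted(desired.keys() - current.keys()):
        changes.append(("create", name))
    for name in sorted(current.keys() & desired.keys()):
        if current[name] != desired[name]:
            changes.append(("update", name))
    for name in sorted(current.keys() - desired.keys()):
        changes.append(("delete", name))
    return changes


def apply_definition(current: dict, desired: dict) -> tuple[dict, list[tuple[str, str]]]:
    """Apply the desired state and report what was changed."""
    changes = plan_changes(current, desired)
    return dict(desired), changes


current = {"revenue": "SUM(amount)", "legacy_kpi": "COUNT(*)"}
desired = {"revenue": "SUM(amount - discount)", "margin": "SUM(amount - cost)"}

model, changes = apply_definition(current, desired)
# Applying the same definition again is a no-op, which makes automation safe to retry.
model, changes_second_run = apply_definition(model, desired)
```

Here the first call plans one create, one update, and one delete, and the second call plans nothing because the model already matches the desired state.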

The GoodData platform ticks most of the boxes above.

GoodData Platform

We at GoodData designed our platform in 2008 for the then-emerging cloud architecture. The platform consists of many stateless microservices orchestrated by an asynchronous message hub.

The semantic model service was one of the first services that we introduced. It provides report computation, data export, dynamic definition of measures, reports, and dashboards, and many other capabilities. The service is accessible via REST APIs. Over time, we’ve built many SDKs on top of this REST API (e.g., JDBC, Spark Dataset, JavaScript, Ruby, and many others).

Currently, we are taking the semantic model service to another level. We are improving the readability of the declarative definitions (for versioning, merging, etc.), cleaning up their structure, adding the ability to manage multiple semantic model definitions en masse, etc.
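To illustrate why readable declarative definitions matter for versioning and merging, here is a minimal sketch that serializes a definition to a canonical text form and diffs two versions. It assumes the definition is a plain JSON-compatible dictionary; the layout, the version labels, and the function names are hypothetical and are not the platform's actual format.

```python
import difflib
import json


def to_canonical_text(definition: dict) -> str:
    # Sorted keys and fixed indentation give the same text for the same content,
    # so version control shows only real changes.
    return json.dumps(definition, indent=2, sort_keys=True) + "\n"


def diff_definitions(old: dict, new: dict) -> str:
    return "".join(
        difflib.unified_diff(
            to_canonical_text(old).splitlines(keepends=True),
            to_canonical_text(new).splitlines(keepends=True),
            fromfile="model@v1",
            tofile="model@v2",
        )
    )


v1 = {"measures": {"revenue": "SUM(amount)"}}
v2 = {"measures": {"revenue": "SUM(amount - discount)"}}
change_report = diff_definitions(v1, v2)
```

The resulting diff touches only the line of the changed measure, which is what keeps reviews and merges of model changes manageable.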

We’ve just checked off another requirement from the list above: support for cloud-native deployment, which allows our users to deploy analytics to their public and private clouds or to their on-premise datacenters. We aim for widespread adoption of the new headless BI concept and are making it available for free without any limits (e.g., on the number of users or data volume).