Where to build business logic on Snowflake: a PowerBI case study

BI is the most common analytics workload on Snowflake. Yet there is no “always correct” architecture for connecting BI to Snowflake.

Choosing the right trade-offs in the BI architecture has cost, performance, modeling, and governance implications.

At a high level, there are a few alternatives to consider:

  1. Loading data into the BI tool: Create a copy of a dataset in BI (using PowerBI Import mode, Tableau’s Extract, etc.)
  2. Directly connecting BI and Snowflake: Run queries from BI in a Snowflake Virtual Data Warehouse (using PowerBI Direct Query, Tableau’s Live Connection, etc.)
  3. Standalone Semantic Layer on Snowflake: A recently emerged third option, where a standalone semantic layer acts as an intermediary between BI tools and Snowflake.
[Figure: Import mode vs. Direct Query vs. Semantic Layer flow]

In this post, we will look at the trade-offs between the traditional approaches and the semantic layer, using PowerBI as an example of a user tool and the Honeydew Native Application as a Semantic Layer Powered by Snowflake.

Similar conclusions apply to other BI tools (like Tableau or Domo).

What’s a Semantic Layer?

A standalone Semantic Layer or a Metric Store offers a new approach: separate business logic from BI tools and ELT processes into a centralized location that serves all tools.

PowerBI includes a very powerful semantic engine that allows logic to be built in DAX. Business logic, such as a YoY user growth metric, can be built by analysts with DAX knowledge in PowerBI.

That definition, though, exists only in PowerBI: other tools like Tableau cannot leverage it.

[Figure: BI-centric architecture flow: logic in BI]

A standalone semantic layer takes a different approach: a metric implemented in it is available to any consumption tool. It can be used in a PowerBI dashboard by the finance team for reporting; at the same time, it can be part of a Tableau worksheet analyzing the impact of product features on user growth, or used by an analyst writing SQL in Snowflake to break it down by cohort.
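
To make the sharing concrete, here is a minimal sketch (using the snowflake-connector-python library) of how any SQL-speaking tool could consume a centrally defined metric. The connection details, database, schema, and view names are illustrative assumptions, not Honeydew’s actual interface:

```python
# Hypothetical example: a user growth metric defined once in the semantic
# layer, consumed here via plain SQL. Any tool that can query Snowflake
# (PowerBI, Tableau, a Python script) sees the same governed definition.
# Connection details and object names are illustrative placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="analyst",
    password="***",          # in practice, use SSO or key-pair auth
    warehouse="ANALYTICS_WH",
)

cur = conn.cursor()
# Assumption: the layer exposes the metric as a queryable view in Snowflake.
cur.execute("""
    SELECT cohort_month, yoy_user_growth
    FROM SEMANTICS.METRICS.USER_GROWTH_BY_COHORT
    ORDER BY cohort_month
""")
for cohort_month, growth in cur.fetchall():
    print(cohort_month, growth)
```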

[Figure: Semantic layer architecture: logic no longer in BI]

The Traditional Approach: Logic in the Tool

The traditional BI-centric approach encodes business logic within the BI tool.

PowerBI + Snowflake is a common combination and can perform very well with that approach (especially using Keith Smith’s Snowflake and PowerBI best practices).

When to use a direct query?

In direct query mode, data is served directly from Snowflake:

  • Data freshness — always running on current data without the need to reload it.
  • Data size — some data is too large to be loaded into a BI tool.
  • Governance — with a direct connection, you can leverage Snowflake’s extensive data governance capabilities to protect and monitor access to data (see the sketch after this list).
  • Avoid data duplication — an import is another data copy to manage.
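
As an illustration of the governance point, here is a minimal sketch of a Snowflake row access policy. Because the policy lives in Snowflake, every direct connection (PowerBI, Tableau, or this script) is subject to it; the table, policy, role, and mapping-table names are hypothetical:

```python
# Sketch: define a row access policy once in Snowflake, and every direct
# query inherits it, regardless of which tool issued the query.
# Table, policy, and role names below are illustrative.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="admin", password="***", warehouse="ADMIN_WH"
)
cur = conn.cursor()

# Admins see everything; everyone else only sees regions their role is
# entitled to, via a hypothetical region_entitlements mapping table.
cur.execute("""
    CREATE ROW ACCESS POLICY IF NOT EXISTS sales_region_policy
    AS (region VARCHAR) RETURNS BOOLEAN ->
        CURRENT_ROLE() = 'ANALYTICS_ADMIN'
        OR EXISTS (
            SELECT 1 FROM region_entitlements e
            WHERE e.role_name = CURRENT_ROLE() AND e.region = region
        )
""")
cur.execute("ALTER TABLE sales ADD ROW ACCESS POLICY sales_region_policy ON (region)")
```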

When to load data instead?

In import mode, data is loaded into PowerBI:

  • Performance — direct queries are not always performant, requiring careful modeling and adherence to many best practices, such as pre-aggregations and query parallelization.
  • Limited modeling — Direct Query in PowerBI imposes a number of limitations, for example when dealing with time metrics (“year to date”). Working around those limitations can be complex and error-prone.
  • Full native functionality — whether it is having date hierarchies on a date spine or being able to use every DAX and M function, import mode is more native to the tool.
  • Multiple sources — when data is loaded from data sources that are not Snowflake.

Note that, unlike other BI tools, PowerBI allows a combination of Direct Query and Import in a composite model to help find a middle ground between performance and functionality when the data is on Snowflake only.

The Other Way: Put The Logic Where The Data Is

A semantic layer offers a different way of building the stack. From a tool perspective, it looks very similar to a direct query connection (PowerBI asks the semantic layer to compute a metric, instead of computing it in memory over data within PowerBI).

That way it shares many of the advantages of a direct query mode:

  • Don’t think about data copies
  • Data freshness and data size are unlimited — everything comes from Snowflake
  • Governance is centralized in Snowflake

When to use a semantic layer over a direct query?

  • Performance on live Snowflake data is important
  • Need to share a logic definition with more than one tool (e.g., PowerBI and Tableau)
  • Complex metric logic (composite metrics, non-additive metrics, time metrics; see the sketch after this list)
  • Logic is part of a larger workflow involving ETL (e.g., with dbt)
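
For instance, a time metric like year-to-date revenue is awkward in PowerBI Direct Query but straightforward for a layer that compiles to Snowflake SQL. A minimal sketch, assuming a hypothetical orders(order_date, amount) table:

```python
# Sketch: "year to date" expressed as a Snowflake window function. A semantic
# layer can generate SQL like this instead of relying on DAX workarounds.
# The orders table and its columns are assumptions for illustration.
ytd_query = """
    SELECT order_date,
           SUM(SUM(amount)) OVER (
               PARTITION BY YEAR(order_date)   -- restart the running total each year
               ORDER BY order_date             -- cumulative total up to each date
           ) AS revenue_ytd
    FROM orders
    GROUP BY order_date
"""
print(ytd_query)
```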

When to use a direct query?

  • Legacy data transformation flows done in DAX and M: those might be better off implemented as SQL-based ELT before using a semantic layer.
  • No need for an extra moving part: a semantic layer is another component in the data stack. When there is no need for a separation of logic, it can become just another tool to manage.
  • Composite models that combine live connection with data imports.

The Semantic Layer Difference

There are a few fundamental ways in which a semantic layer is different from a direct query:

Centralized Governance for Metrics

Business leaders run on choices: thousands of them, every day, across an entire organization. Being data-driven is about focus and coherence as much as it is about any single right decision. That breaks down when different people count metrics differently.

A definition for a metric (like counting users) underpins critical business actions. Should you invest in a promising new feature? Terminate an underperforming marketing channel? Expand sales abroad?

A semantic layer enables a single definition for a metric, and a lifecycle for its change management — review, approval, and centralized deployment. It allows the reuse of metrics (“Services Revenue” might be part of building overall “Revenue”) and the lineage to understand them.
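
To illustrate the reuse, here is a toy sketch of metric composition. The dataclass is a stand-in for illustration, not Honeydew’s actual definition format:

```python
# Toy model of composable metric definitions: "revenue" is built from two
# reusable component metrics, and lineage falls out of the references.
# The expressions use Snowflake's IFF(); all names are illustrative.
from dataclasses import dataclass

@dataclass
class Metric:
    name: str
    expr: str  # aggregation expression the layer would compile into SQL

services_revenue = Metric("services_revenue", "SUM(IFF(line = 'services', amount, 0))")
product_revenue = Metric("product_revenue", "SUM(IFF(line = 'product', amount, 0))")

# "Revenue" reuses the component metrics rather than redefining them.
revenue = Metric("revenue", f"({services_revenue.expr}) + ({product_revenue.expr})")
print(revenue.expr)
```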

Better Performance on Snowflake

BI tools are built to support many data sources, so they are rarely optimized for any particular one, which can lead to suboptimal query patterns (for example, PowerBI tends to send several separate queries, one after another, to compute a single widget, where a single query that performs a JOIN might be more efficient).
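
A toy sketch of that difference in query patterns (the SQL below is illustrative, not actual PowerBI-generated queries):

```python
# Pattern 1: the tool fetches pieces separately and stitches them in memory.
query_revenue = "SELECT customer_id, SUM(amount) AS revenue FROM orders GROUP BY customer_id"
query_regions = "SELECT customer_id, region FROM customers"

# Pattern 2: a Snowflake-aware layer pushes the join down as a single query,
# scanning the data once and returning only the small aggregated result.
query_single = """
    SELECT c.region, SUM(o.amount) AS revenue
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    GROUP BY c.region
"""
```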

The main performance lever available with live data is pre-aggregation. However, PowerBI pre-aggregations are limited to specific types of logic (sums and counts), and they require an administrator to keep the pre-aggregation ETL logic aligned with the DAX logic and to keep cache and data in sync.

A standalone Semantic Layer native to Snowflake can generate efficient native queries. By controlling both metric and pre-aggregation logic together, it removes both modeling limitations and the need for manual synchronization.
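
A minimal sketch of the pre-aggregation idea, assuming an additive revenue metric over a hypothetical orders table:

```python
# Sketch: the layer materializes a rollup keyed by the dimensions dashboards
# use, then routes eligible metric queries to the much smaller table.
# Because an additive SUM can be safely re-aggregated, the rewrite is valid;
# non-additive metrics (e.g., distinct users) need the layer to plan further,
# which is exactly why metric and pre-aggregation logic should live together.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="etl", password="***", warehouse="ETL_WH"
)
cur = conn.cursor()

# 1. Materialize the pre-aggregation (refreshed on a schedule).
cur.execute("""
    CREATE OR REPLACE TABLE agg_daily_revenue AS
    SELECT order_date, country, SUM(amount) AS revenue
    FROM orders
    GROUP BY order_date, country
""")

# 2. A "revenue by country" request is rewritten to hit the rollup.
cur.execute("SELECT country, SUM(revenue) AS revenue FROM agg_daily_revenue GROUP BY country")
```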

Multiple User Tools

The stack of a modern data-driven organization includes, on average, 6 different data consumption tools: for example, it may have Looker, PowerBI, Excel, Hex, DataGrip, and Snowsight, each serving a different team or use case.

When tools share the same data on Snowflake, they end up needing to share logic as well — from join paths to metric definitions. A semantic layer puts the logic where the data is, making it practical to support multiple tools without duplicating definitions.

Collaboration and Ownership

Modern standalone semantic layers enable a development workflow separated from any single tool: git version control, environment separation, and distributing ownership of pieces of logic. This enables organizations to better track and manage their definitions as well as avoid conflicting definitions.

In Conclusion

Trade-offs are important when building an architecture for BI and Snowflake, whether choosing to load data, query it directly, or combine different options.

For a semantic layer native to Snowflake, find Honeydew in the Snowflake Marketplace.
