Data Modeling: Understanding the Star and Snowflake Schemas

Contrasting Database Schemas: Fact and Dimensional Tables

Nnamdi Samuel
Art of Data Engineering
Aug 23, 2023


An OLTP (Online Transaction Processing) database system is designed to record day-to-day transactions. An OLAP (Online Analytical Processing) system, by contrast, is designed primarily for analysis. Before data from an OLTP system can support analytical queries efficiently, it has to be restructured. This is done by organizing it into a dimensional schema. Depending on the complexity of the data and the problem to be solved, either a star schema or a snowflake schema can be used.

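Before we go further, let us understand two concepts: the fact table and the dimensional table. The fact table is the central table in a star schema. It stores the measurable events being analyzed (for example, the quantity and amount of each sale), along with keys that point to the tables surrounding it. These surrounding tables are known as dimensional (or dimension) tables. They hold the descriptive attributes used to filter and group the facts.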

Star Schema

The star schema is made up of a fact table and dimensional tables, with each dimensional table linked directly to the fact table through a key. The dimensional tables are chosen based on the problems to be solved. Each one represents a different aspect or perspective of the data (e.g., time, product, location). Consider, for example, a star schema built around sales data.

Here, the Sales table is the fact table, and the Products, Location, and Time tables are the dimensional tables. This design makes it possible to analyze sales by location, by time period, and by product. Each such question becomes a query that joins the Sales table to the relevant dimensional tables.
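To make the structure concrete, here is a minimal sketch of such a star schema using Python's built-in sqlite3 module. The table names (fact_sales, dim_product, dim_location, dim_time), their columns, and the sample rows are all hypothetical, chosen only to mirror the Sales/Products/Location/Time example. The point to notice is that every dimensional table is exactly one join away from the fact table.

```python
import sqlite3

# In-memory database. Table names, columns, and rows are hypothetical
# illustrations of the Sales/Products/Location/Time example.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dim_time     (time_id     INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- Fact table: one row per sale, holding keys to each dimension plus numeric measures
CREATE TABLE fact_sales (
    product_id  INTEGER REFERENCES dim_product(product_id),
    location_id INTEGER REFERENCES dim_location(location_id),
    time_id     INTEGER REFERENCES dim_time(time_id),
    quantity    INTEGER,
    amount      REAL
);
""")

cur.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "Laptop"), (2, "Phone")])
cur.executemany("INSERT INTO dim_location VALUES (?, ?)", [(1, "Lagos"), (2, "Abuja")])
cur.executemany("INSERT INTO dim_time VALUES (?, ?, ?)", [(1, 2023, 7), (2, 2023, 8)])
cur.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 2, 2000.0), (2, 1, 2, 5, 1500.0), (1, 2, 2, 1, 1000.0)],
)

# Total sales per city and product in 2023: each dimension needs a single join.
query = """
SELECT l.city, p.product_name, SUM(f.amount) AS total_amount
FROM fact_sales f
JOIN dim_product  p ON f.product_id  = p.product_id
JOIN dim_location l ON f.location_id = l.location_id
JOIN dim_time     t ON f.time_id     = t.time_id
WHERE t.year = 2023
GROUP BY l.city, p.product_name
ORDER BY l.city, p.product_name
"""
rows = cur.execute(query).fetchall()
for row in rows:
    print(row)
# ('Abuja', 'Laptop', 1000.0)
# ('Lagos', 'Laptop', 2000.0)
# ('Lagos', 'Phone', 1500.0)
```

Other questions can be answered the same way. For example, sales per month is a different GROUP BY over the same joins, which is why the star layout is convenient for ad hoc analysis.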

Snowflake Schema

Now consider a case where we need to know not only which products were sold but also the categories and subcategories they belong to. To capture this, we would extend the Products table. Separate tables would be created for subcategory names and category names, and these tables would be used to classify the products. The star schema would then be extended into a snowflake schema.

Simply put, the snowflake schema is an extension of the star schema in which the dimensional tables are further normalized into sub-dimensions. In this example, the Products table links to a subcategory table, which in turn links to a category table.
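A minimal sketch of the snowflaked product dimension follows, again using Python's sqlite3 module. The names dim_category, dim_subcategory, dim_product, and fact_sales, along with all sample rows, are hypothetical. It shows the trade-off described above: each category name is stored once, but reaching it from the fact table now takes a chain of three joins.

```python
import sqlite3

# In-memory database with hypothetical tables and sample rows.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
-- Sub-dimensions: each category and subcategory name is stored exactly once
CREATE TABLE dim_category    (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_subcategory (subcategory_id INTEGER PRIMARY KEY, subcategory_name TEXT,
                              category_id INTEGER REFERENCES dim_category(category_id));
-- The product dimension points to a subcategory instead of storing the names itself
CREATE TABLE dim_product     (product_id INTEGER PRIMARY KEY, product_name TEXT,
                              subcategory_id INTEGER REFERENCES dim_subcategory(subcategory_id));
CREATE TABLE fact_sales      (product_id INTEGER REFERENCES dim_product(product_id),
                              amount REAL);
""")

cur.executemany("INSERT INTO dim_category VALUES (?, ?)",
                [(1, "Electronics"), (2, "Furniture")])
cur.executemany("INSERT INTO dim_subcategory VALUES (?, ?, ?)",
                [(1, "Computers", 1), (2, "Phones", 1), (3, "Chairs", 2)])
cur.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                [(1, "Laptop", 1), (2, "Smartphone", 2), (3, "Office chair", 3)])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                [(1, 2000.0), (2, 1500.0), (3, 300.0), (1, 1000.0)])

# Sales per category: fact -> product -> subcategory -> category (three joins).
query = """
SELECT c.category_name, SUM(f.amount) AS total_amount
FROM fact_sales f
JOIN dim_product     p ON f.product_id     = p.product_id
JOIN dim_subcategory s ON p.subcategory_id = s.subcategory_id
JOIN dim_category    c ON s.category_id    = c.category_id
GROUP BY c.category_name
ORDER BY c.category_name
"""
rows = cur.execute(query).fetchall()
print(rows)  # [('Electronics', 4500.0), ('Furniture', 300.0)]
```

In the star version, the same question would need only one join, because the category name would sit directly in the product table.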

Key differences between Star Schema and Snowflake Schema

  1. In the star schema, every dimensional table is directly connected to the fact table. In the snowflake schema, dimensional tables are split into further sub-dimension tables, so not all tables are directly connected to the fact table.
  2. The star schema is simpler to understand and to query than the snowflake schema, because its queries need fewer joins.
  3. The star schema is generally denormalized. This means that all the attributes of a dimension (for example, a product's name, subcategory, and category) are kept in a single table. The snowflake schema, on the other hand, is normalized, so those attributes are spread across related tables.
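The normalized versus denormalized distinction in point 3 can be illustrated without a database. The sketch below uses plain Python dictionaries as stand-ins for snowflake-style lookup tables (all names and values are hypothetical). It flattens them into a single star-style product dimension. After flattening, the category name is repeated on every product row, whereas the normalized form stores it once.

```python
# Hypothetical normalized (snowflake-style) lookup tables: each name appears once.
categories = {1: "Electronics"}
subcategories = {1: ("Computers", 1), 2: ("Phones", 1)}          # id -> (name, category_id)
products = {1: ("Laptop", 1), 2: ("Desktop", 1), 3: ("Smartphone", 2)}  # id -> (name, subcategory_id)


def denormalize(products, subcategories, categories):
    """Flatten the product hierarchy into one star-style dimension table."""
    rows = []
    for product_id, (product_name, sub_id) in sorted(products.items()):
        sub_name, cat_id = subcategories[sub_id]
        # Every row carries its subcategory and category names directly.
        rows.append((product_id, product_name, sub_name, categories[cat_id]))
    return rows


dim_product = denormalize(products, subcategories, categories)
for row in dim_product:
    print(row)
# (1, 'Laptop', 'Computers', 'Electronics')
# (2, 'Desktop', 'Computers', 'Electronics')
# (3, 'Smartphone', 'Phones', 'Electronics')
```

Renaming "Electronics" requires one update in the normalized form but three in the flattened one. That is the redundancy the snowflake schema avoids, at the cost of extra joins.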

Conclusion

The star schema is often preferred for querying large datasets because of its simplicity and because its queries require fewer joins. The snowflake schema can make query design more complex, but its normalized structure reduces redundancy in the dimensional tables. The choice between the two depends on several factors, including query performance requirements and the complexity of the data.

Thank you for reading! If you found this interesting, don’t forget to clap and follow me. Also, subscribe to my articles and catch me on LinkedIn.
