Data modelling techniques: Star Schema

Nimantha Fernando
3 min readJun 25, 2020

--

Multidimensional Schema is especially designed to model data warehouse systems. The schemas are designed to address the unique needs of very large databases designed for the analytical purpose (online transaction processing).

Star schema is mature modeling approach widely adopted by relational data warehouses. It requires modelers to classify their model table as either dimension or fact. Dimension tables describe the business entities(the things you model). Entities can include products,people, places and concepts including time itself. The most consistent table you’ll find in a star schema is a date dimension table. A dimension table contains a key column(or columns) that act as a unique identifier and descriptive columns.

Fact tables store observations or events and can be sales orders, stock balances, exchange rates, temperature, etc. A fact table contains dimension key columns that relate to dimension tables and numeric measure columns. The dimension key columns determine the dimensionality of a fact table while the dimension key values determine the granularity of a fact table. Let’s consider a fact table designed to store sale targets that has two dimension key columns Date and ProductKey. It’s easy to understand that the table has two dimensions. The granularity can’t be determined without considering the dimension key values. As in the example, consider that the values stored in the Date column are the first day of each month. In this case, the granularity is at month-product level. Generally dimension table contain a relatively small number of rows. Fact tables can contain a very large number of rows and continue to grow over time.

star schema figure 1

There are several characteristics of star schema:

· Every dimension in a star schema is represented with the only one-dimension table.

· The dimension table should contain the set of attributes.

· The dimension table is joined to the fact table using a foreign key

· The dimension table are not joined to each other.

· Fact table would contain key and measure

· The star schema is easy to understand and provides optimal disk usage.

· The dimension tables are not normalized.

· The schema is widely supported by BI tools.

There are several advantages and disadvantages in this star schema modeling.

Advantages

A simplified schema means that we won’t have to write confusingly long queries every time we want some information from our database.

We optimized for reads. Now that we can write fewer JOINS, our results will be returned more quickly.

Also, it will business logic for reporting. We won’t have to explain to stakeholders all the crazy joins that went into making the schema, just maybe .

Disadvantages

Denormalizing our data means that data anomalies could arise from one-off inserts or updates. In practice, star schemas are implemented via “trickle feeds” or batch processing to compensate for this issue.

We have limited analytical flexibility. A star schema is generally designed for a particular purpose. Since we have fewer features in the star schema than in the full database, we are restricted to just what this star schema contains.

References

“Star and SnowFlake Schema in Data Warehouse,” Guru99. [Online]. Available: https://www.guru99.com/star-snowflake-data-warehousing.html. [Accessed: 27-May-2020].

Peter-Myers, “Understand star schema and the importance for Power BI — Power BI,” Power BI | Microsoft Docs. [Online]. Available: https://docs.microsoft.com/en-us/power-bi/guidance/star-schema. [Accessed: 27-May-2020].

--

--

Nimantha Fernando

computer engineering undergraduate, business development member at AIESEC in JLC , ENSHIN karate yellow belt holder