Star, Snowflake, and Fact Constellation(Galaxy Schema) Schemas

Aditya Goel
3 min readSep 15, 2023

--

Star Schema:The star schema is a widely used schema design in data warehousing. It features a central fact table that holds the primary data or measures, such as sales, revenue, or quantities. The fact table is connected to multiple dimension tables, each representing different attributes or characteristics related to the data in the fact table. The dimension tables are not directly connected to each other, creating a simple and easy-to-understand structure.

Simplicity: Star schema is the simplest and most straightforward schema design, with fewer tables and relationships. It provides ease of understanding, querying, and report generation.
Denormalization: Dimension tables in star schema are often denormalized, meaning they may contain redundant data to optimize query performance.

Example: Consider a retail data warehouse. The fact table might contain sales data with measures like “Total Sales” and “Quantity Sold.” The dimension tables could include “Product” with attributes like “Product ID,” “Product Name,” and “Category,” and “Time” with attributes like “Date,” “Month,” and “Year.” The fact table connects to these dimension tables through foreign keys, allowing analysts to perform queries like “Total Sales by Product Category” or “Quantity Sold by Date.”
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

Snowflake Schema: The snowflake schema is an extension of the star schema, designed to further reduce data redundancy by normalizing the dimension tables. In a snowflake schema, dimension tables are broken down into multiple related sub-tables. This normalization creates a more complex structure with additional levels of relationships, reducing storage requirements but potentially increasing query complexity due to the need for additional joins.

Normalization: Snowflake schema normalizes dimension tables, resulting in more tables and more complex relationships compared to the star schema.
Space Efficiency: Due to normalization, the snowflake schema may require less storage space for dimension data but may lead to more complex queries due to additional joins

Example: Continuing with the retail data warehouse example, in a snowflake schema, the “Product” dimension may be normalized into sub-tables like “Product_Category,” “Product_Subcategory,” and “Product_Details,” each holding specific attributes related to the product. This normalization allows for efficient storage of data, but it may require more complex queries to navigate through the snowflake structure.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

Fact Constellation (Galaxy Schema):The fact constellation schema, also known as a galaxy schema, is a more complex design that involves multiple fact tables sharing dimension tables. It is used when there are multiple fact tables with different measures and each fact table is related to several common dimension tables.

Complexity: Fact constellation schema is the most complex among the three designs, as it involves multiple interconnected star schemas.
Flexibility: This schema design offers more flexibility in modeling complex and diverse business scenarios, allowing multiple fact tables to coexist and share dimensions.

Example: In a data warehouse for a healthcare organization, there could be multiple fact tables representing different metrics like patient admissions, medical procedures, and medication dispensing. These fact tables would share common dimension tables like “Patient,” “Doctor,” and “Date.” The fact constellation schema allows analysts to analyze different aspects of healthcare operations while efficiently reusing shared dimension tables.

--

--

Aditya Goel

Documenting and sharing my learnings as I grow in Data and Business Analytics field,in the simplest way possible