By Gil Benghiat
When analyzing data, people talk about a star schema. This DataKitchen blog post will answer the questions: What is a star schema? Why is it needed?
Background and Context
Often, data is stored in database tables that have relationships with each other. A schema is simply a group of related tables in a database. There are two categories of schemas: operational and reporting. The operational schema stores data as it is being collected and updated. An example of this is the database behind a point-of-sale application in a pharmacy.
The pharmacy records and updates customer and physician records, and the point-of-sale system creates transaction records as prescriptions are filled. This operational schema is optimized for inserts and updates. However, the data in an operational schema is not in a format that facilitates analysis; it is geared for machines, not people.
For Business Intelligence (BI) and other analytic use cases, a reporting schema is employed that is optimized for reads and aggregations. The database software itself can also be optimized for reads and aggregations (e.g. Amazon Redshift), but that is another topic. One of the most popular ways to organize a reporting schema is as a star schema.
The Star Schema
Star schemas are organized into fact and dimension tables. In the pharmaceutical sales example, facts are measurable data about the event, in this case the sale: for example, quantity and price. Dimensions are the actors or attributes related to the sale, e.g. Product, Patient, Prescriber (doctor), Payer (insurance company), and Period (date). A dimension table usually holds far fewer records than the fact table. The fact table is related to each dimension table by record identifiers: every fact row stores the identifier of the matching row in each dimension.
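To make the fact and dimension split concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table names, column names, and sample rows (dim_product, dim_prescriber, fact_sale, and so on) are assumptions made up for illustration, and only two of the five dimensions are shown to keep it short. It is not the schema pictured in this post.

```python
import sqlite3

# In-memory database; every table, column, and value here is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL
);
CREATE TABLE dim_prescriber (
    prescriber_id   INTEGER PRIMARY KEY,
    prescriber_name TEXT NOT NULL
);
-- Each fact row holds the measures plus one identifier per dimension.
CREATE TABLE fact_sale (
    sale_id       INTEGER PRIMARY KEY,
    product_id    INTEGER NOT NULL REFERENCES dim_product(product_id),
    prescriber_id INTEGER NOT NULL REFERENCES dim_prescriber(prescriber_id),
    quantity      INTEGER NOT NULL,
    price_cents   INTEGER NOT NULL  -- unit price, stored as integer cents
);
""")

conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?)",
    [(1, "Drug A"), (2, "Drug B")],
)
conn.executemany(
    "INSERT INTO dim_prescriber VALUES (?, ?)",
    [(10, "Dr. Smith"), (11, "Dr. Jones")],
)
conn.executemany(
    "INSERT INTO fact_sale VALUES (?, ?, ?, ?, ?)",
    [
        (100, 1, 10, 2, 500),
        (101, 1, 11, 1, 500),
        (102, 2, 10, 3, 1200),
    ],
)

# A typical reporting query: aggregate a measure, grouped by a dimension attribute.
revenue_by_product = conn.execute("""
    SELECT p.product_name, SUM(f.quantity * f.price_cents) AS revenue_cents
    FROM fact_sale AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()

print(revenue_by_product)
```

The query joins the fact table to one dimension by its identifier and sums a measure, which is the read-and-aggregate pattern a reporting schema is meant to serve. Swapping dim_product for dim_prescriber in the join would give revenue per prescriber without touching the fact rows.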
Why Needed? Less Space and Ease of Use
The data could be stored in a single table where the attributes are repeated on each row (like the source for an Excel pivot table). However, that would take up much more space and make managing the dimensions more difficult, because a correction to one attribute would have to be applied to every row that repeats it. (Master Data Management is the discipline of making sure the dimensions are accurate.) Also, a star schema maps very well into visualization tools such as Tableau. Finally, the organization of a star schema matches the business language, not the programming language.
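The difference in upkeep can be sketched in a few lines of plain Python. The rows, names, and identifiers below are hypothetical, chosen only to contrast a flat layout with a star layout when a prescriber's name needs correcting.

```python
# Flat layout: every sale row repeats the prescriber and product attributes.
flat_sales = [
    {"prescriber_name": "Dr. Smith", "product_name": "Drug A", "quantity": 2},
    {"prescriber_name": "Dr. Smith", "product_name": "Drug B", "quantity": 3},
    {"prescriber_name": "Dr. Smith", "product_name": "Drug A", "quantity": 1},
]

# Star layout: attributes live once in a dimension; facts carry only identifiers.
dim_prescriber = {10: {"prescriber_name": "Dr. Smith"}}
dim_product = {1: {"product_name": "Drug A"}, 2: {"product_name": "Drug B"}}
fact_sales = [
    {"prescriber_id": 10, "product_id": 1, "quantity": 2},
    {"prescriber_id": 10, "product_id": 2, "quantity": 3},
    {"prescriber_id": 10, "product_id": 1, "quantity": 1},
]

# Correcting the name in the flat layout touches every row that repeats it.
flat_rows_changed = 0
for row in flat_sales:
    if row["prescriber_name"] == "Dr. Smith":
        row["prescriber_name"] = "Dr. Smythe"
        flat_rows_changed += 1

# In the star layout the same correction is a single change to one dimension record.
dim_prescriber[10]["prescriber_name"] = "Dr. Smythe"

# Every fact still resolves to the corrected name through its identifier.
names_seen = {
    dim_prescriber[fact["prescriber_id"]]["prescriber_name"] for fact in fact_sales
}

print(flat_rows_changed, names_seen)
```

With three sales the gap is small, but it grows with the fact count: the flat layout rewrites one row per sale, while the star layout always rewrites one dimension record.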
Star Schema Gallery
Below are several ways of looking at a star schema.
- An Entity Relationship (ER) diagram in a data modeling tool
- Tableau's Data Source definition step
- Tableau's visualization UI
- The SQL to create the tables in Amazon Redshift
Here is what a star schema looks like in a data modeling tool (Oracle SQL Developer Data Modeler, a free download):
When defining a Tableau Data Source, it looks like this:
And here is what it looks like in Tableau's visualization UI, where you can select the measures from the fact table and the attributes from the dimensions:
Finally, for completeness, here is the SQL that defines the tables in Amazon Redshift: