Dimensional Modeling with Kimball

Deeksha Kukreti
Your Data Universe
Published in
5 min readJul 26, 2023
Image by Author

The article reflects Ralph Kimball practical design decisions for data warehouse. The three most fundamental manoeuvres in every data-warehouse is drill down, drill across and handling time.

Drill-Down Approach

Dimensional modelling is a top-down design process. First it’s important to identify the main business process that acts as a source to the fact tables, and then populate the fact table with numeric, additive field. The precise definition of drill-down means to have more details. Dimension tables have descriptive values, that can be used as grouped columns. For example, Product dimension table with Brand attribute can group all the brand names. It becomes very useful when designing reports for end users.

In the drill-down approach you can mix both the hierarchical and non-hierarchical attributes from all the available dimensions. For example, single product hierarchy, all the marketing defined attributes and all the financially defined attributes are included in the detailed master product table.

Image by author

From above, it becomes clear that the atomic data must be in the same schema format in an aggregated form of data. Not following this is a showstopper to the architecture in which atomic data is stored in a normalised form. This is most commonly used when multiple business process are arranged in a value chain and each business process gets its own separate fact table. For example the time and product dimension thread through all the fact tables, but not the customer dimension.

Drilling Across Approach

Drilling down is about requesting more granular data from same fact table, then drilling across is the process to link two or more fact tables at the same granularity. It is used when the business process is arranged in a value chain. Each business process has it’s own fact table. For sample, in the manufacturing value chain the product and time dimension are thread to all fact tables, but not the customer dimension.

Image by author

A drill-across report can be created by using grouping columns that apply to all the fact tables used in the report. Attributes from the product and time dimension table can be chosen freely, but customer dimension attributes should be grouped. When common fields are used between the fact and dimension table, then it is called as dimension confirmed across the fact tables. With confirmed dimensions, the only thing important is matching row headers.

The drill-across choice is that you can either send a single, simultaneous SQL request to the two fact tables or send two separate requests. If you bring two separate queries together in drill-across approach, both the queries must have same number of headers plus the combined number of computed facts returned from both the queries.

Handling Time

Some important facts for time in data warehouse is data validity i.e. when the data become valid and when did it cease to be valid. Also, when the detailed description of a entity has changed over time, then it must correctly associate it’s version with the corresponding versions of it’s measurements and entities. For example, a customer made a purchase an year ago, then the description of the customer during that time frame must be correct for that corresponding purchase.

Dimensional modelling organized around measurements is a convenient framework for data warehouse to handle time. Below is the practical implementation of the daily time stamp which consists of a surrogate integer foreign key in the fact table joined to a corresponding primary key in the daily time dimension table in Fig 1.1.

Image by author

Below is how both the tables store data:

DailyTimeDim (Image by Author)
Sales Fact (Image by Author)

Entries such as product and customer slowly change over time and are also called as Slow Changing Dimensions (SCD).

The three fundamental ways to handle slow changing dimensions are:

Type 1 — Overwrite the change attribute, thereby destroying the previous history. This is only used when correcting an error.

Image by author

Type 2 — Issue a new record, keeping the natural key but creating a new surrogate key.

Image by author

The most sophisticated treatment for the type 2 SCD is:

  1. Begin Effected_Date/Time not as surrogate key.
  2. End Effected_Date/Time.
  3. Effected_Date surrogate key connecting to date dimension as snowflake.
  4. Change Description field text.
  5. Most recent flag.

Type 3 — This is handled by in a way that limited historical information is saved in the database tables. In this approach, keep only specific number of previous states, typically the immediate past and the current state.

Image by author

Wrapping Up

Thank you for reading the article and I hope you find the above details helpful. It is influenced by Ralph Kimball book (The Kimball Group Reader: Relentlessly Practical Tools for Data Warehousing and Business Intelligence). Fell free to comment and continue the discussion below.

Please don’t forget to applaud if you liked it.

--

--

Deeksha Kukreti
Your Data Universe

Technology Enthusiast | Data Architect | Scientist | 2 X AWS Certified | Microsoft | Data Wizard