ELT Project London Bus Safety Performance — 2

3 min readAug 25, 2022

In this part, we will divide data into different dimensions and store in different tables for future use. It is the normal practice in OLAP for analytical use.

More about OLAP and dimensional models: https://medium.com/analytics-vidhya/data-warehousing-schemas-961391b3f3de

Here, I will divide it into 4 dimensions(Date, incident place, bus group, incident type). I will create 4 tables (dim_date, dim_place, dim_company_cat, dim_incident_type) and store relative data.

dim_date

There is one column of ‘Date of Accident’ in the original dataset and the format is yyyy-mm-dd. based on this data, we could analyze datetime from different aspects. Here, I will divide it into year, month, day, week, half_year, quarter.

As dividing date into dimensions is a normal practice, we could store it in a procedure for future use.

2.dim_place

3.dim_company_cat

Based on bus company, we can divide them into different categories for future analyzation. We could consider the company size, national or foreign company, listed or not…

Here, I couted the number of operation lines to decide the company size (1 big, 2 middle, 3 small)

bonus: (another way to create and insert data using subqueries)

4. dim_incident_type

After selecting distinct incident event type, we could find there are we could classify below events into different categories. Therefore, we could create a dimension for it.

In this part, we create four dimensional tables ( dim_date, dim_incident_place, dim_company_cat, dim_incident_type) for analytical use.

ELT Project London Bus Safety Performance — 2

Written by Y_Lu