Data Warehouse : Design Data Model for Ride Sharing or Taxi Service

Sowmya Mupparaju
Geek Culture
Published in
5 min readJun 6, 2021

--

Photo by Andre Benz on Unsplash

The origin of word “taxi “ from the Online Etymology Dictionary is a shortened form of the word “taxicab”, which is derived from the two words: “taximeter” and “cabriolet”.The Cab or Taxi Service have been around from centuries and calling a cab service was pretty expensive in olden days and it has become more affordable than ever. Thanks to technology growing around us , we can now book a taxi and track the ride realtime from your device. In this article we will design a data model that can capture all critical data elements including the trips , ratings , documents and driver performance metrics.

Four-Step Dimensional Design Process: Kimball Group

  1. Select the Business Process
  2. Declare the Grain (“How do you describe a single row in the fact table?”)
  3. Identify Dimensions (“How do business people describe the data resulting from the business process measurement events?”)“who, what, where, when, why, and how” associated with the event.
  4. Identify Facts

Reference from https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/four-4-step-design-process/

Solution

Business Process : Taxi Company would like to design a data model to capture all critical data elements.

Track rides done by driver and their Performance

How many rides are happening to a common/famous destinations each day( Airports , Parks , Museums etc)

How many trips are cancelled per day.

How many rides and the average price during the peak hour per day.

Grain : Individual Trip on each Transaction Level

Dimensions : Date , Customers, Drivers , Cars , Documents , Devices ,Locations.

Facts : Trips , Payments

Derived Facts: These are recommend to be stored physically in the table.In this case study, the total amount is straightforward, but storing it means its computing consistently in the ETL process, to eliminate the possibility of user calculation errors. Certain Derived Facts can be added in the view to minimize Space or can be added in any BI tool.

Eg: Total Cost of Trip Column: Addition of Base Rate ,Surge Rate , Taxes , Tolls , Additional Fees.

Section 1: Dimensions

Date Dimension:

All Data models always need an explicit date dimension table. There are many date components that are not supported by the SQL date function, including week numbers, fiscal years, seasons, holidays, workdays and weekends. Rather than attempting to determine these calculations in a query, its would be easy to store in the date dimension table.

Customer Dimension:

In this area , we will cover the users who are using the ride service app. We will use this information in the center of the model. The users table contains a list of all relevant users ( Current and New Users). For each user we will store their Unique ID , Name ( First and Last Name) , Email , Address , Phone Numbers , joined date , Current Plan ID ( To determine which membership they are currently using) , Is Active ( To determine if user is still active or Opted out of service) and Payment ID.

Note that User Login Details are not mentioned in the users table, we can certainly maintain different tables to track the each login made by the user( Login Start Date , Login End Date) based on his permissions if any.

Also Note that if user have more than one address or phone numbers , we can add communications( With Communication Type column( Phone/Email/Address) and Preferences tables.

Driver Dimension:

In this area , we will cover the drivers who provide the service on the ride service app. The drivers table contains a list of all drivers including past and current drivers. For each driver row in the table , we will store their Unique ID , Name ( First and Last Name) , Email , Address , Phone Number , joined date , Current Car ID ( To determine which vehicle is set to default by the driver) , Driver License No and Is Active ( To determine if driver is still active or Inactivated).

Documents Dimension:

This table contains all the documents uploaded by an entity ( Can be customer or user ). In the previous customer dimension table , we stored the driver’s License Number but other information such as Doc ID , Doc Name , Doc Category , Document Type , Document State , Expiry Date , country, Driver ID etc columns will be stored in documents table. This table is 1:M ( one to many) and one driver can have multiple documents in the table. ( Eg: User can upload another license document once his/her current license gets expired ).

Cars Dimension:

This table stores all the cars that a driver registered on the platform. All cars have unique Car ID , Driver ID, Year , Make , Model , License Plate No , car type ( standard , luxury , compact ) , Base rate and Is Active Flag to indicate the record is active.This table is 1:M ( one to many) and one driver can have more than one car active at the same time.

Devices Dimension:

This table stores all the devices that are compatible to use the platform. All devices have unique ID ,type of device( mobile , Tablet , Desktop) and Is Active Flag to indicate the status of the device.

Locations Dimension:

This table stores all the details of popular landmarks in that particular city or a country. Every row in the table has a unique Location ID , Latitude , Longitude , Landmark Type ( Airport , park , museum) , Landmark Name , Landmark City , State and country.

Section 2: Facts

Trips Fact:

In this area, we will cover all the trips created by the user to access the ride sharing platform. Every trip has a unique trip id, customer id, driver id , device id , car id , Trip Requested Timestamp , Trip Start and End timestamps , Trip wait time , Start and End Location ID , Driver and customer ratings , Payment Id and Status of the trip ( 0 : Cancelled , 1: In Progress 2: Completed)

Payments Fact:

This table stores all the payments made by the customer. Every payments includes a Payment ID , Customer id ( who made the payment) , Date of Payment , Payment Method , Total Amount , Base rate , surge rate , Tip Amount , Taxes ,Transaction Id and Status of Payment.

--

--

Sowmya Mupparaju
Geek Culture

Senior Software Engineer @Apple | A Farmer's Daughter I Columbia MBA