Part 01 : IBM Product Analytics — Relational Data Modeling

Manuella MEBU
3 min readJun 3, 2024

--

Data Modeling

For the question below, we will not solve it in this article; instead, we will simply present the data model between the two tables. We will address the question in an upcoming medium story.

Question

IBM is analyzing how their employees are utilizing the Db2 database by tracking the SQL queries executed by their employees. The objective is to generate data to populate a histogram that shows the number of unique queries run by employees during the third quarter of 2023 (July to September). Additionally, it should count the number of employees who did not run any queries during this period.

Display the number of unique queries as histogram categories, along with the count of employees who executed that number of unique queries.

Level of difficulty: Easy

Question from : DataLemur

How do we tackle this question ⁉️

Data profiling :

SELECT * FROM employees

-- There are 250 employees in the 'employees' table.
-- I will not display the query result due to the large number of rows (250).

SELECT * FROM queries
-- There are 1000 rows in the 'queries' table.

How are these tables linked ?

SELECT COUNT (employee_id), COUNT(DISTINCT employee_id), COUNT(*) FROM employees
-- We have 250 rows for the three columns
-- There are no null values in the employee _id column, as COUNT(*) = COUNT(employee_id)
-- COUNT (employee_id)= COUNT(DISTINCT employee_id), this implies that employee_id is unique in the table
-- Therefore, employee_id could be the primary key column

Let us do same for the table 'queries'

SELECT COUNT(query_id), COUNT(DISTINCT query_id) FROM queries
-- COUNT(query_id) = 1000 and COUNT(DISTINCT query_id) = 999.
-- This implies that query_id is not a primary key in the queries table.
-- query_id is not linked to only one employee; it can be linked to multiple employees.

Let us see if an employee can have several querie_id in the queries table

SELECT employee_id, COUNT(query_id), COUNT(DISTINCT query_id) FROM queries
GROUP BY employee_id
ORDER BY COUNT(query_id) DESC

-- Each employee has at least one query in the queries table.
-- No two rows for the same employee have the same query_id.

Let us see if all employees are linked to a query_id

SELECT COUNT (e.employee_id) FROM employees e
LEFT JOIN queries q ON e.employee_id = q.employee_id
WHERE query_id IS NULL

-- The result is 22
-- This implies that there are 22 employees who are not linked to any query_id in the 'queries' table.

Solution : Data Modeling

From the above data profiling the relationship between the two tables is as follows:

employees (0, N) — — — — queries (1, N)

employees (0, N):

This indicates that an employee can have zero or more associated queries. This is shown by the left join where some employees have no corresponding queries.

queries (1, N):

This indicates that each query must be associated with a minimum of one employee. query_id is not linked to only one employee; it can be linked to multiple employees.

I share what I have, let us create a virtuous circle. If you read this article, know it is your turn to share something. I am open to constructive comments

mebu.kevine@yahoo.com

LinkedIn: Manuella Mebu

👇 Leave a comment below and let us know what you think!

--

--