Database Design I: Employee Attrition Management System

Chideraozigbo
10 min readDec 25, 2023

--

Two carton ladies standing while shaking hands

Before we begin proper on the topic at hand, let’s talk about the importance of database design to data engineers

In the dynamic landscape of data engineering, effective database design stands as a cornerstone, providing the foundation for efficient data management and analysis.

Database design is a critical aspect of data engineering that directly influences the success of data-driven initiatives within organizations. It involves the systematic organization of data to ensure optimal storage, retrieval, and analysis. For data engineers, the importance of a well-designed database lies in its ability to provide a scalable, secure, and high-performance infrastructure for managing vast volumes of data.

Project Overview

Case Study

Building an Employee Attrition Management System database

Objectives

The Employee Attrition Management System project aims to create a comprehensive and effective platform for human resources by capturing and storing detailed employee information, recording attrition events, and analyzing performance metrics. The system will facilitate trend analysis through historical data storage and incorporate predictive analytics to forecast potential attrition.

The HR Attrition data is currently saved on my local computer. I got the dataset to practice with from kaggle. You can download the dataset here. The dataset contains 38 columns and 1,481 rows of data. Here is a snapshot of the data

Snapshot to the HR Attrition dataset I’ll be working with for this project

After rigorous checks and understanding of the dataset and project. This project will be broken down into various phases as follows:

  1. Database Creation & Data Importing
  2. Table Creation
  3. Data Normalization
  4. ER Diagram
  5. Creation, Loading Data and Altering Tables Constraints
  6. Creating Views, Triggers
  7. User Management & Security
  8. Data Masking
  9. Database Backup & Recovery

N.B: This project will be executed in MYSQL DBMS

· Database Creation

I initially created my database, opting to name it HR_Attritions.

The SQL queries I employed are provided below for reference.

-- Drop the database if it exists
DROP DATABASE IF EXISTS Hr_Attritions;

-- Create a database
CREATE DATABASE Hr_Attritions
DEFAULT CHARACTER SET utf8mb4;

-- Use the created database
USE Hr_Attritions;

· Table Creation & Data Importing

At this stage, I had to examine the dataset once again to ensure the dataset is cleaned and formatted correctly. For the table creation in preparation of the data, the following SQL Query was used

-- Drop the table if it exists
DROP TABLE IF EXISTS Employees;

# To create the table Employees
CREATE TABLE Employees(
EMPID VARCHAR(10),
Age TINYINT,
AgeGroup ENUM('18-25', '26-35', '36-45', '46-55', '55+'),
Attrition ENUM('Yes', 'No'),
BusinessTravel ENUM('Non-Travel', 'Travel_Frequently', 'Travel_Rarely'),
DailyRate INT,
Department VARCHAR(30),
DistanceFromHome INT,
Education TINYINT,
EducationField VARCHAR(30),
EmployeeCount SMALLINT,
EmployeeNumber INT,
EnvironmentSatisfaction TINYINT,
Gender ENUM('Male', 'Female', 'Others'),
HourlyRate INT,
JobInvolvement TINYINT,
JobLevel TINYINT,
JobRole VARCHAR(40),
JobSatisfaction TINYINT,
MaritalStatus ENUM('Single', 'Married', 'Divorced'),
MonthlyIncome INT,
SalarySlab ENUM('Upto 5k', '5k-10k', '10k-15k', '15k+'),
MonthlyRate INT,
NumCompaniesWorked TINYINT,
Over18 ENUM('Y', 'N'),
OverTime ENUM('Yes', 'No'),
PercentSalaryHike TINYINT,
PerformanceRating TINYINT,
RelationshipSatisfaction TINYINT,
StandardHours TINYINT,
StockOptionLevel TINYINT,
TotalWorkingYears TINYINT,
TrainingTimesLastYear TINYINT,
WorkLifeBalance TINYINT,
YearsAtCompany TINYINT,
YearsInCurrentRole TINYINT,
YearsSinceLastPromotion TINYINT,
YearsWithCurrManager TINYINT,
DepartmentID TINYINT,
RatingID INT,
SalaryID INT,
DemographicsID VARCHAR(10),
TravelID VARCHAR(6),
JobID INT,
EducationID INT,
WorkHistoryID INT
);

# To check if table was created and check it's schema
DESC Employees;

Here is a brief description of what each columns in the dataset holds

EmpID: Employee ID
Age: Age of the employee
AgeGroup: Age group to which the employee belongs
Attrition: Employee attrition status (whether the employee has left the organization or is still active)
BusinessTravel: Frequency of business travel for the employee
DailyRate: Daily rate of pay for the employee
Department: Department in which the employee works
DistanceFromHome: Distance in miles from the employee’s home to the workplace
Education: Level of education attained by the employee
EducationField: Field of education of the employee
EmployeeCount: Number of employees
EmployeeNumber: Unique identifier for each employee
EnvironmentSatisfaction: Employee’s satisfaction level with the work environment
Gender: Gender of the employee
HourlyRate: Hourly rate of pay for the employee
JobInvolvement: Employee’s level of job involvement
JobLevel: Level of the employee’s job position
JobRole: Role of the employee within the organization
JobSatisfaction: Employee’s satisfaction level with their job
MaritalStatus: Marital status of the employee
MonthlyIncome: Monthly income of the employee
SalarySlab: Categorization of monthly income into salary slabs
MonthlyRate: Monthly rate of pay for the employee
NumCompaniesWorked: Number of companies the employee has worked for in the past
Over18: Whether the employee is over 18 years old
OverTime: Whether the employee works overtime or not
PercentSalaryHike: Percentage increase in salary for the employee
PerformanceRating: Performance rating of the employee
RelationshipSatisfaction: Employee’s satisfaction level with work relationships
StandardHours: Standard working hours for the employee
StockOptionLevel: Level of stock options granted to the employee
TotalWorkingYears: Total number of years the employee has worked
TrainingTimesLastYear: Number of training sessions attended by the employee in the last year
WorkLifeBalance: Employee’s work-life balance satisfaction level
YearsAtCompany: Number of years the employee has worked at the current company
YearsInCurrentRole: Number of years the employee has been in the current role
YearsSinceLastPromotion: Number of years since the employee’s last promotion
YearsWithCurrManager: Number of years the employee has been working with the current manager

The next step I embarked on was to load my data into my table. Although, I encountered lots of challenges in importing my data in MYSQL, click here for a guide on how I did it.

Here is the query used

-- Load data into the table Employees
LOAD DATA LOCAL INFILE 'C:\\ProgramData\\MySQL\\MySQL Server 8.0\\Uploads\\EDITED_HR_Analytics.csv'
INTO TABLE Employees
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;

-- Check if the data was imported
SELECT * FROM Employees;

Now, it’s the moment to apply normalization to my database.

· Data Normalization

I observed that my data was initially normalized to 1NF, prompting me to proceed with further normalization to achieve 2NF and 3NF. This step aimed to eliminate partial dependency and transitive dependency, thereby reducing data redundancy and eliminating anomalies. It’s worth noting that normalization to 3NF is particularly necessary in OLTP (Online Transaction Processing) systems, while OLAP (Online Analytical Processing) systems generally require normalization up to 2NF.

In the course of normalization, I subdivided my table into distinct entities, resulting in the creation of tables such as

  1. Employee Information Table (EmpInfo)
  2. Job Information Table (JobInfo)
  3. Rating Information Table (RatingInfo)
  4. Salary Information Table (SalaryInfo)
  5. Work History Table (WorkHistory)
  6. Demographics Table (Demographics)
  7. Business Travel Table (BusinessTravel)
  8. Department Table (Department) and
  9. Employee Education Table (EmployeeEducation).

As a side note, I introduced new columns to the original CSV sheet to serve as unique identifiers for the aforementioned tables. This transition leads us to the next phase — the creation of the Entity-Relationship (ER) Diagram.

· ER Diagram

For my ER diagram, I used this website to show the relationship between my tables and ER Diagram is part of the logical aspect while designing a database. Here is my snapshot for it

ER DIAGRAM OF HR-ATTRITION

· Creation, Loading Data and Altering Tables Constraints

In this project phase, I will create individual tables and simultaneously modify the constraints, while also loading the respective data into them.

Below are the queries for creating some of the tables.

-- Create EmpInfo Table
CREATE TABLE `EmpInfo` (
`EmpID` VARCHAR(10) NOT NULL,
`Age` TINYINT NOT NULL,
`Gender` ENUM('Male', 'Female', 'Others'),
`MaritalStatus` ENUM('Single', 'Married', 'Divorced'),
`Over18` ENUM('Y', 'N'),
`EmployeeCount` SMALLINT,
`EmployeeNumber` INT,
`SalaryID` INT NOT NULL,
`TravelID` VARCHAR(5) NOT NULL,
`JobID` INT NOT NULL,
`DemographicsID` VARCHAR(10) NOT NULL,
`WorkHistoryID` INT NOT NULL,
`EducationID` INT NOT NULL,
PRIMARY KEY (`EmpID`)
);

-- Confirm if the table was created
SELECT * FROM EmpInfo;

-- Create Department Table
CREATE TABLE `Department` (
`DepartmentID` TINYINT NOT NULL,
`Department` VARCHAR(30),
PRIMARY KEY (`DepartmentID`)
);

-- Confirm if the table was created
SELECT * FROM Department;

-- Create JobInfo Table
CREATE TABLE `JobInfo` (
`JobID` INT NOT NULL,
`DepartmentID` TINYINT NOT NULL,
`JobRole` VARCHAR(40),
`JobLevel` TINYINT,
`JobInvolvement` TINYINT,
`RatingID` INT NOT NULL,
PRIMARY KEY (`JobID`)
);

-- Confirm if the table was created
SELECT * FROM JobInfo;

-- Create RatingInfo Table
CREATE TABLE `RatingInfo` (
`RatingID` INT NOT NULL,
`JobSatisfaction` TINYINT,
`PerformanceRating` TINYINT,
`EnvironmentSatisfaction` TINYINT,
`RelationshipSatisfaction` TINYINT,
`WorkLifeBalance` TINYINT,
PRIMARY KEY (`RatingID`)
);

-- Confirm if the table was created
SELECT * FROM RatingInfo;

Subsequently, I proceeded to populate the various tables with data extracted from the employee table, which was initially loaded from the CSV file.

The following outlines the process I followed to load specific tables.

-- Load data for EmpInfo
INSERT INTO EmpInfo (
EmpID,
Age,
Gender,
MaritalStatus,
Over18,
EmployeeCount,
EmployeeNumber,
SalaryID,
TravelID,
JobID,
DemographicsID,
WorkHistoryID,
EducationID
)
SELECT DISTINCT
EmpID,
Age,
Gender,
MaritalStatus,
Over18,
EmployeeCount,
EmployeeNumber,
SalaryID,
TravelID,
JobID,
DemographicsID,
WorkHistoryID,
EducationID
FROM Employees
ORDER BY EMPID;
-- Check if the data was loaded
SELECT * FROM EmpInfo;
-- Load data for EmployeeEducation
INSERT INTO EmployeeEducation (
EducationID,
Education,
EducationField
)
SELECT DISTINCT
EducationID,
Education,
EducationField
FROM EMPLOYEES
ORDER BY EDUCATIONID;
-- Check if the data was loaded
SELECT * FROM EmployeeEducation;
-- Load data for Department
INSERT INTO Department (
DepartmentID,
Department
)
SELECT DISTINCT
DepartmentID,
Department
FROM Employees
ORDER BY DepartmentID;

Following the creation and data loading of the tables, I opted to modify the tables by incorporating constraints such as Entity Constraints, Referential Constraints, and Null Constraints.

The subsequent details outline the steps I took to accomplish this task.

-- Alter EmpInfo Table
ALTER TABLE `EmpInfo`
ADD CONSTRAINT `fk_EmpInfo_SalaryID`
FOREIGN KEY (`SalaryID`)
REFERENCES `SalaryInfo` (`SalaryID`);

ALTER TABLE `EmpInfo`
ADD CONSTRAINT `fk_EmpInfo_TravelID`
FOREIGN KEY (`TravelID`)
REFERENCES `BusinessTravel` (`TravelID`);

ALTER TABLE `EmpInfo`
ADD CONSTRAINT `fk_EmpInfo_EducationID`
FOREIGN KEY (`EducationID`)
REFERENCES `EmployeeEducation` (`EducationID`);

ALTER TABLE `EmpInfo`
ADD CONSTRAINT `fk_EmpInfo_JobID`
FOREIGN KEY (`JobID`)
REFERENCES `JobInfo` (`JobID`);

To get information on the full process on how i did it. Kindly click here for full documentation and files

· Creating Views

At this point, the database has been configured for the HR Department. Consequently, I took the initiative to establish a view aimed at addressing six specific queries posed by the HR Manager.

Question 1: Can you provide an overview of the demographic information for our employees?

Result 1

Question 2: How is the salary distributed among our employees?

Result 2

Question 3: What is the average number of years since the last promotion for each job role?

Result 3

Question 4: How is work-life balance perceived among employees in different job roles?

Question 5: How does job satisfaction vary by gender?

Result 5

Question 6: What is the attrition rate by department?

Result 6

To get the queries on how I got the result, find them in my Github page.

Please hang around a bit longer. Haha, there’s more documentation, but I’ll condense it to keep things interesting.

·User Management & Security

User management and security are important aspects of database administration, ensuring the confidentiality, integrity, and availability of sensitive information. Through access control mechanisms, user management restricts unauthorized access, preventing potential breaches and safeguarding against data manipulation. Role-based access control assigns permissions based on job roles, enforcing the principle of least privilege.

With that being stated, concerning User Access and Management, I established four roles, namely:
- DBA: Designed for Database Administrators with full privileges.
- HR Manager Role: Tailored for HR Managers, equipped with select, update, insert, and delete privileges.
- HR Analyst Role: Intended for HR Analysts, providing select privileges only.
- HR Intern: Catering to HR Interns with limited access to specific tables and views.

Subsequently, I proceeded to create users and allocate them to their corresponding roles. Below is a snippet of the code.

-- TO CREATE HR_MANAGER ROLE
CREATE ROLE Hr_Manager;

-- TO ASSIGN PRIVILEGES TO HR_MANAGER ROLES
GRANT SELECT, INSERT, UPDATE, DELETE ON hr_attritions.* TO Hr_Manager;

-- To Create HR Analyst Role
CREATE ROLE hr_analyst;

-- To Assign Privileges to HR Analyst Role
GRANT SELECT ON hr_attrition.* TO hr_analyst;

-- To Create HR Intern Role
CREATE ROLE hr_intern;

-- To Assign Privileges to HR Intern Role
GRANT SELECT ON hr_attritions.businesstravel TO hr_intern;
GRANT SELECT ON hr_attritions.demographics TO hr_intern;
GRANT SELECT ON hr_attritions.department TO hr_intern;
GRANT SELECT ON hr_attritions.empinfo TO hr_intern;
GRANT SELECT ON hr_attritions.employeeeducation TO hr_intern;
GRANT SELECT ON hr_attritions.jobinfo TO hr_intern;
GRANT SELECT ON hr_attritions.ratinginfo TO hr_intern;
GRANT SELECT ON hr_attritions.workhistory TO hr_intern;
GRANT SELECT ON hr_attritions.masked_salaryinfo TO hr_intern;

-- To Create DBA Role
CREATE ROLE DBA;


GRANT ALL PRIVILEGES ON *.* TO DBA WITH GR

·Data Masking

In order to uphold the confidentiality and security of sensitive information, I restricted users with the HR Intern role to limited access, excluding the salary info table. Instead, I devised a view of the salary info table, wherein I implemented a data masking technique to conceal the employee monthly salary details for this user category.

·Database Backup

Database backups are essential for safeguarding against data loss, system failures, or unforeseen disasters. Regularly backing up databases ensures the availability of a recent, restorable copy of critical data. In the event of accidental deletion, corruption, hardware failures, or cyberattacks, having a comprehensive backup strategy enables organizations to quickly recover and minimize downtime. It acts as a safety net, allowing businesses to maintain business continuity and protect their valuable information assets.

The frequency of database backups depends on the nature of the data and the business requirements. Critical databases with frequently changing data may require daily or even more frequent backups to ensure minimal data loss. For less dynamic datasets, weekly or monthly backups might suffice.

As we mark the end of this phase, the Employee Attrition Management System database design is a foundational component ensuring efficient data organization and accessibility. By incorporating normalized tables, relationships, and security measures, the database not only captures comprehensive employee information but also facilitates effective attrition tracking. The system’s well-defined structure supports HR decision-making, providing insights into employee demographics, work history, and performance. This project lays the groundwork for a robust management system, promoting data integrity, security, and strategic workforce planning. Its implementation empowers organizations to proactively address attrition challenges, fostering a resilient and informed approach to talent management.

Find the link to the dataset to practice with in my Github.

Thanks for reading.

--

--