Using SQL to Analyze Employee Data

Kingsley Ofori
5 min readFeb 18, 2024

--

In today’s data-driven world, businesses rely heavily on data analysis to make informed decisions. One area where data analysis plays a crucial role is in managing employee information. In this short post, we’ll explore how to use SQL (Structured Query Language) to analyze employee data and answer important questions related to employee information, attendance, leaves, performance evaluations, and more.

InsightCrew — Case Study 1

Imagine you’re working in the HR department of a company facing challenges in managing its growing workforce efficiently. With over 100 employees spread across various departments, keeping track of employee information, attendance, leaves, and performance evaluations has become increasingly complex. As a business analyst or an analyst in your firm/company, the onus lies on you to help streamline the process. To streamline these processes and improve overall operational efficiency, you decide to implement an Employee Management System (EMS) and use SQL to analyze the data.

Get the SQL Script — Github

You can pick any imaginary company that you fancy but the fictitious company we will be helping today is called, ‘InsightCrew’.

We started by creating a database called “InsightCrew” and defining tables to store employee information, attendance records, leaves, and performance evaluations.

Each table was designed to capture specific aspects of employee data, such as employee ID, name, department, salary, contact information, attendance status, leave details, and performance ratings.

Step-by-step Tutorial
We will first create a database in SQL, we use the `CREATE DATABASE` statement followed by the name of the database. For example, to create a database called “InsightCrew,” we would use the following SQL code:

CREATE DATABASE InsightCrew;

Once the database is created, we can use it by switching to it using the `USE` statement. This tells SQL to execute subsequent statements in the context of the specified database. For example, to use the “InsightCrew” database, we would use the following SQL code:

USE InsightCrew;

To manage employee data effectively, we will create several tables within our InsightCrew database.

1. Employees Table — This table stores information about each employee, including their unique employee ID, name, department, salary, and contact information.

CREATE TABLE Employees (
employee_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255) NOT NULL,
department VARCHAR(255) NOT NULL,
salary DECIMAL(5, 2),
contact_info VARCHAR(255)
);

2. Attendance Table — This table tracks employee attendance, recording the date and whether the employee was present or absent.

CREATE TABLE Attendance (
attendance_id INT PRIMARY KEY,
employee_id INT,
date DATE,
status VARCHAR(255),
FOREIGN KEY (employee_id) REFERENCES employees (employee_id)
);

3. Leaves Table — This table records details about employee leaves, including the leave ID, employee ID, leave type, start date, and end date.

CREATE TABLE Leaves (
leave_id INT AUTO_INCREMENT NOT NULL PRIMARY KEY,
employee_id INT,
leave_type VARCHAR(255),
startdate DATE,
enddate DATE,
FOREIGN KEY (employee_id) REFERENCES employees (employee_id)
);

4. Performance_Evaluations Table— This table stores performance evaluations for employees, including the evaluation ID, employee ID, evaluation date, and performance rating.

CREATE TABLE Performance_Evaluations (
evaluation_id INT AUTO_INCREMENT NOT NULL PRIMARY KEY,
employee_id INT,
evaluationdate DATE,
rating INT,
FOREIGN KEY (employee_id) REFERENCES employees (employee_id)
);

After creating these tables, we can view their contents using simple SQL queries:

SELECT * FROM employees;
SELECT * FROM attendance;
SELECT * FROM leaves;
SELECT * FROM performance_evaluations;

In the example code, we’ve also demonstrated how to alter the table structure. For example, changing the data type of the `salary` column in the `Employees` table from `DECIMAL` to `INTEGER`:

ALTER TABLE employees MODIFY COLUMN salary INT;

Now that we’ve created the tables in our InsightCrew database, let’s insert some sample records to populate them. We’ll insert data into the `Employees`, `Attendance`, `Leaves`, and `Performance_Evaluations` tables to simulate real employee information.

First, we will start by inserting employee information into the `Employees` table. Each record will include the employee’s ID, name, department, salary, and contact information.

INSERT INTO Employees (employee_id, name, department, salary, contact_info)
VALUES
(1, ‘Jackson King’, ‘IT’, 8500.2, ‘jacksonking@king.com’),
(2, ‘Udia Usman’, ‘Finance’, 9000.2, ‘udiausman@gmail.com’),
(3, ‘Gladys Riches’, ‘Sales’, 2000.3, ‘gladys@gmail.com’),

(30, ‘Rice Pertita’, ‘Sales’, 3400.7, ‘rice@gmail.com’);

Next, we’ll populate the `Attendance` table with records of employee attendance. Each record will include the attendance ID, employee ID, date, and status (present or absent).

INSERT INTO Attendance (attendance_id, employee_id, date, status)
VALUES
(1, 17, ‘2024–02–01’, ‘Present’),
(2, 18, ‘2024–02–01’, ‘Present’),

(30, 1, ‘2024–02–10’, ‘Absent’);

For the `Leaves` table, we’ll insert records of employee leaves, including the leave ID, employee ID, leave type, start date, and end date.

INSERT INTO Leaves (leave_id, employee_id, leave_type, startdate, enddate)
VALUES
(1, 3, ‘Sick’, ‘2024–02–01’, ‘2024–02–03’),
(2, 2, ‘Annual’, ‘2024–02–05’, ‘2024–02–08’),

(30, 30, ‘Annual’, ‘2024–06–25’, ‘2024–06–27’);

Finally, we’ll populate the `Performance_Evaluations` table with records of employee performance evaluations, including the evaluation ID, employee ID, evaluation date, and performance rating.

INSERT INTO Performance_Evaluations (evaluation_id, employee_id, evaluationdate, rating)
VALUES
(1, 1, ‘2024–01–01’, 4),
(2, 2, ‘2024–01–02’, 3),

(30, 30, ‘2024–01–30’, 5);

Data in its raw nature actually means nothing unless we can make meanings or deductions from it. In this section, we’ll run a series of SQL queries on our InsightCrew database to derive meaningful insights from the data.

Total Number of Employees

SELECT COUNT(employee_id) AS total_employees
FROM employees;

Number of Employees in Each Department

SELECT department, COUNT(employee_id) AS number_of_employees
FROM employees
GROUP BY department;

Employees with Highest Salary

SELECT *
FROM employees
ORDER BY salary DESC
LIMIT 10;

Employees with Salary above 5000

SELECT COUNT(employee_id) AS number_of_employees
FROM employees
WHERE salary > 5000;

Employees with the Lowest Salary

SELECT *
FROM employees
ORDER BY salary ASC
LIMIT 10;

Total Number of Days of Leave Taken by Each Employee

SELECT employee_id, COUNT(leave_id) AS days_of_leave
FROM leaves
GROUP BY employee_id;

Employee Who has Taken the Most Leave

SELECT employee_id, COUNT(leave_id) AS total_leave_taken
FROM leaves
GROUP BY employee_id
ORDER BY total_leave_taken DESC
LIMIT 1;

Number of Employees Who Have Not Taken Any Leave

SELECT COUNT(employees.employee_id) AS number_of_employees
FROM employees
LEFT JOIN leaves ON employees.employee_id = leaves.employee_id
WHERE leaves.leave_id IS NULL;

Average Number of Leaves Taken per Employee

SELECT AVG(num_leaves) AS avg_leaves_per_employee
FROM (
SELECT employee_id, COUNT(leave_id) AS num_leaves
FROM leaves
GROUP BY employee_id
) AS leaves_per_employee;

Employee with the Longest Tenure

SELECT employee_id, DATEDIFF(MAX(enddate), MIN(startdate)) AS tenure
FROM leaves
GROUP BY employee_id
ORDER BY tenure DESC
LIMIT 1;

Total Number of Performance Evaluations Conducted

SELECT COUNT(evaluation_id) AS total_evaluations
FROM performance_evaluations;

Average Performance Rating of Employees

SELECT AVG(rating) AS average_rating
FROM performance_evaluations;

Top-Performing Employees Based on Performance Ratings

SELECT employee_id, AVG(rating) AS avg_rating
FROM performance_evaluations
GROUP BY employee_id
ORDER BY avg_rating DESC;

Number of Employees with Missing Contact Information

SELECT COUNT(employee_id) AS num_employees_missing_contact_info
FROM employees
WHERE contact_info IS NULL OR contact_info = ‘’;

Employees with Missing Contact Information

SELECT *
FROM employees
WHERE contact_info IS NULL OR contact_info = ‘’;

Number of Employees Who Have Changed Departments

SELECT COUNT(DISTINCT employee_id) AS num_employees_changed_departments
FROM (
SELECT employee_id
FROM employees
GROUP BY employee_id
HAVING COUNT(DISTINCT department) > 1
) AS employees_changed_departments;

Employees Who Have Changed Departments

SELECT employee_id, COUNT(DISTINCT department) AS num_departments
FROM employees
GROUP BY employee_id
HAVING COUNT(DISTINCT department) > 1;

Employees with the Highest Number of Contact Information Updates

SELECT employee_id, COUNT(*) AS num_updates
FROM (
SELECT DISTINCT employee_id, contact_info
FROM employees
WHERE contact_info IS NOT NULL AND contact_info != ‘’
) AS contact_updates
GROUP BY employee_id
ORDER BY num_updates DESC
LIMIT 5;

--

--

Kingsley Ofori

1% better everday. - SQL | Python | Tableau | Power BI | Financial Modelling