Visualization of HR Employee Attrition & Performance

Bhadra M
15 min read · Apr 19, 2023

Introduction

The IBM HR Analytics Employee Attrition & Performance dataset provides data on IBM employees that can be used to analyse the factors affecting employee attrition. It covers demographics, education, employment status, pay, performance measures, and other attributes that can be used to spot patterns and trends in attrition.

There are 1470 observations in the dataset, along with 35 variables, including 1 target variable (Attrition), which will be the major subject of the visualisation and subsequent analysis. The dataset is a useful tool for researching the elements that affect employee retention and figuring out how to reduce attrition. Additionally, the dataset can be used to create prediction models that identify the employees most likely to leave, helping companies act before they do.

Organizations that seek to comprehend the elements that contribute to employee attrition should consider this dataset. Organizations struggle with attrition because it can result in significant expenditure on hiring and training new personnel, as well as lower productivity and morale among the staff who stay. By examining this dataset, organizations can learn more about the factors that lead to employee attrition and possibly take action to lower it.

This dataset includes data on the age, education level, job title, performance rating, and other characteristics of employees at an organisation. It also records whether each worker has left the company (i.e. their attrition status). With this dataset, we can investigate the connection between these variables and employee attrition and learn more about the motivations behind employee turnover.

Overview of the Data Set

Understanding the numerous variables in the dataset and their connections among themselves is necessary for the preliminary graphical analysis of the HR data set. This first examination might aid in spotting patterns and trends as well as any outliers or irregularities in the data.

The HR data set contains personal and professional information about a company’s employees, as well as information about their jobs. The dataset contains the following variables: age, gender, marital status, degree of education, department of employment, job title, years of experience, monthly income, performance ratings, and attrition status. In order to comprehend the HR data set, each variable must be understood in its own right.

The age variable offers details on the age breakdown of the company’s workforce. To see how the company’s age groups are distributed, use a histogram to depict this information. To see how employees are distributed by gender, a bar chart can be used to visualise the gender variable.

To see the percentage of married and single employees in the organisation, depict the marital status variable using a pie chart. A bar chart can be used to represent the education level variable and show how employees’ educational backgrounds are distributed.

Understanding the employee hierarchy in the organisation requires an understanding of the job function and department characteristics. To see how these variables are distributed among individuals in various departments and job positions, stacked bar charts can be used.

The distribution of work experience among the employees can be shown by using a histogram to depict the work experience variable. A density plot can be used to represent the monthly income variable and reveal how employees’ pay is distributed.

To understand how performance ratings are distributed among employees, the performance ratings variable can be represented using a bar chart. Finally, a bar chart can be used to represent the attrition status variable in order to comprehend the percentage of workers that have left the organisation.
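As a rough sketch of the chart choices described above, the snippet below draws a histogram for a continuous variable and a bar chart for a categorical one. The toy_hr data frame and its values are invented stand-ins; only the column names Age and Gender follow the dataset, and the bin width of 5 years is an arbitrary assumption.

```r
library(ggplot2)

# Hypothetical toy data; values are invented for illustration only
toy_hr <- data.frame(
  Age    = c(23, 27, 31, 34, 38, 41, 45, 52),
  Gender = c("Female", "Male", "Male", "Female", "Male", "Female", "Male", "Male")
)

# Histogram: distribution of a continuous variable (bin width is an assumption)
age_hist <- ggplot(toy_hr, aes(x = Age)) +
  geom_histogram(binwidth = 5, boundary = 20) +
  labs(title = "Age Distribution", x = "Age", y = "Count")

# Bar chart: frequency of each level of a categorical variable
gender_bar <- ggplot(toy_hr, aes(x = Gender)) +
  geom_bar() +
  labs(title = "Employees by Gender", x = "Gender", y = "Count")

print(age_hist)
print(gender_bar)
```

The same pattern would apply to the other variables mentioned here, swapping in the relevant column and geom.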

We can learn more about the distribution of employee demographics, job responsibilities, departments, work experience, salaries, performance evaluations, and attrition by examining the various variables in the HR data set.

These insights can be used to pinpoint areas where the company’s HR practices need to be improved, such as boosting staff retention, raising employee satisfaction, and encouraging professional development. Carried out before any further analysis, the preliminary visualisation can also help identify data quality problems or missing values in the dataset.

Visualization of Data

The HR analytics dataset offers a wealth of information for comprehending the causes of employee attrition in businesses. A strong tool for finding patterns, trends, and relationships in data is data visualisation. We can employ a range of visualisation methods, including bar plots, stacked bar charts, density plots, and other charts, to extract insights from the HR analytics dataset.

R, a programming language for statistical computing and graphics, will be used to carry out the analysis and visualisations for this project. We must first import the data into R and load the required libraries before we can begin analysing the HR dataset.

For our analysis and visualisations, we’ll be using numerous R packages, such as ggplot2 and dplyr. A well-liked tool for making data visualisations is ggplot2. A package called dplyr offers capabilities for data manipulation, including data filtering, grouping, and summarization.

# Load the required packages
library(ggplot2)
library(dplyr)

The HR dataset will be imported into R using the read.csv() function, which reads a CSV file into a data frame. The resulting data frame is assigned to the variable ibm_hr.

We may begin our basic analysis and exploratory data visualisation once the data has been loaded and imported in order to get insights into the dataset and spot any trends or patterns.

# Load the dataset
ibm_hr <- read.csv("C:/fakepath/WA_Fn-UseC_-HR-Employee-Attrition.csv")
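Before plotting, it can help to confirm that the import matches the description above (1470 observations, 35 variables) and to look for missing values. The sketch below is one way to do that. check_hr_data() is a hypothetical helper, and toy_hr is an invented stand-in so that the snippet runs without the CSV file.

```r
# Hypothetical toy data; column names follow the dataset, values are invented
toy_hr <- data.frame(
  Age           = c(34, 41, 29, NA),
  Attrition     = c("No", "Yes", "No", "No"),
  MonthlyIncome = c(5200, 2100, 3300, 6100),
  stringsAsFactors = FALSE
)

# Hypothetical helper: report the size of a data frame and missing values per column
check_hr_data <- function(df) {
  list(
    n_rows = nrow(df),
    n_cols = ncol(df),
    missing_per_column = colSums(is.na(df))
  )
}

toy_check <- check_hr_data(toy_hr)
print(toy_check)
```

On the real data, check_hr_data(ibm_hr) would be expected to report 1470 rows and 35 columns if the import matches the description.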

The bar plot, which may be used to depict the frequency of a categorical variable, such as attrition in this dataset, is one of the most fundamental and educational visualisation approaches. We can rapidly determine how many people have left the company by constructing a bar plot of attrition frequency, which is a crucial statistic for comprehending the effects of attrition on the organisation.

The stacked bar chart is another helpful visual aid that may be used to compare the frequency of a categorical variable across many categories, including job roles or departments. We can identify which departments and job roles have the highest attrition rates by constructing a stacked bar chart of attrition by job role and department. This information can then be used by the organisation to target interventions to lower attrition in those areas.

Density plots are helpful for displaying how continuous data, such as monthly income, is distributed within each attrition group. By constructing a density plot of monthly income by attrition, we may determine whether the income distribution differs between employees who have left the company and those who have stayed. This can help determine whether pay affects turnover and whether the company should change its compensation practices to keep workers.

An additional form of stacked bar chart that can be helpful is one that displays the percentage of workers who have departed the organisation by gender and job description. If gender-related issues are influencing attrition in specific work roles, this type of graphic can help to pinpoint them. The company can take action to solve these issues and lower attrition by identifying these variables.

Bar Chart

# Create a bar chart of Attrition Frequency
ggplot(ibm_hr, aes(x = Attrition, fill = Attrition)) +
  geom_bar() +
  labs(title = "Attrition Frequency", x = "Attrition", y = "Count") +
  theme(plot.title = element_text(hjust = 0.5))

The above R code visualises attrition in the HR dataset using a bar plot. The ggplot() function from the ggplot2 package creates the plot, with the ibm_hr data frame as the data source and the Attrition variable on the x-axis. Because the fill argument is also set to Attrition, the bars are coloured according to the value of the Attrition variable.

The geom_bar() function draws the bars. Its default behaviour is to count the observations for each attrition value and display that count as the height of the corresponding bar. The labs() function sets the title, x-axis, and y-axis labels for the plot.

In this case, “Attrition Frequency”, “Attrition”, and “Count” are specified as the title, x-axis label, and y-axis label, respectively. Finally, the theme() function centres the plot title.

With one bar for “No” denoting employees who did not leave the company and another bar for “Yes” denoting employees who did quit the company, the resulting plot illustrates the frequency of attrition in the IBM HR dataset. The graph demonstrates that the proportion of workers who remained at the company was significantly higher than the proportion of workers who departed. HR experts may find this information helpful in understanding the attrition rate in the firm and perhaps in identifying elements that influence employee retention.
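The numbers behind the two bars can also be tabulated directly. The following is a minimal sketch using dplyr’s count(); the toy_hr data frame is an invented stand-in for ibm_hr, and the pct column name is an assumption chosen for illustration.

```r
library(dplyr)

# Hypothetical toy data; the real analysis would use ibm_hr instead
toy_hr <- data.frame(
  Attrition = c("No", "No", "No", "Yes", "No", "Yes", "No", "No")
)

# Count employees per attrition level and convert the counts to percentages
attrition_summary <- toy_hr %>%
  count(Attrition, name = "n") %>%
  mutate(pct = round(100 * n / sum(n), 1))

print(attrition_summary)
```

A table like this gives the exact share of leavers, which is harder to read off bar heights alone.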

Bar Chart

The x-axis of the graph denotes the two levels of attrition, “Yes” and “No.” The y-axis displays the number of workers in each group. On the plot, there are two bars, one for each level of attrition. The height of each bar represents the number of workers in that category.

This plot is crucial in helping to visualise the company’s attrition rate and gives a preliminary idea of the percentage of employees who have departed compared to those who have stayed. We can see that there are a lot more employees who have remained with the organisation than there are who have left in this particular plot.

By allowing comparisons between the frequency of attrition and other variables in the dataset, such as job role, department, and age, this graphic also serves as a basis for additional data investigation. When making decisions about retention measures, this data can be used to spot potential patterns or trends in employee attrition.

Stacked Bar Chart

# Create a new data frame with counts of attrition by job role and department
dept_role_count <- ibm_hr %>%
  group_by(Department, JobRole, Attrition) %>%
  summarise(n = n(), .groups = "drop")

# Create the stacked bar chart, with one panel per job role
ggplot(dept_role_count, aes(x = Department, y = n, fill = Attrition)) +
  geom_col(position = "stack") +
  facet_wrap(JobRole ~ ., scales = "free_x") +
  scale_fill_manual(values = c("#E7B800", "#FC4E07"), name = "Attrition") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Attrition by Job Role and Department", x = "Department", y = "Count")

This code generates a stacked bar chart that breaks down attrition counts by department and job role. The first step groups the original data frame ibm_hr by Department, JobRole, and Attrition and uses the summarise() function from the dplyr package to count the number of employees (n) in each combination, storing the result in a new data frame named dept_role_count.

The second step uses the ggplot() function from the ggplot2 package to create the stacked bar chart, with dept_role_count as the data source. Setting the x variable to Department, the y variable to n, and the fill variable to Attrition produces bars with two stacked segments (one for employees who have left and one for employees who have stayed).

The facet_wrap() function divides the plot into panels, one for each JobRole, arranged in a grid with free horizontal scaling. The scale_fill_manual() function sets Attrition as the legend’s name and sets the fill colours for the bars (yellow for those who stayed and orange-red for those who left). Inside theme(), which modifies the visual style of the plot, axis.text.x rotates the x-axis labels by 45 degrees and right-aligns them. The labs() function sets the plot title and axis labels.

The graph shows employee counts by job role and department, split into those who have left the organisation and those who have stayed. Each panel represents a job role, and each bar within a panel represents a department in which that role occurs. The height of each segment of a bar represents the number of employees in that attrition category (left or stayed) for that job role and department.
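When fill colours are passed to scale_fill_manual() as an unnamed vector, they are assigned to the levels in order. One way to make the colour-to-level mapping explicit is a named vector, sketched below. The toy_counts data frame and the attrition_colours name are assumptions for illustration; the hex codes are the ones used above.

```r
library(ggplot2)

# Hypothetical pre-aggregated counts; the numbers are invented
toy_counts <- data.frame(
  Department = c("Sales", "Sales", "Human Resources", "Human Resources"),
  Attrition  = c("No", "Yes", "No", "Yes"),
  n          = c(8, 3, 5, 1)
)

# Naming the vector ties each colour to a level, regardless of level order
attrition_colours <- c("No" = "#E7B800", "Yes" = "#FC4E07")

p_named <- ggplot(toy_counts, aes(x = Department, y = n, fill = Attrition)) +
  geom_col(position = "stack") +
  scale_fill_manual(values = attrition_colours, name = "Attrition") +
  labs(x = "Department", y = "Count")

print(p_named)
```

With named values, "Yes" stays orange-red even if the Attrition levels are later reordered.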

Stacked Bar Chart

The chart is colour-coded, with orange-red designating departing personnel and yellow designating those who have remained. The colour scheme makes it simple to gauge, at a glance, the share of workers who have left in each department and job role.

This graphic helps in understanding attrition across job roles and corporate departments. It assists in determining the divisions and job functions with the highest and lowest attrition, as well as the sorts of workers quitting the organisation. The business can take action to keep its valuable personnel and lower the attrition rate by recognising these trends.
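Because the stacked bars show counts, groups of different sizes are hard to compare by eye. A complementary sketch, assuming the same column names as the dataset, computes an attrition rate per department and job role. The toy_hr values are invented, and n_employees, n_left, and attrition_rate are names chosen for illustration.

```r
library(dplyr)

# Hypothetical toy data; values are invented for illustration only
toy_hr <- data.frame(
  Department = c(rep("Sales", 4), rep("Research & Development", 4)),
  JobRole = c("Sales Representative", "Sales Representative",
              "Sales Executive", "Sales Executive",
              "Research Scientist", "Research Scientist",
              "Laboratory Technician", "Laboratory Technician"),
  Attrition = c("Yes", "Yes", "No", "Yes", "No", "No", "Yes", "No")
)

# Share of employees who left, per department and job role, highest first
role_rates <- toy_hr %>%
  group_by(Department, JobRole) %>%
  summarise(
    n_employees = n(),
    n_left = sum(Attrition == "Yes"),
    attrition_rate = n_left / n_employees,
    .groups = "drop"
  ) %>%
  arrange(desc(attrition_rate))

print(role_rates)
```

Sorting by rate rather than count avoids mistaking a large department for a high-attrition one.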

Density Plot

# Density Plot of Monthly Income by Attrition
ggplot(ibm_hr, aes(x = MonthlyIncome, fill = Attrition)) +
  geom_density(alpha = 0.5) +
  scale_fill_manual(values = c("#E7B800", "#FC4E07"), name = "Attrition") +
  labs(title = "Density Plot of Monthly Income by Attrition", x = "Monthly Income") +
  theme(legend.position = "top")

A density plot of Monthly Income by Attrition is produced using the code above. The plot shows the distribution of monthly income for employees who have left the company (Attrition = “Yes”) and for those who are still working (Attrition = “No”). The density is drawn with the ggplot() function and the geom_density() function from ggplot2.

The graph demonstrates how the distribution of monthly income differs between former employees and current employees. In particular, the plot demonstrates that former employees often have lower monthly incomes than those who are still employed.

This plot is significant because it sheds light on how Monthly Income and Attrition are related. It implies that workers who earn less money may be more likely to quit their jobs, which may have an impact on turnover control and employee retention. Additionally, it emphasises the significance of fair pay procedures and their possible effects on employee retention.

Density Plot

The density plot shows the distribution of the Monthly Income variable for each level of Attrition. The density curve is the estimated probability density function for each group and indicates the relative likelihood of observing particular Monthly Income values in that group.
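To make the idea of an estimated probability density function concrete, the sketch below uses base R’s density(), the kind of kernel density estimate that geom_density() draws, and checks that the area under the curve is approximately 1. The income values are invented, and the Riemann-sum check is only a rough approximation.

```r
# Hypothetical monthly incomes; values are invented for illustration only
toy_income <- c(2100, 2400, 2900, 3300, 4100, 5200, 6100, 9800)

# Kernel density estimate with base R defaults (Gaussian kernel, bw = "nrd0")
income_density <- density(toy_income)

# Approximate the area under the curve with a Riemann sum on the evaluation grid
grid_step <- diff(income_density$x)[1]
area <- sum(income_density$y) * grid_step

print(round(area, 2))
```

Because each curve integrates to roughly 1, the heights show how income is spread within each group, not how many employees the group contains.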

The graph demonstrates how the distribution of monthly income differs between current employees and those who have left the organisation. The orange-red curve, which represents departing employees, has a smaller peak and is more dispersed, showing that the range of monthly income is wider for departing employees. The yellow curve, which represents employees who are still employed, has a higher peak and is more concentrated, indicating that the range of Monthly Income is narrower for workers who are still working.

The relationship between Monthly Income and Attrition can be better understood thanks to this depiction. It implies that workers with a greater range of monthly income are more likely to quit their jobs, which may be related to elements like job satisfaction, work-life balance, and prospects for professional advancement. Managers and human resources specialists may find this information helpful in identifying potential retention tactics and enhancing employee satisfaction and retention.
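A visual impression of peak height and spread can be cross-checked with group summaries. The following is a minimal sketch, assuming the dataset’s Attrition and MonthlyIncome columns. The toy values are invented and do not reflect the real data, and median_income and iqr_income are illustrative names.

```r
library(dplyr)

# Hypothetical toy data; values are invented and do not reflect the real dataset
toy_hr <- data.frame(
  Attrition     = c("No", "No", "No", "No", "Yes", "Yes", "Yes"),
  MonthlyIncome = c(3000, 5000, 7000, 9000, 2000, 2500, 3000)
)

# Median (typical value) and interquartile range (spread) of income per group
income_summary <- toy_hr %>%
  group_by(Attrition) %>%
  summarise(
    median_income = median(MonthlyIncome),
    iqr_income = IQR(MonthlyIncome),
    .groups = "drop"
  )

print(income_summary)
```

Running the same summary on ibm_hr would show whether the leavers’ incomes are in fact lower or more widely spread than those of the employees who stayed.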

Stacked Bar Chart II

# Stacked Bar Chart II: attrition counts by job role, one panel per gender
ggplot(data = ibm_hr, aes(x = JobRole, fill = Attrition)) +
  geom_bar(position = "stack") +
  facet_wrap(~ Gender, nrow = 1) +
  scale_fill_manual(values = c("#009E73", "#D55E00")) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Attrition by Job Role and Gender",
       x = "Job Role", y = "Count")

With the help of this R code, you can create a stacked bar graph that displays the number of employees who have left and stayed, by gender and job role. The data comes from the ibm_hr data frame; JobRole is mapped to the x-axis, Attrition to the fill colour, and Gender to the facets.

This graph is significant because it shows how attrition rates differ within the organisation based on gender and job roles. It enables management to pinpoint regions with high attrition rates and implement remedial measures to lower turnover. Also, it aids in identifying prospective areas where it could be necessary to make revisions to policies, promotions, or salaries in order to maintain staff retention.

Stacked Bar Chart II

The stacked bar graph displays the number of employees who have left and stayed, by gender and job role. Job roles are represented on the x-axis, while the number of employees is shown on the y-axis. Each bar is stacked to show departing employees (orange) and remaining employees (green).

The chart is further split into two panels, one for each gender: one facet shows female employees and the other shows male employees.

The graph is crucial for illustrating the company’s attrition by job role and gender. It enables us to pinpoint the job roles with the most departures and determine whether there are any gender differences in attrition. The largest turnover is observed among sales representatives and laboratory technicians, and attrition among male employees is somewhat higher than among female employees. The business can utilise this data to create retention strategies and address any differences in attrition based on gender and job role.
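Since the chart above stacks counts, proportions have to be judged by eye. If proportions are the goal, one option is position = "fill", which rescales every bar to a height of 1. This is a sketch of that variant, not the chart used in this post. The toy_hr values are invented, and the percentage labelling function is an assumption.

```r
library(ggplot2)

# Hypothetical toy data; values are invented for illustration only
toy_hr <- data.frame(
  JobRole   = c(rep("Sales Representative", 4), rep("Manager", 4)),
  Gender    = c("Female", "Female", "Male", "Male",
                "Female", "Female", "Male", "Male"),
  Attrition = c("Yes", "No", "Yes", "Yes", "No", "No", "No", "Yes")
)

# position = "fill" stacks the segments and rescales each bar to sum to 1
p_fill <- ggplot(toy_hr, aes(x = JobRole, fill = Attrition)) +
  geom_bar(position = "fill") +
  facet_wrap(~ Gender, nrow = 1) +
  scale_y_continuous(labels = function(x) paste0(round(100 * x), "%")) +
  scale_fill_manual(values = c("No" = "#009E73", "Yes" = "#D55E00")) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(x = "Job Role", y = "Share of employees", fill = "Attrition")

print(p_fill)
```

The trade-off is that a filled chart hides group sizes, so it is best read alongside the count-based version.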

Findings

The graphical analysis of the HR data set has produced some intriguing results regarding the company’s attrition rate. Around 84% of the data set’s employees did not leave the organisation, whereas just about 16% did, according to the first bar plot of attrition frequency. This result implies that the company’s attrition rate is reasonably low.

Some job roles and departments show more attrition than others, according to the stacked bar chart of attrition by department and job role. For instance, among all job roles, sales representatives had the greatest attrition rate, and among all departments, research and development had the largest number of departures in this count-based chart. This finding suggests that certain job roles and departments may need to be examined more closely to identify the factors contributing to the higher attrition.

Employees who left the company often had a lower monthly pay compared to employees who stayed, according to the density plot of monthly income by attrition. This data raises the possibility that there is a relationship between pay and turnover, with underpaid personnel more likely to leave the organisation.

According to the second stacked bar chart, which breaks attrition down by job role and gender, attrition was higher for male employees than for female employees in most job roles. This result implies that the business may need to concentrate on enhancing male employee retention techniques, particularly in positions where the turnover rate is high.

It was feasible to develop a more thorough understanding of the elements influencing the attrition rate and discover potential remedies by looking at the data from several perspectives.

Conclusion

The examination of the HR data set revealed significant information about the rates and patterns of employee attrition inside the business. According to the analysis, 16% of the workforce left the organisation, which raises concerns for management. Employees who were younger, had lower incomes and job levels, and worked in specific divisions like sales, human resources, and research & development experienced a greater attrition rate.

Also, the visualisation analysis revealed that there was a larger percentage of attrition among employees who were single, regularly travelled, and had lower levels of job and workplace happiness. Our results emphasise the significance of developing a supportive and encouraging workplace that provides chances for professional development, work-life balance, and equitable compensation. The analysis also showed that the organisation has to concentrate on keeping its highly qualified and experienced people, particularly those in crucial positions like managers and research scientists. High employment levels have a greater attrition rate, according to the data, which could have a big influence on the productivity and profitability of the business.

The data also showed that attrition rates varied by gender and job role, with men and those in particular job roles being more likely to leave the organisation. These results show that the business must address issues of diversity and inclusion, guarantee equal opportunities for all workers, and foster a culture that recognises and respects diversity.

In conclusion, the study of the HR data set provides important insight into the elements that affect employee attrition and the necessity for businesses to prioritise employee retention plans. According to the findings, businesses should invest in developing a culture that values employee happiness, professional growth, and work-life balance. By addressing these aspects, companies may enhance productivity, profitability, and overall organisational performance, and keep their highly trained workers.

Bhadra M,
CB.BU.P2MBA21034,
6th Trimester, MBA,
Amrita School of Business,
Amrita Vishwa Vidyapeetham, Coimbatore.

“This blog is part of the assignments submitted for the course, Data Visualization and Communication / one of the Business Analytics Electives Courses at Amrita”
