HR Analytics

Oct 7, 2023

Source: https://hrprofessionalsmagazine.com/wp-content/uploads/2022/06/Page34-graphic.jpg

HR analytics, also known as Human Resources analytics or HR data analytics, is a field that uses data analysis and data visualization techniques to support data-driven decisions in human resources management. In this project, HR analytics involves:

1. Data Cleaning: This step ensures that the HR dataset is free from errors and inconsistencies, making it suitable for analysis. Specific tasks include deleting redundant columns, renaming columns for clarity, removing duplicate records, cleaning individual columns (e.g., stripping whitespace), and handling missing values (removing NaN values).

2. Data Visualization: HR analytics often involves visualizing key metrics to gain insights and make informed decisions. This project plots several visualizations of the HR data. They cover employee overtime, marital status, job roles, gender distribution, education fields, department distribution, business travel, and relationships between variables such as overtime and age, total working years, education level, number of companies worked, and distance from home.

HR professionals and analysts can extract meaningful insights from HR datasets by performing these data cleaning and visualization tasks. These insights can improve HR processes, make informed decisions about workforce management, and enhance overall organizational performance.

Data Attribution:
We appreciate Meri Skill for providing the dataset used in this project. Their valuable contribution made this analysis possible. For further insights into Meri Skill and its data sources, please visit (https://www.meriskill.com).

File descriptions

HR-Employee-Attrition.csv: the training set, containing 1,470 rows and 35 columns.

Data Sample:

Project Flow:

Importing Libraries and investigating the data
Data Validation
Data Cleaning
Data Visualization
Conclusion

Importing Libraries and investigating the data:

We start by importing the libraries used in this project and then load the HR-Employee-Attrition.csv file into a DataFrame.
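A minimal sketch of this step is shown here. The function name `load_hr_data` is a hypothetical helper introduced for illustration, and the file path in the usage comment assumes the CSV sits in the working directory.

```python
import pandas as pd
import matplotlib.pyplot as plt  # used later for plotting
import seaborn as sns            # used later for plotting


def load_hr_data(source):
    """Read the HR CSV and return (original, working copy).

    `source` can be a file path or any file-like object accepted by
    pandas.read_csv. Hypothetical helper, not the article's exact code.
    """
    df = pd.read_csv(source)
    df1 = df.copy()  # work on a copy so the raw data stays untouched
    return df, df1


# Usage (assumes the file is in the current working directory):
# df, df1 = load_hr_data("HR-Employee-Attrition.csv")
# df1.head()
```

Working on a copy (`df1`) keeps the original DataFrame available for comparison after cleaning.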

Data Validation:

After loading the data, we inspect the dataset by calling the .info() method, which lists each column's non-null count and data type.

The output shows no missing values, confirming that every column is complete. The dataset contains both numerical and categorical data, with the data types `int64` and `object`. The categorical columns are the ones stored with the `object` data type.
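The same checks can be collected into a single table. The sketch below is only an illustration: `validate` is a hypothetical helper name, and the three-row frame is made-up sample data, not rows from the real file.

```python
import pandas as pd


def validate(df):
    """Return a per-column summary: dtype, missing count, unique count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),  # NaN values are not counted as a category
    })


# Made-up frame for illustration only (one value deliberately missing)
sample = pd.DataFrame({
    "Age": [41, 49, 37],
    "Department": ["Sales", "Research & Development", None],
})
summary = validate(sample)
print(summary)
```

On the real dataset you would call `validate(df1)` and expect the `missing` column to be zero everywhere, in line with the `.info()` output.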

Data Cleaning:

What is Data Cleaning?

Data cleaning, also known as data cleansing or scrubbing, is the process of identifying and correcting errors, inconsistencies, and inaccuracies in a dataset. It is a critical step in data preprocessing and is essential for ensuring that the data used for analysis or modeling is accurate, reliable, and suitable for the intended purpose. Here’s how data cleaning helps in this project:

1. Ensuring Data Quality: Data cleaning helps identify and rectify missing values, duplicates, and outliers. This ensures that the dataset is of high quality and free from errors, making it suitable for meaningful analysis.

2. Improved Model Performance: Clean data leads to more accurate and reliable models. Inaccuracies in the data can lead to incorrect insights and predictions. Cleaning the data improves the chances of building models that perform well and provide valuable insights.

3. Enhanced Data Visualization: Clean data is easier to visualize. When creating plots and charts, having clean data ensures that the visualizations accurately represent the underlying information, allowing for better decision-making.

4. Facilitating Exploratory Data Analysis (EDA): Data cleaning is integral to EDA. It helps data analysts confidently explore the dataset and identify patterns, correlations, and trends, as they can trust the data’s integrity.

5. Reducing Bias and Errors: Biased or erroneous data can lead to biased results and decisions. Data cleaning helps mitigate these issues, making the analysis and decision-making process more objective and reliable.

6. Compliance and Reporting: In some domains, like HR analytics, compliance with regulations is crucial. Clean data ensures that reports and analyses comply with relevant laws and standards.

How is it useful for this Project?

In this project, the HR dataset was cleaned to prepare it for analysis. This involved removing redundant columns, renaming columns for clarity, handling missing values, and removing duplicates. These cleaning steps ensure that subsequent analyses and visualizations are based on accurate and reliable data, leading to more informed HR decisions and insights.

1. Deleting redundant columns:

redundant_columns = ['EmployeeCount', 'EmployeeNumber', 'Over18', 'StandardHours']
df_cleaned = df1.drop(columns=redundant_columns)

2. Renaming columns:

# Example: Renaming 'Age' column to 'EmployeeAge'
df_cleaned.rename(columns={'Age': 'EmployeeAge'}, inplace=True)

3. Dropping duplicates:

df_cleaned = df_cleaned.drop_duplicates()

4. Cleaning individual columns (e.g., removing whitespace from string columns):

df_cleaned['JobRole'] = df_cleaned['JobRole'].str.strip()

5. Removing NaN values:

df_cleaned.dropna(inplace=True)
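The individual steps can also be chained in one function. This is a sketch under assumptions: `clean_hr_data` is a hypothetical helper, it strips whitespace from every text column rather than only 'JobRole', and the four-row frame is made-up data for illustration.

```python
import pandas as pd


def clean_hr_data(df, redundant_columns):
    """Drop redundant columns, strip text columns, remove duplicates and NaNs."""
    # errors="ignore" skips any listed column that is not present
    cleaned = df.drop(columns=redundant_columns, errors="ignore")

    # Assumes object/string columns hold text values only
    text_cols = cleaned.select_dtypes(include=["object", "string"]).columns
    for col in text_cols:
        cleaned[col] = cleaned[col].str.strip()

    # Strip first, so rows differing only by stray spaces count as duplicates
    cleaned = cleaned.drop_duplicates().dropna()
    return cleaned.reset_index(drop=True)


# Made-up rows for illustration only
raw = pd.DataFrame({
    "Age": [41, 41, 30, 25],
    "JobRole": [" Sales Executive", "Sales Executive ", "Manager", None],
    "Over18": ["Y", "Y", "Y", "Y"],
})
cleaned = clean_hr_data(raw, ["Over18", "StandardHours"])
print(cleaned)
```

The order matters here: if duplicates were dropped before stripping, the first two rows would both survive because their 'JobRole' strings differ by a space.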

Data Visualization:

You can use libraries like Matplotlib and Seaborn to create various plots for data visualization. Here’s an example of plotting a correlation heatmap:

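# Compute the correlation matrix once, on numeric columns only
# (the numeric_only argument requires pandas 1.5 or later)
corr_matrix = df1.corr(numeric_only=True)
plt.figure(figsize=(20, 20))
sns.heatmap(corr_matrix, annot=True, cmap='magma')
plt.title('Correlation Heatmap')
plt.show()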

You can create similar code blocks for visualizing other categorical variables and relationships between variables. Remember to customize these examples to fit your specific dataset and requirements.

Fig: Correlation Heatmap for the featured columns

Correlation analysis is used in this project to understand the relationships between different variables in the HR dataset. It helps identify connections, select relevant features, visualize patterns, and make data-driven decisions for HR management.

Distribution of Featured Columns:

The code for this section generates a grid of subplots to visualize various aspects of the HR dataset. It calculates the layout of the grid, creates the subplots, and then iterates through the dataset columns to draw an appropriate visualization for each one. Numerical columns are visualized using histograms, while categorical columns are visualized using count plots. The resulting grid provides a comprehensive overview of the dataset’s characteristics, making it easier to spot patterns or anomalies.

Career Growth:

This section uses a line plot to analyze and visualize the relationship between an employee’s “Years in Current Role” and their “Job Level” within the HR dataset. This visualization, titled “Career Growth,” helps show how an employee’s tenure in their current role relates to their job level progression.

A line plot is suitable for this analysis because it allows us to observe trends or patterns over time or continuous variables. In this case, it helps us assess whether there is a correlation between the years an employee spends in their current role and their job level.

By examining the line plot, one can draw insights into the career development dynamics within the organization. For example, if the line slopes upward from left to right, it indicates a positive relationship, suggesting that employees with more years in their current roles tend to hold higher job levels. Conversely, a flat or downward-sloping line might point to limited career growth opportunities within the organization.

In summary, this visualization provides valuable insights into the career progression patterns within the company. It aids in HR analytics by identifying potential areas for improvement in career development and talent management strategies.

Conclusion:

Data Loading and Preprocessing:

The project begins with loading HR employee data from a CSV file into a pandas DataFrame.
An initial exploration of the DataFrame is conducted to understand its structure and contents.

Data Overview:

The dataset contains information on various HR-related attributes of employees in a company, including age, attrition status, business travel, daily rate, department, education, and more.
There are 1,470 entries (employees) and 35 columns (attributes).

Data Cleaning:

A copy of the original DataFrame, df1, is created to preserve the original data.
The columns ‘EmployeeCount’, ‘EmployeeNumber’, ‘Over18’, and ‘StandardHours’ are identified as potentially irrelevant and are dropped from the DataFrame.

Data Transformation:

A new column, ‘age_group’, is created to categorize employees into age groups based on their ‘Age’ attribute.
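A grouping like this can be sketched with `pd.cut`. The bin edges and labels below are hypothetical, chosen only for illustration, and `add_age_group` is a made-up helper name.

```python
import pandas as pd

# Hypothetical bin edges and labels; adjust them to the ranges you need
AGE_BINS = [17, 25, 35, 45, 55, 65]
AGE_LABELS = ["18-25", "26-35", "36-45", "46-55", "56-65"]


def add_age_group(df):
    """Return a copy of df with an 'age_group' column derived from 'Age'."""
    out = df.copy()
    # Intervals are right-inclusive by default, e.g. (17, 25] covers 18 to 25.
    # Ages outside the outer edges become NaN.
    out["age_group"] = pd.cut(out["Age"], bins=AGE_BINS, labels=AGE_LABELS)
    return out


# Made-up ages for illustration only
sample = pd.DataFrame({"Age": [18, 25, 26, 41, 60]})
grouped = add_age_group(sample)
print(grouped)
```

The resulting column is categorical, so it can be passed directly to a count plot to compare group sizes.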

Data Exploration:

Correlation analysis is performed to understand the relationships between numerical attributes in the dataset.
A heatmap is generated to visualize the correlations.

Data Visualization:

Subplots are created to visualize the distribution of each attribute in the dataset.
Histograms and count plots are used for numerical and categorical attributes, respectively.

Job Satisfaction Analysis:

Job satisfaction ratings are analyzed based on employees’ job roles.
The data is aggregated to show job satisfaction ratings by job role.
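The aggregation could look something like the sketch below. It assumes the rating column is named 'JobSatisfaction'; `satisfaction_by_role` is a hypothetical helper, and the four rows are made-up data.

```python
import pandas as pd


def satisfaction_by_role(df):
    """Average satisfaction rating and head count per job role."""
    return (
        df.groupby("JobRole")["JobSatisfaction"]
        .agg(["mean", "count"])
        .sort_values("mean", ascending=False)
    )


# Made-up rows for illustration only
sample = pd.DataFrame({
    "JobRole": ["Manager", "Manager", "Sales Executive", "Sales Executive"],
    "JobSatisfaction": [4, 3, 2, 1],
})
by_role = satisfaction_by_role(sample)
print(by_role)
```

Keeping the head count next to the mean helps avoid over-reading averages that come from only a few employees.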

Career Growth Analysis:

A line plot is used to visualize the relationship between the ‘YearsInCurrentRole’ and ‘JobLevel’ attributes, providing insights into career growth within the organization.

The project aims to explore and analyze HR-related data to gain insights into employee demographics, job satisfaction, and career growth within the company. Data cleaning, transformation, and visualization techniques are applied to facilitate this analysis. The findings from this project could inform HR policies and strategies within the organization.