Decoding Earning Disparity: An Exploratory Analysis

A visual journey with Plotly and Seaborn

Rosé with Rho
Analytics Vidhya
14 min read · Jul 11, 2021

I am all for recycling this week!

This is part of a project I did for an introductory course on Programming with Python, which I deemed worth sharing on a wider platform. The idea was to take a simple dataset, one that doesn’t demand much effort on cleaning, and dig into it to unearth underlying truths and weave a story worth telling.

Introduction and Rationale

Motivation: Gender Pay gap
In recent years, there have been reports of gross inequalities in the compensation provided to women and men for the same designation and expertise in a job. According to a 2019 report by CNBC, this is especially prevalent in Healthcare, Financial Management, and the Legal Profession, where women are known to be offered lower remuneration than their male counterparts. This difference in wages between males and females is commonly known as a Pay Gap.

Photo by Clark Van Der Beken on Unsplash

Data Source: The data used is from an open source dataset available on Kaggle, sourced originally from Glassdoor, a website where employees can post reviews about current and past employers. The platform is typically used by candidates who want to understand the work culture and salary insights of a prospective employer. The dataset chosen contains information on users, ranging from their educational background to their current seniority level and designation.

The aim of this analysis is first to examine the pay structures of candidates across the given attributes to get a comprehensive understanding of the data. Subsequent analysis looks specifically at whether salary trends across profiles differ for Males and Females, whether a Pay Gap is present, and how that Gap varies across different factors.

1. Setting up environment

The granularity of this dataset is at an individual candidate level. The information captured includes Job Title, Gender, Age, most recent Performance Evaluation, Educational Background, Department of work, Seniority level in the profession (in terms of years of work experience) and Compensation, broken into Base Pay and Bonus.

A few more calculated variables are incorporated in the next steps to make the analysis richer.

Head of the dataset
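
For readers following along, loading the data and peeking at the first rows might look roughly like the sketch below. The header names (`JobTitle`, `PerfEval`, `Dept` and so on) are my assumption about how the CSV is laid out, and the two rows are invented stand-ins so the snippet runs without the file.

```python
import io

import pandas as pd

# Two invented rows standing in for the Kaggle CSV; header names are assumed.
csv_text = """JobTitle,Gender,Age,PerfEval,Education,Dept,Seniority,BasePay,Bonus
Manager,Female,40,4,Masters,Sales,3,100000,8000
Driver,Male,25,2,High School,Operations,1,50000,4000
"""

# With the real file, the argument would be the path to the downloaded CSV.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows of the dataset
print(df.shape)    # (rows, columns)
```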

This is a palette that I customized and reuse in many notebooks, as it comes in handy with viz libraries.

2. Data Preparation and cleaning

As standard practice, some basic hygiene checks are run on the dataset before proceeding to exploration.

There are no null values in the dataset
Summary of Numerical variables in the dataset
Summary of Categorical variables in the dataset
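
The three checks captioned above can be reproduced with plain pandas. This is a minimal sketch on invented rows, with assumed column names, rather than the exact notebook code.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the dataset; column names are assumed, values are invented.
df = pd.DataFrame({
    "JobTitle": ["Manager", "Driver", "IT", "Data Scientist"],
    "Gender": ["Female", "Male", "Male", "Female"],
    "Age": [34, 22, 51, 45],
    "BasePay": [120000, 60000, 90000, 95000],
    "Bonus": [9000, 4000, 6500, 7000],
})

# Missing values per column; all zeros means there is nothing to impute.
null_counts = df.isnull().sum()

# Numerical columns: count, mean, std, min, quartiles, max.
numeric_summary = df.describe()

# Non-numerical columns: count, unique, top (most frequent value), freq.
categorical_summary = df.describe(exclude=[np.number])

print(null_counts, numeric_summary, categorical_summary, sep="\n\n")
```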

2.2 Data Preparation

2.2.1. Convert seniority and performance evaluation to factors

Since Seniority and Performance Evaluation are recorded as ordinal variables (meaning discrete levels, ranging from 1–5 in this case), they need not be stored as float and are converted to categorical (object) variables for this analysis.
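
One way to do this conversion is sketched below, assuming the two columns are called `Seniority` and `PerfEval`. Casting to `int` first is my own precaution so that a float column does not produce labels like "3.0".

```python
import pandas as pd

# Toy frame with the two ordinal columns stored as floats (names assumed).
df = pd.DataFrame({"Seniority": [1.0, 3.0, 5.0], "PerfEval": [5.0, 2.0, 4.0]})

for col in ["Seniority", "PerfEval"]:
    # int first to drop the decimal part, then str so that pandas
    # treats the levels as categories instead of numbers.
    df[col] = df[col].astype(int).astype(str)

print(df.dtypes)
```

An ordered `pd.Categorical` would be an alternative if the 1 to 5 ordering needs to be preserved for sorting.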

2.2.2 Calculate Total Pay

To calculate an employee’s Total Pay per annum, their Base Pay and Bonus Pay are added.

2.2.3. Creating Age buckets

From the summary statistics we know that age ranges from 18 to 65. Analyzing trends across age brackets gives better insight than looking at each individual age, so age is bucketed into 4 groups of roughly 12 years each.
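
The two derived columns can be sketched as follows. The column names and the exact bucket edges are assumptions on my part, chosen to match the 18–30 and 43–54 groups mentioned later, and the rows are invented.

```python
import pandas as pd

# Toy rows; column names are assumed, values are invented.
df = pd.DataFrame({
    "Age": [18, 30, 31, 54, 65],
    "BasePay": [50000, 70000, 90000, 110000, 95000],
    "Bonus": [4000, 5000, 6500, 8000, 7000],
})

# Total Pay per annum is simply Base Pay plus Bonus.
df["TotalPay"] = df["BasePay"] + df["Bonus"]

# pd.cut uses right-closed intervals by default: (17, 30], (30, 42], ...
bins = [17, 30, 42, 54, 65]
labels = ["18-30", "31-42", "43-54", "55-65"]
df["AgeBuckets"] = pd.cut(df["Age"], bins=bins, labels=labels)

print(df)
```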

Head of the dataset — after calculating new variables TotalPay and AgeBuckets
Summary statistics of all Categorical Variables

Data Summary

  • There are slightly more males than females
  • Operations is the most common department
  • Marketing Associate is the most common job title
  • Most candidates have a Seniority of 3 years
  • 5 is the most common performance rating
  • Most candidates’ educational background is only up to High School
  • Most candidates fall into the 18–30 age group

3. Exploratory Data Analysis

3.1 Univariate distribution plots

To understand the spread of the data in each of the variables, the following distribution plots are generated.

3.1.1 Categorical Variables

Univariate Distribution Plots for Categorical Variables

The distribution plots show no stark outliers or missing values in any of the categorical variables.

3.1.2 Continuous Variables

Univariate Distribution Plots for Numerical Variables

Base Pay, Bonus and Total Pay are each roughly symmetrically distributed, with a single peak near the middle of the range.

3.2 Multivariate Distribution Plots

This section has exploratory analysis to understand the distribution and behaviour of two or more variables in the dataset.

3.2.1 Gender Diversity across different attributes

The frequency distribution of males and females across each of the categorical variables is plotted in this section.

Gender Diversity across attributes
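
The counts behind plots like these can be obtained with a cross-tabulation. Below is a small sketch for one attribute, with an assumed `Dept` column and invented rows; the same call works for any other categorical column.

```python
import pandas as pd

# Toy rows; column names are assumed, values are invented.
df = pd.DataFrame({
    "Dept": ["Sales", "Sales", "Engineering", "Engineering", "Engineering"],
    "Gender": ["Female", "Male", "Male", "Male", "Female"],
})

# Rows are departments, columns are genders, cells are head counts.
counts = pd.crosstab(df["Dept"], df["Gender"])
print(counts)
```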

Findings for Gender Diversity

Age Groups

  • 18–30 age group has highest number of males, 43–54 is the only age group with more females than males
  • Women are approximately equally distributed in all age groups, between 113–120 in each group

Department

  • There are fewer women than men in every department
  • Management has the fewest women, followed by Engineering

Job Title

  • Most women are Marketing Associates, while the same job title has the fewest men
  • Manager and Software Engineer job titles have the fewest women

Educational Background

  • The largest group of women are high school graduates
  • There are fewer women than men at every level of educational background except college

Performance rating

  • Most females are rated 1 out of 5 in their performance evaluation
  • Compared to males, far fewer females receive a perfect evaluation score of 5

Seniority

  • Each level of seniority has fewer females than males, except for candidates with 5 years of experience

3.2.2 Jobs vs Educational backgrounds

Frequency distribution of educational backgrounds across different Job Titles is plotted in this section.

Educational Backgrounds across Job Titles

Key Insights

Marketing associate is the most popular job title within which most candidates have educational background up to High School

IT jobs have a majority of candidates with College level of education

Both Graphic Designer and Software Engineer jobs have a majority of Masters level of education

Data Scientist jobs have the highest number of PhD scholars

3.2.3 Analysis of Salary Components by Department

The components of salary, Base Pay and Bonus, differ across job titles within each department.

Key Insights

Highest paying jobs: Managers are paid most in each department in terms of average Base Pay and Total Pay, followed by software engineers

Lowest paying jobs: Marketing Associate is the lowest-paying job title in terms of Base Pay and Total Pay

Bonuses for the same job title differ from department to department

3.2.4 Understanding the relationship between Components of Pay

The BasePay and Bonus for Males and Females across different departments are analyzed using a scatter plot with a regression line, to understand whether the relationship between the two variables is linear.

Relationship between Base Pay and Bonus Pay across Genders

Key Insights
The scatter plots do not follow any set trend and the r² values of the regression lines are very small. Even though the relationship appears to be negative, Base Pay explains very little of the variability in Bonus. Nothing conclusive can be said about the relationship between Bonus and Base Pay for males or females in any of the departments.
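
For anyone wanting to check such r² values by hand: for a straight-line fit with one predictor, r² equals the squared Pearson correlation. The sketch below computes it per gender on invented numbers with assumed column names, so the values it prints say nothing about the real data.

```python
import numpy as np
import pandas as pd


def r_squared(x, y):
    """r² of a simple linear fit, i.e. the squared Pearson correlation."""
    r = np.corrcoef(x, y)[0, 1]
    return float(r ** 2)


# Toy rows; column names are assumed, values are invented.
df = pd.DataFrame({
    "Gender": ["Female"] * 3 + ["Male"] * 3,
    "BasePay": [80000, 95000, 110000, 85000, 100000, 120000],
    "Bonus": [7000, 6800, 6000, 7500, 6000, 6900],
})

# One r² per gender; adding Dept to the groupby gives one per panel.
r2_by_gender = {
    gender: r_squared(group["BasePay"], group["Bonus"])
    for gender, group in df.groupby("Gender")
}
print(r2_by_gender)
```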

3.2.5 Pay at levels of Seniority

The following heatmap is plotted to understand the trends in pay offered with increasing levels of seniority.

Pivot View of Total Pay across Seniority for different Job Titles
Payscale Heatmap across Seniority levels
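
A pivot of this kind can be built with `pd.pivot_table`. The sketch below assumes `JobTitle`, `Seniority` and `TotalPay` columns and uses invented rows.

```python
import pandas as pd

# Toy rows; column names are assumed, values are invented.
df = pd.DataFrame({
    "JobTitle": ["Manager", "Manager", "Driver", "Driver", "Manager"],
    "Seniority": [1, 5, 1, 5, 5],
    "TotalPay": [100000, 140000, 50000, 70000, 150000],
})

# Rows: job titles, columns: seniority levels, cells: average Total Pay.
pay_by_seniority = pd.pivot_table(
    df, values="TotalPay", index="JobTitle", columns="Seniority", aggfunc="mean"
)
print(pay_by_seniority)
```

A table in this shape can be passed directly to Seaborn's `heatmap` function to get the colour-graded view.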

Key Insights
The colour gradient indicates the magnitude of average total pay in each subgroup. A darker shade indicates a higher magnitude of average pay and vice versa. As expected, for each of the job titles, the average total pay offered per annum increases with increasing level of seniority.

4. Analysis of Disparity in Pay in Females vs Males

4.1 Calculation of Gender Pay gap by department

This section involves the calculation and visualization of the difference in pay across the Genders (if any).

Boxplot showing how average Total Pay for Females is lower than Males in every department
Aggregate view for average Total Pay across Departments — Females vs Males

The tabular data below indicates that the average Base Pay and Total Pay offered to women in each department is lower than that of their male counterparts.

Calculation for Pay Gap in each Department
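
The gap calculation boils down to a group-by on department and gender followed by a subtraction. Here is a minimal sketch on invented rows with assumed column names; the percentage column is an extra I added for illustration.

```python
import pandas as pd

# Toy rows; column names are assumed, values are invented.
df = pd.DataFrame({
    "Dept": ["Engineering"] * 3 + ["Sales"] * 3,
    "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
    "TotalPay": [90000, 100000, 110000, 80000, 84000, 85000],
})

# Average Total Pay per department, one column per gender.
avg_pay = df.groupby(["Dept", "Gender"])["TotalPay"].mean().unstack("Gender")

# Positive gap means men earn more on average in that department.
avg_pay["PayGap"] = avg_pay["Male"] - avg_pay["Female"]
avg_pay["PayGapPct"] = avg_pay["PayGap"] / avg_pay["Male"] * 100

print(avg_pay.sort_values("PayGap", ascending=False))
```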

Key Insights
The pay gap is most pronounced in the Engineering department, where women are paid $111,00 less than men per annum on average. Earlier analysis showed that the number of women in Engineering is among the lowest, second only to Management. The considerable difference in pay may be one of the reasons why women feel discouraged from going into engineering.

4.2 Understanding Gender Pay Gap by job titles within each department

The following analysis is done to understand if there are certain job titles within each department that drive the income disparity at a departmental level.

Exploring Pay Gap at a more Granular level in Job Titles within Departments
Interactive Treemap showcasing Pay Gap in Job Titles within each department

The above treemap helps understand the income disparity with respect to job titles within each department. The colour gradient indicates the magnitude of average total pay in each subgroup. A darker shade indicates a higher magnitude of average pay and vice versa.

Key Insights
It is observed that the Pay Gap within a given department is not concentrated in one job role. It is the cumulative effect of the discrepancy between men’s and women’s earnings across all jobs in a department that gives a net effect of lower average pay for women.

Sales: Women are paid less in IT, Marketing Associate, Sales Associate, Software Engineer and Warehouse Associate roles

Engineering: Women are paid less in Manager, Marketing Associate, Sales Associate and Software Engineer roles

Management: Women are paid less in Financial Analyst, IT, Manager, Marketing Associate, Sales Associate and Software Engineer roles

Operations: Women are paid less in Driver, Financial Analyst and Software Engineer roles

Administration: Women are paid less in Driver and Marketing Associate roles. There are no female Software Engineers in Administration.

4.3 Understanding gender Pay gap by Seniority in each department

While earnings do increase with increasing seniority, the earnings for women stand lower than men at each level.

Pivoting data to required view of Seniority levels within Departments
Average Total Pay for varying levels of Seniority within Departments — Females vs Males
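
The pivoted view can be sketched with a two-level index, again on invented rows with assumed column names. Any department and seniority combination that lacks one gender would simply show up as NaN in that column.

```python
import pandas as pd

# Toy rows; column names are assumed, values are invented.
df = pd.DataFrame({
    "Dept": ["Sales", "Sales", "Sales", "Sales"],
    "Seniority": [1, 1, 5, 5],
    "Gender": ["Female", "Male", "Female", "Male"],
    "TotalPay": [60000, 62000, 100000, 120000],
})

# Rows: (department, seniority) pairs, columns: genders.
pay_by_level = pd.pivot_table(
    df,
    values="TotalPay",
    index=["Dept", "Seniority"],
    columns="Gender",
    aggfunc="mean",
)
print(pay_by_level)
```

Swapping `Seniority` for a performance or education column gives the equivalent views used in the later sections.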

Key Insights
It is observed that at each level of seniority within the departments, women are paid less than men. The size of this gap varies across departments and is most prominent in the Sales department for individuals with a seniority of 5. It appears that even when women have the same number of years of work experience in a given department, they don’t earn the same as men.

A caveat on the Bubble/Scatter Plot: I have given the data points a size element that is an exponential function of average total pay. This is done specifically to highlight the differences in pay for males vs females with increasing levels of seniority. As observed, the bubbles keep growing with every level, yet the purple (male) bubble is always bigger than the pink (female) one.
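
To illustrate what such a size transform does, here is one possible version. The scaling by the maximum and the factor of 4 are purely hypothetical choices, not the values used for the chart above.

```python
import numpy as np
import pandas as pd

# Invented average Total Pay per (seniority, gender) group.
avg_pay = pd.Series(
    [60000, 62000, 100000, 120000],
    index=["1-Female", "1-Male", "5-Female", "5-Male"],
)

# Scale to (0, 1] first so the exponential stays in a plottable range,
# then exponentiate; the factor 4 is an arbitrary illustrative choice.
scaled = avg_pay / avg_pay.max()
bubble_size = np.exp(4 * scaled)

print(bubble_size.round(1))
```

Because the exponential stretches the top of the range, a 2x difference in pay becomes a much larger difference in bubble size, which is why the sizes should be read as emphasis and not as proportional values.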

4.4 Performance evaluation and Earning Disparity

Women have a fixed range of performance evaluation that does not cross 4.0 in any department. A deep dive was done to analyze if poor performance rating was the reason behind lower average pay in women.

Spread of Performance Evaluation Scores — Females vs Males
Calculating average Total Pay across Performance Evaluation Scores — Females vs Males
Average Total Pay for varying Performance Scores within Departments — Females vs Males

Key Insights

From the box plot it is evident that the distribution of women’s performance evaluations is left skewed in Management, while men’s is right skewed in Engineering. However, to understand whether performance evaluation explains why women’s salaries are lower than men’s, the average salary at each performance rating within a department is plotted.

It is observed that men make more than women in each department, regardless of their performance rating. The only exception to this is for employees rated 2/5 in Management department and those rated 5/5 in sales department where there is no considerable difference in Total Pay for men and women.

It can be concluded that women earn less than men despite having the same performance evaluation.

4.5 Understanding earning disparity with Educational Backgrounds

To understand whether educational background was the reason why women were paid less than men, the average pay for males and females in a given department was compared based on their level of education.

Calculating average Total Pay across Educational Levels— Females vs Males
Interactive Sunburst Chart showcasing Pay Gap in Educational Backgrounds within each department

The above sunburst chart helps understand the disparity in earnings in males and females with relation to their educational backgrounds. The colour gradient indicates the magnitude of average total pay in each subgroup. A darker shade indicates a higher magnitude of average pay and vice versa.

Key Insights
Employees educated up to PhD level earn the most. Among them, individuals working in the Engineering division earn the most.

However, women are found to earn less than their male counterparts on average in each department, despite having the same level of education.

Conclusion

Several factors were analyzed in an attempt to understand the disparity in earnings in individuals from the given dataset. There are multiple perceivable factors that contribute to the Total Pay received by employees. While educational background, seniority, job title and department affect the Pay scale, gender should not play a role in determining an individual’s salary in an ideal scenario. However, through multiple explorations it is observed that this is not the case.

The Engineering department has the highest pay gap, with women earning $111,00 less than men on average per annum.

Within departments, there are perceivable pay gaps in most job titles. However, the gap is reversed for Data Scientists and Graphic Designers in Management, where women earn more than men.

There is a perceivable disparity at each seniority level, wherein women earn less than men for the same years of work experience. This is most prominently seen in Sales professionals with 5 years of experience.

The performance rating for women in Management is left skewed compared to the distribution for men. However, on analyzing the average salary for men and women by department and performance rating, it was found that men make more than women in each department, regardless of their performance rating.

Employees with a PhD level of education earn the most in each department. However, even then, women earn less than men despite having the same educational background.

None of the components analyzed above offers a causal explanation for why a female’s average pay should be lower than a male’s. It therefore appears that a systemic bias is at play, because of which women earn less than men despite having similar educational background, department of work, job title, performance rating and seniority.

That’s all folks!

This concludes a deep dive analysis on a simplistic dataset using Pandas for Data wrangling and Plotly & Seaborn for Visualizations. I am duty bound to warn that this is an analysis that holds true for the given dataset, and not an opinion. Reserving my personal take on social issues for another time and place, preferably offline.

Till then, thanks for reading and please feel free to reach out and drop your comments and suggestions below!
