This is how the UK is tackling the gender pay gap with data

A data-based look at companies' pay gap reports

Lean Guardia
6 min read · Jun 11, 2020
“Constructed Gaps” — Me (Freiburg, 2018)

Gender inequality persists across time and cultures in different forms and intensities. In the workforce, the gender pay gap is an important and complicated problem that needs to be solved.

In the UK, equal pay is mandatory: discriminating against men or women by paying them differently for the same work is illegal [1]. It is important to distinguish this legal issue from the concept of the gender pay gap (GPG). The GPG is a measurement that shows the difference between the average pay of men and women, relative to men's earnings [2]. For instance, the GPG among full-time and part-time employees in 2019 was 17.3%. This means that, for every pound a man earns (£1), a woman receives less than 83 pence (£0.83).
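The arithmetic behind that figure can be sketched in a few lines of Python. This is only an illustration of the definition above; the function names and the hourly pay values are made up for the example and are not taken from the dataset.

```python
def gender_pay_gap(men_pay: float, women_pay: float) -> float:
    """Return the pay gap as a percentage of men's pay."""
    if men_pay <= 0:
        raise ValueError("men_pay must be positive")
    return (men_pay - women_pay) / men_pay * 100


def women_pay_per_pound(gap_percent: float) -> float:
    """Return what a woman receives for every £1 a man earns."""
    return 1 - gap_percent / 100


# Illustrative hourly figures (assumed, not from the reports).
gap = gender_pay_gap(men_pay=20.00, women_pay=16.54)
print(f"gap: {gap:.1f}%")
print(f"per pound: £{women_pay_per_pound(17.3):.3f}")
```

A negative result would mean that women are paid more than men on the chosen average.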

The Gender pay gap service helps companies understand their GPG and supports them in developing action plans [3]. This government service website allows browsing GPG data filtered by individual company and reporting year. All this data is available because, since 2017, all companies with 250+ employees must submit a GPG report yearly [2].

This work takes a data analysis approach to answer the following questions:

  1. How balanced are pay quartiles by gender?
  2. Which economic activities have the largest pay gap?
  3. What employer characteristics indicate the pay gap?

The Data

Apart from the online exploration service, the dataset can be directly downloaded for local analysis [3]. Among the available GPG indicators, the key metrics used in this analysis are "Portion of males and females in each pay quartile" for Q1, "Median gender pay gap in hourly pay" for Q2/Q3 and the "Standard Industrial Classification Codes" for Q2.

The analysed data consists of reports from 10,828 UK companies submitted for the 2018–19 period. This is the most recent full dataset available, as the reporting for 2019–20 is not complete at the time of writing because the submission deadline was delayed due to the COVID-19 pandemic.

Q1 — How balanced are pay quartiles by gender?

The GPG quartile figures are calculated in four bands, in the following steps. First, all employees' wages are collected and sorted from highest to lowest. This list is then divided into four sub-lists (quartiles) of equal length. Finally, the percentage of each gender in each quartile is calculated [4]. For example, a quartile of 200 employees with 120 men and 80 women would be 60% male and 40% female. The plots below show the density distributions of all companies' quartiles for male and female workers.
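As a rough sketch of these steps, the Python snippet below ranks a tiny, made-up workforce and computes the gender split per band. The function name, the tuple layout and the sample pay values are assumptions for illustration; the official guidance [4] has additional rules (for example for employees on the same pay at a band boundary) that this simplified version ignores.

```python
def quartile_proportions(employees):
    """employees: list of (hourly_pay, gender) tuples, gender "M" or "F".

    Returns four dicts, ordered from the top pay quartile to the lower
    one, with the percentage of men and women in each band.
    """
    if len(employees) < 4:
        raise ValueError("need at least four employees")
    # Step 1: sort wages from highest to lowest.
    ranked = sorted(employees, key=lambda e: e[0], reverse=True)
    n = len(ranked)
    result = []
    for i in range(4):
        # Step 2: slice the ranked list into four bands of (near) equal length.
        band = ranked[i * n // 4:(i + 1) * n // 4]
        # Step 3: compute the share of each gender in the band.
        men = sum(1 for _, gender in band if gender == "M")
        result.append({
            "male_pct": 100 * men / len(band),
            "female_pct": 100 * (len(band) - men) / len(band),
        })
    return result


# Hypothetical workforce of eight people (pay in £ per hour).
staff = [(30, "M"), (28, "M"), (25, "M"), (22, "F"),
         (18, "M"), (16, "F"), (12, "F"), (11, "M")]
quartiles = quartile_proportions(staff)
for name, q in zip(["top", "upper middle", "lower middle", "lower"], quartiles):
    print(name, q)
```

In this toy example the top band is entirely male while the other three are evenly split, which is the kind of pattern the plots below look for across all employers.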

Figure 1: Violin plots for employee proportion by quartiles.

In a perfectly balanced world with the same number of male and female workers evenly distributed in all quartiles, the curves would draw smooth rainbow arcs with peaks close to 50%. Even with all companies in the dataset included in the plot, the lower quartile shows a harmonious balance.

However, as pay increases, the imbalance increases too. The plot clearly shows that the proportion curve in the top quartile is skewed towards men. Because the male and female curves mirror each other, it can be stated that the majority of employers have almost 90% men in the top quartile or, equivalently, that women make up only about 10% of the top earners at those employers.

Q2 — Which economic activities have the largest pay gap?

Almost all records in the dataset come with a list of SIC codes. The Standard Industrial Classification (SIC) framework classifies employers' economic activity into 21 sections, which are subdivided into 730+ sectors [5]. For simplicity, this analysis only considers sections; their distribution across reporting companies is shown in the bar plot below.

Figure 2: Bar plot for number of companies by industrial section.

To find the industrial sections with the largest pay gap, the company records were sorted in descending order by "Median hourly pay gap", and the first thousand companies (out of 10.8K) were taken as a sample. The figure below shows the distribution of the sampled companies' activities.
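The selection step can be sketched as follows. This is a minimal illustration using plain Python dictionaries rather than the actual analysis code; the record fields (`median_gap`, `sections`), the employer names and the gap values are hypothetical.

```python
from collections import Counter


def section_shares(records, top_n):
    """Share (in %) of each section among the top_n largest median gaps.

    records: list of dicts with 'median_gap' (float) and 'sections'
    (list of section names). A company listed under several sections is
    counted once per section, so the shares can add up to more than 100%.
    """
    ranked = sorted(records, key=lambda r: r["median_gap"], reverse=True)
    top = ranked[:top_n]
    if not top:
        return {}
    counts = Counter(s for r in top for s in set(r["sections"]))
    return {s: 100 * c / len(top) for s, c in counts.most_common()}


# Hypothetical records standing in for the 10.8K company reports.
records = [
    {"employer": "A", "median_gap": 45.0, "sections": ["Education"]},
    {"employer": "B", "median_gap": 38.2, "sections": ["Construction"]},
    {"employer": "C", "median_gap": 36.0, "sections": ["Education", "Science"]},
    {"employer": "D", "median_gap": 5.1, "sections": ["Manufacturing"]},
    {"employer": "E", "median_gap": -2.0, "sections": ["Services"]},
]
shares = section_shares(records, top_n=3)
print(shares)
```

With the real data, `top_n` would be 1,000 and the section names would come from mapping each SIC code to its section.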

Figure 3: Self elaborated

The section most represented among the companies with the largest pay gaps is Education, with more than 32% of the sample. This is not surprising, as reducing the gender pay gap is one of the issues on the agenda of the UCU strikes held in more than 70 universities in November 2019 and February 2020 [5].

Each of the next four groups makes up 10 to 12% of the sample. Construction, Science and Finance are areas where negative role stereotypes exist, i.e. societies construct explicit and implicit biases that make us think that some jobs are for a specific gender [6]. Services and Manufacturing may appear next in the list simply because they are the two biggest groups in the general distribution (see Figure 2).

Q3 — What employer characteristics indicate the pay gap?

Quantitative data was used to study which factors influence the prediction of the median hourly pay gap, namely the proportions of the pay quartiles and the employer size. The former was analysed in Q1, and the latter is one of 6 possible midpoints that represent ranges like "250 to 499", "500 to 999" … "20,000 or more". The qualitative data used was the industrial sections a company is classified in.
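One possible way to turn these inputs into numeric features is sketched below. It is not necessarily how the original analysis encoded them: the function names and record fields are hypothetical, the open-ended "20,000 or more" label is mapped to its lower bound purely as an assumption, and only the two label shapes quoted above are handled.

```python
def size_midpoint(size_label: str) -> float:
    """Map an employer size label to a single number."""
    cleaned = size_label.replace(",", "")
    if " to " in cleaned:
        low, high = cleaned.split(" to ")
        return (float(low) + float(high)) / 2
    # Open-ended label such as "20000 or more": no true midpoint exists,
    # so the lower bound is used here (an assumption).
    return float(cleaned.split()[0])


def one_hot_sections(sections, all_sections):
    """Encode section membership as 0/1 flags, one per known section."""
    return [1 if s in sections else 0 for s in all_sections]


def build_features(record, all_sections):
    """Concatenate quartile proportions, size midpoint and section flags."""
    return (
        [record["top_male_pct"], record["upper_mid_male_pct"],
         record["lower_mid_male_pct"], record["lower_male_pct"]]
        + [size_midpoint(record["size"])]
        + one_hot_sections(record["sections"], all_sections)
    )


# Hypothetical company record and section list.
all_sections = ["Education", "Construction", "Manufacturing"]
company = {
    "top_male_pct": 80.0, "upper_mid_male_pct": 60.0,
    "lower_mid_male_pct": 50.0, "lower_male_pct": 45.0,
    "size": "250 to 499", "sections": ["Education"],
}
features = build_features(company, all_sections)
print(features)
```

Each company then becomes one row of numbers that a regression model can consume, with the median hourly pay gap as the target.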

All these features were used to train a machine learning regression model. The relevance of each feature for modelling the pay gap is shown in the following table.

Unsurprisingly, the lower and top quartiles are the most significant indicators for predicting a company's pay gap (35% and 31.4% relevance). Interestingly, and with significantly less importance, they are followed by the indicator of whether a company belongs to the Education section or not (6.2%). The two middle quartiles come next, making up around 10% of the importance. Finally, it is striking that employer size has less than 1% importance.

Conclusion

After analysing the GPG reports of UK companies with 250 or more employees in the 2018–19 period, the following insights were found:

  • In general, a large proportion of the highest salaries in the workforce goes to men.
  • Several employers with the largest pay gaps are involved in education and in economic activities such as construction, technical jobs, science and finance.
  • Lower and top pay quartiles of a company are key indicators to predict its pay gap. In contrast, the size of the company does not seem to influence it.

Admittedly, the GPG is much more complicated than pay quartiles and industry sectors. It is also strongly influenced by other factors like paternity leave, seniority levels, working time (full, part, hourly) and social constructs [3]. Further work is needed so that the gap keeps narrowing.

Making this data public helps companies understand where they stand on this issue, and it also lets the data science community contribute independently and impartially.

The data analysis and the figures above are fully reproducible by cloning this repository from GitHub. If you think this work can be improved or extended, please leave a comment below. And if you feel the content was interesting, then clap along.

References

  1. https://www.gov.uk/guidance/equality-act-2010-guidance
  2. https://www.gov.uk/guidance/gender-pay-gap-reporting-overview
  3. https://gender-pay-gap.service.gov.uk/
  4. https://www.gov.uk/guidance/gender-pay-gap-reporting-make-your-calculations
  5. https://www.ucu.org.uk/why-we-are-taking-action
  6. https://www.forbes.com/sites/londonschoolofeconomics/2019/07/05/why-gender-bias-still-occurs-and-what-we-can-do-about-it/
