Does an academic “Gender Wage Gap” exist at the University of Florida?


The effect of gender on income has been a controversial topic for decades. Studies conducted at the aggregate level, across various industries, have reached mixed conclusions about the so-called “Gender Wage Gap”. In this paper, we analyze several employee levels within the University of Florida to determine the significance of gender on salaries.

This study analyzed over 5,000 individual salaries for academic and non-academic staff at the University of Florida. Gender and Job Title were used as factors for an ANOVA model. The ANOVA model suggested that Gender alone is an insignificant predictor of salary amount, but the model also showed significant interaction between Gender and Job Title. Additionally, pairwise comparisons suggest that within each academic job level, males are paid more than females. A possible explanation for this phenomenon is discussed.


Literature Review

Data Cleaning

Exploratory Plots

ANOVA Testing


Literature Review

Two recent studies analyzed academic salaries at the aggregate level by taking the difference between two medians. In January 2018, a Nature article titled “Gender Pay Gap Persists” cited a report from the US National Science Foundation claiming that the 2016 median salary for male PhD holders was 24% higher than that for female PhD holders, across all fields. However, the report did not disclose whether the salaries were earned within or outside academia.

In March 2017, an article in The Chronicle of Higher Education titled “Gender Pay Gap Persists Across Faculty Ranks” claimed that the 2015 median salary for full professors was $18,200 higher for males than for females.

To our knowledge, neither of these two studies performed tests for statistical significance. In this paper, we will structure and conduct those tests to more rigorously analyze the gender effect.

Data Cleaning

Florida state payroll data is publicly available. Salary data for the 12 institutions within the State University System of Florida are collected every year around April and November. The data is available as a csv file, where each row is an individual and each column is an attribute. Attributes include university name, budget entity, first name, last name, job title, employee type, full-time equivalent score, and annual salary. However, the data needs to be cleaned before use.

The raw dataset contains 90,919 entries.

Removing Duplicate Entries — The dataset can contain duplicate entries for the same person. For instance, one professor could be paid for “Educational & General” purposes, as well as for “Contracts & Grants”. This professor would then be entered twice in the dataset, instead of only once. To rectify this problem, we matched individuals by first name, last name, middle initial, university name, and job title. Then, we aggregated salary information for each individual.
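The matching step above can be sketched as follows. This is a minimal illustration, not the code used in the study: the field names and sample rows are hypothetical, and we assume that "aggregating" means summing the salary lines that share the same matching key.

```python
from collections import defaultdict

# Hypothetical rows; the field names are assumptions, not the real file's headers.
rows = [
    {"first": "Ann", "last": "Lee", "mi": "B", "university": "UF",
     "title": "PROFESSOR", "salary": 90000.0},   # e.g. Educational & General
    {"first": "Ann", "last": "Lee", "mi": "B", "university": "UF",
     "title": "PROFESSOR", "salary": 30000.0},   # e.g. Contracts & Grants
    {"first": "Raj", "last": "Rao", "mi": "", "university": "UF",
     "title": "LECTURER", "salary": 60000.0},
]

# Fields used to decide that two rows describe the same person.
KEY_FIELDS = ("first", "last", "mi", "university", "title")


def aggregate_salaries(rows):
    """Collapse rows that share the matching key and sum their salaries."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[field] for field in KEY_FIELDS)
        totals[key] += row["salary"]  # assumption: aggregation is a sum
    return [dict(zip(KEY_FIELDS, key), salary=total)
            for key, total in totals.items()]


people = aggregate_salaries(rows)
for person in people:
    print(person["first"], person["last"], person["salary"])
```

In this toy input, the two rows for the same professor collapse into a single entry whose salary is the sum of both pay lines.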

Additionally, we disregarded entries for temporary workers being paid an hourly wage. This way, our dataset only contains salaried workers. We also removed entries where any columns contained NA values.

The dataset now contains 48,478 entries (90,919 previously).

Full-Time Equivalent Adjustment — If a professor teaches year-round, they will receive a full-time equivalent (FTE) score of 1.00. However, if they decide not to teach during the summer, they will receive an FTE score of 0.75. We adjusted each entry’s annual salary by its FTE score, so that everyone can be compared on the same time basis. This manipulation did not change the number of entries.
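One plausible reading of this adjustment is to divide the reported salary by the FTE score, which scales a part-year salary up to a full-year basis. The sketch below assumes that reading; the function name is hypothetical and the text does not spell out the exact formula.

```python
def fte_adjusted_salary(salary, fte):
    """Scale a salary to a full-time (FTE = 1.00) basis.

    Assumption: the adjustment is salary / FTE. The source only says that
    salaries were "adjusted by" the FTE score.
    """
    if fte <= 0:
        raise ValueError("FTE score must be positive")
    return salary / fte


# A 0.75 FTE appointment paying 75,000 is comparable to 100,000 at full time.
print(fte_adjusted_salary(75000.0, 0.75))
```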

Consolidating Job Titles — The database contains 4,600 unique job titles. Since we intend to include job title as a factor in our ANOVA model, we must decrease the number of unique job titles.

To accomplish this, we gathered the 200 most frequent job titles, which accounted for 31,384 of the 48,478 entries. We then consolidated those 200 job titles into 5 groups: Admin/Office Staff, Professor, Assistant Professor, Associate Professor, and Lecturer/Research/Postdoc. By having three distinct “Professor” classes, we hope to capture the effect of tenure and experience in academia. The Lecturer/Research/Postdoc class represents an academic function that is not set for a tenure track. We constructed the Admin/Office Staff class to represent a generic non-academic function.
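A rough sketch of this consolidation is shown below. The raw title strings and the mapping are hypothetical examples. Only the five group names come from the text, and the actual mapping of 200 titles was presumably built by hand.

```python
from collections import Counter

# Hypothetical raw-title -> group mapping (the real one covers 200 titles).
TITLE_GROUPS = {
    "PROFESSOR": "Professor",
    "ASSOCIATE PROFESSOR": "Associate Professor",
    "ASSISTANT PROFESSOR": "Assistant Professor",
    "LECTURER": "Lecturer/Research/Postdoc",
    "POSTDOCTORAL ASSOCIATE": "Lecturer/Research/Postdoc",
    "OFFICE MANAGER": "Admin/Office Staff",
}


def consolidate(titles, top_n=200):
    """Map each raw title to its group, or None if the entry should be dropped.

    An entry is dropped when its title is not among the top_n most frequent
    titles, or when the title has no group assigned in TITLE_GROUPS.
    """
    counts = Counter(titles)
    frequent = {title for title, _ in counts.most_common(top_n)}
    return [TITLE_GROUPS.get(t) if t in frequent else None for t in titles]


raw = ["PROFESSOR", "PROFESSOR", "PROFESSOR", "LECTURER", "LECTURER", "RARE TITLE"]
groups = consolidate(raw, top_n=2)
print(groups)
```

Entries that map to None would then be removed before modeling.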

The dataset now contains 18,386 entries (48,478 previously).

Disregarding Non-UF Entries — Ideally, we would include University as a factor in our ANOVA model. However, doing so would give us drastically unbalanced sample sizes, as seen in Figure 1.

Figure 1 — Number of entries, by University

Note the uneven distribution of faculty counts in the dataset. If we include University as a factor for our ANOVA model, the unbalanced sample sizes would have adverse impacts on significance tests.

We could artificially draw equal-sized samples from each school. But if we did that, we would be over-representing samples from schools like UWF and FGCU. Therefore, to keep our analysis clean, we only focus on UF for the rest of this study.

The dataset now contains 6,232 entries (18,386 previously).

Minimum Wage Check — A handful of entries contain suspiciously small annual salaries. For example, the database shows 9 Assistant Professors at UF with annual salaries less than $1,000, even after adjusting by FTE scores. We interpret these values as erroneous entries or atypical cases that we can safely remove and ignore.

The Florida hourly minimum wage is $8.25. Assuming 40-hour workweeks for 50 weeks a year, this puts the annual minimum wage at $16,500, or roughly $16,000. Therefore, we remove all entries from the dataset that have annual salaries less than $16,000.
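The arithmetic behind the cutoff, and the filter itself, can be written out as a short sketch. The sample salaries are made up for illustration.

```python
HOURLY_MINIMUM = 8.25      # Florida hourly minimum wage, as stated in the text
HOURS_PER_WEEK = 40
WEEKS_PER_YEAR = 50

# Annualized minimum wage under the stated assumptions.
annual_minimum = HOURLY_MINIMUM * HOURS_PER_WEEK * WEEKS_PER_YEAR

# The study uses a rounded cutoff of $16,000.
ANNUAL_CUTOFF = 16_000

salaries = [900.0, 15999.0, 16000.0, 52000.0]  # hypothetical FTE-adjusted values
kept = [s for s in salaries if s >= ANNUAL_CUTOFF]

print(annual_minimum)
print(kept)
```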

The dataset now contains 6,216 entries (6,232 previously).

Gender Assignment — The dataset does not contain a column for gender, so we must assign a gender to each entry. To do this, we use the “gender-guesser” module (version 0.4.0) for Python. The module is based on a program written by Joerg Michael named “gender”. The program references a list of more than 40,000 first names from all European countries, in addition to countries like China, India, Japan, and the United States.

The program will classify a given first name into one of the following categories: “Male”, “Female”, “Mostly Male”, “Mostly Female”, “Andy” (androgynous), or “Unknown” (if the name is not found in the “gender-guesser” list). Table 1 shows the most frequent names for each category in our dataset.

Table 1

Notice how “Mostly Male” names appear very likely to be male, while “Mostly Female” names appear very likely to be female. Therefore, we add entries with the “Mostly Male” label to the “Male” category, and we add entries with the “Mostly Female” label to the “Female” category.

However, names with the “Andy” or “Unknown” label are difficult to classify with any degree of confidence. Therefore, we remove these entries from our dataset. The number of entries, by Gender, is shown in Figure 2. Note how both genders are evenly represented in our sample.
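The collapsing rule described above can be sketched as below. To keep the example self-contained, the name classifier is passed in as a function and replaced here by a small hypothetical lookup table. With the real module one would presumably pass `gender_guesser.detector.Detector().get_gender`, which returns lowercase labels such as `mostly_male` and `andy`.

```python
# Map the classifier's labels onto the two categories kept in the study.
# Labels that are missing here ("andy", "unknown") lead to the entry being dropped.
COLLAPSE = {
    "male": "Male",
    "mostly_male": "Male",
    "female": "Female",
    "mostly_female": "Female",
}


def assign_gender(first_name, guess):
    """Return "Male", "Female", or None (drop the entry) for a first name.

    `guess` is any callable mapping a first name to a gender-guesser style label.
    """
    return COLLAPSE.get(guess(first_name))


# Hypothetical stand-in for the real classifier, for illustration only.
_FAKE_LABELS = {"John": "male", "Leslie": "mostly_female", "Pat": "andy"}


def fake_guess(name):
    return _FAKE_LABELS.get(name, "unknown")


for name in ("John", "Leslie", "Pat", "Zzyzx"):
    print(name, assign_gender(name, fake_guess))
```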

Figure 2

The final dataset contains 5,379 entries (6,216 previously).

Exploratory Plots

Salary Histogram

Since Vilfredo Pareto first analyzed wealth distribution over a century ago, the non-normality and fat-tailedness of income distributions have become common knowledge. Figure 3 shows that the same pattern holds for UF: the distribution is heavily skewed to the right.

Figure 3

This issue will resurface when we check for normality of model residuals.

Distribution of Counts, by Job Title

Notice that entries are fairly evenly distributed across job titles.

Figure 4

Salary Boxplots, by Gender

At first glance, it appears that males get paid significantly more than females.

Figure 5

While most studies would stop here and declare the existence of a “Gender Wage Gap”, there is more analysis to be done.

Salary Boxplots, by Job Title

Annual salary seems to vary significantly between different job titles as well.

Figure 6
Which effect is greater: the effect of rank (job title) or gender?
Let’s conduct an ANOVA test to find out!

ANOVA Testing

Before conducting the actual test, let’s re-address the non-normality of salaries. It turns out that if we try to fit ANOVA models on the un-transformed salaries, our residuals are egregiously non-normal.

Figure 7

Therefore, we perform a Box-Cox transformation on the annual salaries. Likelihood ratio tests on the Box-Cox parameter reject a value of zero (which would correspond to a plain log transformation) and indicate that a transformation is necessary.

The transformation parameter for the model form in Equation 1 (below) is estimated to be -0.29.
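For reference, the standard one-parameter Box-Cox transformation maps a positive value y to (y^λ - 1)/λ when λ is nonzero, and to log(y) when λ is zero. The sketch below applies it with the estimate of -0.29 reported above. The salary values are hypothetical, and the helper name is ours.

```python
import math


def box_cox(y, lam):
    """Standard one-parameter Box-Cox transform of a positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


LAMBDA = -0.29  # estimate reported in the text

for salary in (30000.0, 60000.0, 120000.0, 240000.0):  # hypothetical salaries
    print(salary, round(box_cox(salary, LAMBDA), 4))
```

With a negative λ the transform is still increasing in salary, but it compresses the right tail strongly, since the transformed values are bounded above by -1/λ.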

Model Form

The model form after Box-Cox transformation is:

Equation 1

We treat Gender as a fixed factor since we only care about distinctions between the two genders, and we treat Job Title as a fixed factor since the jobs in our analysis were chosen to encompass all job types relevant to this study. We proceed with the base group approach.

Checking Assumptions

  • Normality of Residuals. The QQ plot (Figure 8) and PP plot (Figure 9) suggest that normality has not been significantly violated.
Figure 8
Figure 9
  • Constant Variance, Independence, and Outliers. Results for relevant tests are shown in Table 2. While residual independence seems to be satisfied, the variance seems to differ greatly between groups in our sample. However, since our sample sizes are reasonably balanced (see Figure 2 and Figure 4), this should not be too much of an issue. For outliers, we calculated Cook’s Distance for each observation and found that 3.66% of observations exceeded the threshold of D > 4/n. Therefore, we are not too concerned about outliers.
Table 2
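The outlier rule in the last bullet reduces to a simple threshold check once Cook's distances are available. The sketch below assumes the distances have already been computed by a regression package. The input values are invented, and the function name is hypothetical.

```python
def share_above_cook_threshold(cooks_d):
    """Fraction of observations whose Cook's distance exceeds 4/n."""
    n = len(cooks_d)
    if n == 0:
        raise ValueError("need at least one observation")
    threshold = 4.0 / n
    flagged = sum(1 for d in cooks_d if d > threshold)
    return flagged / n


# Eight hypothetical distances; the threshold is 4/8 = 0.5, so one is flagged.
distances = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.9]
print(share_above_cook_threshold(distances))
```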

Sum of Squares

Since our cell sizes are not perfectly balanced, we first look at Type III Sums of Squares.

Figure 10

It appears that the interaction term is marginally significant. However, notice that Gender, by itself, is far from significant, given the other terms in the model.

Individual Parameter Estimates

Since we have transformed the original salary values via Box-Cox, we cannot back-transform parameter estimates to make meaningful inference on the original salary values. Therefore, we proceed directly to pairwise comparisons.

Pairwise Comparisons — Tukey HSD

Since an interaction effect may be present, we calculate all pairwise comparisons for levels of the interaction effect, via Tukey’s HSD. To tease out any latent effect of gender, we present a few select pairwise comparisons in Table 3.

Table 3

Within the Admin/Office Staff class, males and females seem to be paid equally. However, within every academic job level, males seem to be paid significantly more than females, with the effect being more pronounced for the Professor classes.
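Pairwise comparisons across Gender × Job Title cells can be sketched with the `pairwise_tukeyhsd` function from statsmodels. As before, this is an illustration on synthetic data, not the study's code: each observation is labeled with a combined cell name, and Tukey's HSD is run across those cells.

```python
import random

import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Synthetic stand-in data: one label per Gender x Job Title cell.
rng = random.Random(1)
records = []
for gender in ("F", "M"):
    for title in ("Admin", "Lecturer", "Professor"):
        for _ in range(20):
            records.append({"cell": f"{title}:{gender}",
                            "y": rng.gauss(3.3, 0.02)})
df = pd.DataFrame(records)

# All pairwise comparisons between the 6 cells (15 pairs in total).
result = pairwise_tukeyhsd(endog=df["y"], groups=df["cell"], alpha=0.05)
print(result.summary())
```

From the full table, the within-job comparisons (for example `Professor:F` against `Professor:M`) are the rows of interest for the gender question.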

Could this be partially due to the mix of males and females within each job class?
Let’s take a look…

Gender Distribution within Jobs

Gender is unevenly distributed within different job classes, as shown in Figure 11.

Figure 11

*This is an important observation*, since it could explain the results of Table 3. A conjecture:

  • If women tend to not pursue tenured, high-paying professorships, for personal reasons or otherwise, then a large percentage of highly-paid professors will happen to be male.
  • Therefore, the more established and experienced professors will also happen to be male, which leads to male professors being paid more than female professors, not just because they are male, but because they happen to be more experienced.