Race, Gender, and Florida Blue Key — what can data science tell us about achievement at UF?

Florida Blue Key self-identifies as “the most prestigious honorary in the state of Florida” and is constantly lauded as such by the UF community. Naturally, I was curious. What type of student is admitted into Blue Key?

What is the average GPA? 
How many get good jobs after college? 
And most intriguing — are minority students appropriately represented?

There was an alarming lack of public information.

In short, I found that Florida Blue Key (FBK) members were overwhelmingly and disproportionately male, white, and IFC Greek, and that there were serious, systemic race and gender gaps in membership. From 2013 to 2016, Greek students were over four times more likely to get into FBK than non-Greek students. Asian, Hispanic, and Black students, along with women, were all significantly underrepresented in FBK. For context, there were more Blue Key members from a single fraternity (AEPi) than Asian, Black, and Hispanic members combined. A portion of my work was eventually published here, and a select few critiqued it, arguing that the data collection methods were shortsighted and invalid.

I agree with the critiques. My previous approach, detailed here, could not look at the past to determine the source and scope of these biases. To address these shortcomings, I created individual-level estimates of the race and gender of the 1,700+ Blue Key members going back to 1997.

While UF as a whole has become increasingly diverse, Blue Key has fallen embarrassingly short, becoming even more white and male over time.

As recently as 20 years ago, one of Blue Key’s tapping classes had the same proportion of people of color as the general population. But over the years, Blue Key has failed to keep pace with demographic change, creating systemic gaps in admission. Not only has FBK failed to mirror the student population a single time in the past 20 years, but the 2013 Blue Key class also had fewer people of color than the 1997 class.

This finding is rather remarkable. One would expect that as more people of color came to UF, more would get into “the most prestigious leadership honorary” at UF. Nothing remotely similar has taken place.

Now, of course, racial groups are not a monolith. Are all races equally underrepresented? Below is a graph that shows, for each racial group, the difference between its percent in Blue Key and its percent at UF. Ideally, each line should sit at 0: a line above 0 indicates over-representation, and a line below 0 indicates under-representation.
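The quantity plotted for each group is a simple difference in percentage points: the group's share of Blue Key minus its share of UF. Below is a minimal Python sketch of that arithmetic; the group shares are hypothetical placeholders, not the figures behind the graph.

```python
# Representation gap in percentage points: share of Blue Key minus share of UF.
# Positive = over-represented in FBK, negative = under-represented, 0 = parity.
# All shares below are HYPOTHETICAL placeholders, not real UF or FBK data.

def representation_gap(share_fbk, share_uf):
    """Return {group: percentage-point gap}, rounded to one decimal place."""
    return {
        group: round(100 * (share_fbk[group] - share_uf[group]), 1)
        for group in share_uf
    }

uf_shares = {"white": 0.55, "hispanic": 0.20, "asian": 0.08, "black": 0.07}
fbk_shares = {"white": 0.75, "hispanic": 0.10, "asian": 0.04, "black": 0.07}

gaps = representation_gap(fbk_shares, uf_shares)
for group, gap in gaps.items():
    print(f"{group:>8}: {gap:+.1f} points")
```

Computing this gap for every tapping year gives one line per group, which is the shape of the graph.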

The least surprising finding is that White students are by far the most advantaged group on campus, followed by Black, Asian, and Hispanic students. Black students even appear to be advantaged (if only slightly) in most years, likely because the Black student population has declined sharply.

The same pattern appears when the analysis is repeated for women at UF. Women are inexcusably underrepresented: they have never made up more than 45% of Blue Key members, even though women are the majority of students on campus. Worse, in three years women were less than a third of the total membership.

The University of Florida started admitting women in 1947, and it took FBK over a quarter century to follow. Clearly, that sentiment is alive and well.

One potential response to this analysis is that Blue Key is not at fault: these issues are societal and therefore beyond the control of any single organization. This response has two main problems. First, it concedes that FBK is functionally a willing bystander to institutional racism and sexism at the highest levels of society. Second, Blue Key’s racial and gender gaps have grown over time and were smaller in previous years, which is hard to square with a purely societal explanation while UF itself has become more diverse.

Soon after the first analysis was published, the president of Florida Blue Key, Blake Murphy, asked for the report to be taken down, saying “on my end that study, every single stat that is posted is factually incorrect.” Oddly, when asked what the correct breakdown was, he replied that FBK does not keep this type of information on its members. If Blue Key does not keep this information, how could he claim that the stats were incorrect?

Either way, anyone can freely reproduce any of the analyses that I have posted by running and examining the scripts on my GitHub. I have made both the R scripts and the general dataset public. Below is a detailed look at the methodology, if you’re interested.

Methodology:

Most of the analysis was done in R (surprise, surprise), and Tableau was used for the visualizations.

Data Collection:

This analysis has confirmed what data scientists already know: 90% of analysis is data collection and cleaning.

Thankfully, Florida Blue Key keeps very good records of who was in each of its tapping classes, going back about a century. Frustratingly, for most years those records exist only as .png images. Converting the .png files to .pdf sped up the transcription process immensely, and the result is this dataset.

Race and Gender Modeling:

A model that gives a probabilistic estimate of an individual’s race is not a new concept; it falls under a set of techniques, often used by political scientists, economists, and demographers, called ecological inference. The model I used comes from a 2016 paper by Kosuke Imai and Kabir Khanna out of Princeton, “Improving Ecological Inference by Predicting Individual Ethnicity from Voter Registration Records.” At the most basic level, it takes an individual’s last name and compares it with census data to give a probability for each race. If a location is also known, down to the census block, the estimate becomes more accurate. For example, if your last name is Allison and you live in Gainesville, FL, there is about an 83% chance that you are white; if it is Chao, there is about a 90% chance that you are Asian.
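The way location sharpens a surname-only estimate is a Bayes' rule update. Imai and Khanna distribute their method as the R package wru; the Python sketch below is not that package's code, only an illustration of the core idea under its key assumption that surname and location are independent once race is known. Every probability in it is a hypothetical placeholder, not a census value.

```python
# Bayes' rule update combining surname and geography, in the spirit of
# Imai & Khanna (2016). Key assumption: surname and location are
# conditionally independent given race, so
#   P(race | surname, place) is proportional to P(race | surname) * P(place | race)
# All probabilities below are HYPOTHETICAL placeholders, not census values.

def posterior_race(p_race_given_surname, p_place_given_race):
    """Combine surname-based race probabilities with location likelihoods."""
    unnormalized = {
        race: p_race_given_surname[race] * p_place_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    # Normalize so the posterior probabilities sum to 1.
    return {race: value / total for race, value in unnormalized.items()}

# P(race | surname), as if from a census surname list (hypothetical).
surname_probs = {"white": 0.70, "black": 0.20, "hispanic": 0.05, "asian": 0.05}
# P(living in this census block | race), also hypothetical.
place_likelihood = {"white": 0.0004, "black": 0.0001, "hispanic": 0.0002, "asian": 0.0003}

posterior = posterior_race(surname_probs, place_likelihood)
for race, p in posterior.items():
    print(f"{race:>8}: {p:.3f}")
```

Because this hypothetical block is relatively more likely to be home to white residents, the white probability rises above its surname-only value of 0.70, while normalization keeps the total at 1.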

After applying this algorithm to each name in the Blue Key dataset, I used dplyr (thank God for Hadley Wickham) to group by year and sum the probabilities for each race, giving a probabilistic breakdown of each class.
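Summing probabilities works because each member's probability for a race is that member's expected contribution to the race's count, so the sum over a class is the expected number of members of that race. In dplyr this is a `group_by()` followed by `summarise()`; the sketch below is a standard-library Python analogue, with hypothetical member rows.

```python
from collections import defaultdict

RACES = ("white", "black", "hispanic", "asian")

# One row per member: tap year plus per-race probabilities (HYPOTHETICAL values).
members = [
    {"year": 1997, "white": 0.90, "black": 0.02, "hispanic": 0.05, "asian": 0.03},
    {"year": 1997, "white": 0.10, "black": 0.05, "hispanic": 0.80, "asian": 0.05},
    {"year": 2013, "white": 0.85, "black": 0.05, "hispanic": 0.05, "asian": 0.05},
]

def expected_counts_by_year(rows):
    """Sum each race's probability within a year: the expected number of members of that race."""
    totals = defaultdict(lambda: dict.fromkeys(RACES, 0.0))
    for row in rows:
        for race in RACES:
            totals[row["year"]][race] += row[race]
    return dict(totals)

def shares_by_year(rows):
    """Convert expected counts into each race's share of its tapping class."""
    shares = {}
    for year, by_race in expected_counts_by_year(rows).items():
        class_size = sum(by_race.values())  # equals the number of members, since each row sums to 1
        shares[year] = {race: n / class_size for race, n in by_race.items()}
    return shares

print(shares_by_year(members))
```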

In a similar fashion, I used the gender package created by Lincoln Mullen out of George Mason University (his website is full of interesting projects; I recommend checking it out if you get the chance). The package uses baby name data from the U.S. Social Security Administration (SSA) to estimate the proportion of each gender for every first name in our dataset. In this piece, I assume everyone is about 21 years old when they are tapped into FBK, which I would argue is the average age of a college junior. I also ran the analysis with ages of 20 and 22 and did not find significantly different results. The SSA method is optimized for people born between 1880 and 2012, which covers everyone in this dataset.
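The age assumption matters because name-gender associations shift over time, so a member's first name is looked up against the births in their estimated birth year (tap year minus assumed age). The Python sketch below illustrates that lookup with a hypothetical SSA-style count table and hypothetical helper names; it is not the gender package's API, and for simplicity it uses a single birth year rather than a range.

```python
from typing import Optional

# HYPOTHETICAL SSA-style counts: (first name, birth year) -> (female births, male births).
SSA_COUNTS = {
    ("taylor", 1991): (3800, 1200),
    ("taylor", 1992): (4000, 1000),
    ("taylor", 1993): (4100, 900),
    ("jordan", 1992): (1500, 6000),
}

def estimated_birth_year(tap_year: int, age_at_tap: int = 21) -> int:
    """Assume each member is about `age_at_tap` years old when tapped."""
    return tap_year - age_at_tap

def proportion_female(first_name: str, tap_year: int, age_at_tap: int = 21) -> Optional[float]:
    """Share of babies with this name, born in the estimated year, recorded as female."""
    key = (first_name.lower(), estimated_birth_year(tap_year, age_at_tap))
    if key not in SSA_COUNTS:
        return None  # name/year not in the table; a real pipeline needs a fallback
    female, male = SSA_COUNTS[key]
    return female / (female + male)

# Sensitivity check like the one described above: try ages 20, 21, and 22.
for age in (20, 21, 22):
    print(age, proportion_female("Taylor", 2013, age))
```

Summing these proportions by tap year, just as with the race probabilities, gives an expected number of women in each class.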

This analysis is more thorough and less naive than my first one, but it still has flaws. The most significant is clearly the lack of data on applicants to FBK. Are the majority of applicants white? Are applicants of color and female applicants less qualified? Is Blue Key failing to encourage specific groups to complete the application process?

None of these questions can be answered satisfactorily unless Blue Key is willing to be transparent about its admissions process. The next step in this analysis is to compare these results with other awards on campus, which will be significantly harder because of small sample sizes and missing historical data.

If you have any questions or qualms about anything above, reach out to me at tylerjrichards@gmail.com.