Google May Be Guilty of Gender Discrimination But Is Definitely Guilty of Bad Data Analysis
In case you haven’t heard, Google is the target of a class-action lawsuit regarding gender discrimination. (Shocking, I know, given what we know about Silicon Valley more generally.) Part of the impetus for the lawsuit is an employee-led effort to collect compensation data that shows that men are paid more than women at the company. Interestingly, however, this data in itself doesn’t say a whole lot about whether discrimination exists. (Don’t stop reading, I’m not on team Google here, just on team “using math properly.”)
From a data perspective, proving discrimination can be somewhat difficult. For example, we hear the often-quoted “women make 77 cents for every dollar a man makes” statistic, but this number doesn’t really tell us much, since it could very well be the case that women sort into lower-paying occupations and jobs of their own volition, choose to work fewer hours, and so on. (On the other hand, we can’t rule out the discrimination hypothesis using this number either, and the reality is likely somewhere in the middle.) Ideally, to look for discrimination one would compare otherwise equivalent men and women and see whether compensation differences still exist within the matched groups. Mathematically, this is essentially what economists do when they run a regression with “control variables”- variables that suck up the pay differences accounted for by stuff other than gender- in order to estimate an “all else being equal” type of effect, in this case the effect of being female.
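In symbols, a stylized version of such a regression (my notation, not anything taken from Google’s actual analysis) would look something like:

```latex
\log(\text{pay}_i) = \alpha + \beta \cdot \text{female}_i
  + \gamma_1 \,\text{location}_i + \gamma_2 \,\text{tenure}_i
  + \gamma_3 \,\text{role}_i + \gamma_4 \,\text{level}_i
  + \gamma_5 \,\text{performance}_i + \varepsilon_i
```

where the estimate of β is the pay difference associated with being female that remains after the control variables have soaked up everything else.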
Google, for its part, seems to be up on its applied math, since it put together an analysis that allowed it to make the following statement:
Based upon its own analysis from January, Google said female employees make 99.7 cents for every dollar a man makes, accounting for factors like location, tenure, job role, level and performance.
On the surface, this seems to suggest that significant gender discrimination just doesn’t show up in the data. BUT…and this is important…this example highlights the difference between doing math and doing data analysis (or, more charitably, data science)- while this conclusion may be mathematically correct, it’s basically a “garbage in, garbage out” use of econometric tools. Simply put, if you’re trying to isolate gender discrimination, you can’t just blindly control for things that are themselves likely the result of gender discrimination! It’d be like looking at the impact of diet on health and using weight as a control variable- sure, you’d get an “all else being equal” sort of result, but it wouldn’t make sense, since weight is likely a step in the chain between diet and health outcomes. (In other words, the analysis would estimate the effect of a particular diet relative to a person who didn’t follow the diet but ended up weighing the same as the person who did, which is probably not the comparison you want to make.)
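To see what I mean, here’s a toy simulation (completely made-up numbers, nothing to do with Google’s actual data) in which discrimination operates entirely through job level assignment and pay is then set mechanically by level. A naive regression of pay on gender finds the gap; the “controlled” regression makes it vanish:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# 0 = male, 1 = female; underlying skill is distributed identically across genders
female = rng.integers(0, 2, n)
skill = rng.normal(0, 1, n)

# Discrimination happens here: equally skilled women land about one level lower.
level = np.round(3 + skill - 1.0 * female + rng.normal(0, 0.5, n))

# Pay is determined purely by level (plus noise) - no "direct" gender effect on pay.
pay = 100 + 20 * level + rng.normal(0, 5, n)

# Raw regression of pay on gender: picks up the full gap created by the leveling.
X_raw = np.column_stack([np.ones(n), female])
gap_raw = np.linalg.lstsq(X_raw, pay, rcond=None)[0][1]

# Regression that "controls for" level: the gender coefficient collapses toward zero.
X_ctrl = np.column_stack([np.ones(n), female, level])
gap_ctrl = np.linalg.lstsq(X_ctrl, pay, rcond=None)[0][1]

print(f"Pay gap, no controls:          {gap_raw:.1f}")   # roughly -20 (one level's worth of pay)
print(f"Pay gap, controlling for level: {gap_ctrl:.1f}")  # roughly 0, despite real discrimination
```

The second regression is mathematically fine and completely misleading: the discrimination is real, it just gets absorbed by the control variable.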
If you don’t believe me, perhaps a labor economist and an econometrics text will convince you.
In this way, Google tipped its hand quite a bit regarding the particular nature of gender discrimination at the company- if men and women are paid the same once job title and performance reviews are taken into account, then gender discrimination (if it exists) is taking place either by herding women into jobs with different roles/levels or by showing anti-female (or pro-male) bias in performance reviews. (Also, if the “levels” have set pay bands, which the source article kind of suggests, controlling for level largely amounts to assuming the conclusion.)
Turns out my suspicions are pretty on point, given the specific claim of the lawsuit. It’s amazing what you can learn from data IF you look at it properly. In a semi-previous life, I worked as an economic consultant, which basically means that I helped prepare expert testimony to be used in lawsuits involving economic matters. What I wouldn’t give to be the expert witness who gets to offer up a rebuttal to Google’s crap econometrics here.