Race and Gender Among Computer Science Concentrators at Harvard


Last week, one of my friends who is working on a publication asked me to write a short piece about what it’s like to be an Asian American woman in the tech industry. I scratched my head for a few days trying to decide what to write about, what kinds of experiences would resonate the most within my demographic, and realized that a lot of women I knew in tech were indeed Asian American. But then, I also have a lot of Asian American friends in general, so I wasn’t sure if my mental image of women in tech was influenced by that bias. So, obviously the logical next step was to pull up the Harvard Facebook, search through all the Computer Science concentrators and collect some data about the demographics of race and gender within the Computer Science concentration at Harvard :P

Jesting aside, this cataloguing actually turned up some pretty interesting results. Before I dive into my findings, I’ll forewarn you that I haven’t taken a single statistics class in my life and though data science interests me a lot, I largely have no idea what I am doing. So, please bear with me and help me improve my analysis and possible further research if you have suggestions! I’m always looking to learn more :) Luckily, one of my summer housemates is a statistics major and a product science intern at Medium, so he helped me out and convinced me that my findings were worth publishing.

Data Collection

I was primarily interested in how race and gender were distributed among Computer Science students at Harvard, by themselves and also in relation to each other. In particular, I wanted to see if the Asian American female demographic was indeed as skewed as I imagined it to be. So, I figured if I was going to look into that specific intersection, I might as well also look into the others and have more to compare.

As a quick intro, the Harvard Facebook is a private online directory (accessible only with a Harvard login) that lists undergraduate students and can be filtered by house, class year, and concentration. Incidentally, it’s also where Zuckerberg gained inspiration for the name of his somewhat more popular online directory. A concentration is basically a college major, and students generally choose their concentrations halfway through their sophomore year (but some eager students submit their declarations before that). Because of that timing, when I filtered the directory to only display CS concentrators, only 225 results showed up: 114 rising seniors, 107 rising juniors, and 4 rising sophomores (these are the kids that declared early). Anyone in the Class of 2015 has already been wiped from the Facebook, since they’re no longer students. Because of this, my sample size is a little smaller than it should be, and I hope to add more to my results at the end of the fall when the new class of CS sophomores has unveiled itself.

Since I was interested primarily in demographics involving race and gender, and neither of those things is stored on the Facebook, I had to index everyone myself. Luckily, since I am one of these students and have interacted with a large number of them in and between classes at school, I was able to do this relatively painlessly. The directory also includes people’s ID photos by default (though some choose to remove them later), so I was able to quickly sort people based on their appearance and name. First, I split the results up by gender. I made a couple of mistakes on my first pass due to unisex names, but the ID photos really saved the day (this is not to say that one’s appearance or name necessarily always matches one’s gender identity, but for the sake of this study that’s the method I used).

Sorting everyone by race was harder, due to the arbitrariness of race categories and the fact that they can be extremely blurred. I ultimately decided to sort everyone into the categories White, Asian, Hispanic and Black since that’s what I’ve seen a lot of other demographic studies use (#bandwagon). I also further broke down the Asian category into East and South Asian, since for my personal purposes I was interested in Asian Americans in tech in particular, and wondered if any significant trends would appear. I used a Surname Origin Tool to help me out. Obviously, this isn’t a perfect way to do this since family name origins can become separated from race, but given my constraints it was the best I could do. ID photos also supplemented this, obviously. I counted Middle Eastern students (family name origins west of Iran, inclusive) as White, and I counted Southeast Asians (Vietnamese in particular) as East Asian, since there were so few that I didn’t think it would make sense to put them in a separate category (which speaks more to the lack of diversity in tech right now than anything else). In the case of mixed-race students (which I knew about either from their double-barreled last names or through personal connection), I counted them as fractions of each race (e.g. my half-White, half-Asian friend would be counted as 0.5 White and 0.5 Asian).
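For anyone who wants to repeat this at their own school, here is a rough sketch of how the fractional counting for mixed-race students could be tallied in Python. The function name `tally_fractional` and the three-student list are made up for illustration; they are not my actual spreadsheet or data, and I did my own tallying by hand.

```python
from collections import defaultdict

def tally_fractional(students):
    """Sum fractional category weights across a list of students.

    Each student is a dict mapping category -> weight, and the weights
    for one student are expected to add up to 1 (one whole person).
    """
    totals = defaultdict(float)
    for weights in students:
        if abs(sum(weights.values()) - 1.0) > 1e-9:
            raise ValueError(f"weights must sum to 1, got {weights}")
        for category, weight in weights.items():
            totals[category] += weight
    return dict(totals)

# Hypothetical three-student example, NOT the real dataset.
example = [
    {"White": 1.0},
    {"East Asian": 1.0},
    {"White": 0.5, "East Asian": 0.5},  # counted half in each category
]
totals = tally_fractional(example)
print(totals)
```

The nice thing about this approach is that the category totals still add up to the number of students, so percentages computed from them stay consistent.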

If you are a Harvard CS concentrator and are reading this and think I may have gotten your race or gender wrong based on how I determined it, please don’t hesitate to reach out so I can make my data more accurate!

Findings

Here is a quick breakdown of my findings, including charts I made with Excel from the data:

OVERALL
Looking at gender separately, we find that (as expected) there are more male CS concentrators than female ones. The specific ratio is 164 to 60, or 73% male vs 27% female.
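If you want to check the arithmetic, the percentages come straight from the two counts above. This is just a quick Python sketch of that calculation (the variable names are mine):

```python
# Counts as reported above.
male, female = 164, 60
total = male + female  # 224 students with a recorded gender

# Share of each gender, rounded to the nearest whole percent.
male_pct = round(100 * male / total)
female_pct = round(100 * female / total)

print(f"{male_pct}% male vs {female_pct}% female")
```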

Looking at race, we find the majority of Harvard CS concentrators are Asian (53%), followed by White (39%), Hispanic (5%) and Black (3%). Within the Asian super-category, there are many more East Asians (40%) represented than South Asians (13%). Okay, so this is still pretty expected: we already knew that Silicon Valley is dominated by White and Asian men. But it’s interesting to see that at Harvard, the number of East Asian and White CS concentrators is pretty much the same, and South Asians are not as big of a group.

WITHIN RACE
This is when things start to get interesting. I found that within Asians, the gender ratio was more balanced than the overall average, and in the other three groups it was less balanced. I was also startled to find that there were no Hispanic women in my dataset. This is probably a result of a small sample size and obviously not a representation of CS students nationwide, but it’s still a find worthy of at least a few raised eyebrows.

If we’re talking about specific numbers, I found that East Asians had the most equal gender ratio, with 36% of all East Asians in the set being female, followed by 32% of South Asians, 19% of Whites, 15% of Blacks, and 0% of Hispanics.

WITHIN GENDER
If we look at race within each gender, we see that the number of Asian and White males is roughly the same, with 43% of all males being White, 47% Asian (35% East Asian and 12% South Asian), 7% Hispanic and 3% Black.

This also reveals that a huge contributor to the almost equal numbers of East Asians and Whites overall is the number of Asian women in the concentration. Of all females concentrating in CS, a staggering 69% are Asian (consisting of an equally staggering 54% East Asian and 15% South Asian), 28% are White and 2% are Black.

Based on that last find? … Yeah, I think I answered my question.
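A note on how the two views above differ: "within race" and "within gender" percentages come from the same table of counts, just divided by different totals. Here is a small Python sketch of that idea. The group names and counts are placeholders I invented to show the mechanics, and the function names are hypothetical; none of this is the actual data from my spreadsheet.

```python
# Hypothetical counts for illustration only, NOT the figures from this study.
counts = {
    "Group A": {"female": 3, "male": 7},
    "Group B": {"female": 1, "male": 9},
}

def share_within_group(counts, gender):
    """Percent of each group that is `gender` (the "within race" view)."""
    return {
        group: 100 * by_gender[gender] / sum(by_gender.values())
        for group, by_gender in counts.items()
    }

def share_within_gender(counts, gender):
    """Percent of all `gender` students in each group (the "within gender" view)."""
    gender_total = sum(by_gender[gender] for by_gender in counts.values())
    return {
        group: 100 * by_gender[gender] / gender_total
        for group, by_gender in counts.items()
    }

within_group = share_within_group(counts, "female")
within_gender = share_within_gender(counts, "female")
print(within_group)
print(within_gender)
```

In this toy example, Group A is 30% female, but Group A women make up 75% of all the women. Both numbers describe the same three people, which is why it is worth looking at the data from both directions.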

Conclusion

Basically, the data I found confirmed my suspicions: there is indeed a disproportionate number of Asian women in tech at Harvard. In particular, a highly disproportionate number of East Asian women. For reference, Harvard College is roughly 50:50 male:female (though if we want to be really specific, over the last few years the ratio has been skewed slightly toward more males). In terms of racial diversity, it is roughly 60% White, 30% Asian (with roughly 5% identifying as Indian), 10% Hispanic and 10% Black (don’t ask me why those numbers don’t add to 100%, I just got them from the Crimson survey; I guess people identified as mixed race and now everyone’s confused).

Anyway, I hope that you all enjoyed my informal study and learned something interesting! Diversity is on everyone’s minds right now, and I’ll bet that the breakdowns at most tech companies are far worse than they are at Harvard. I definitely think I’ve only scratched the surface of this issue; I’d love to see breakdowns from other colleges about this because I think it would be really interesting to compare across campuses to see if we can find a larger trend. I also think a lot more analysis could be done to explain what these trends mean (I’m not going to pretend to be a sociologist now, I’m hardly qualified to do this type of data collecting as is!), for instance why the gender ratio appears to be better within the Asian group as a super-category despite East and South Asian statistics differing along other axes.

As for writing about my own experiences as an Asian American woman in tech, I initially suspected (and subsequently confirmed) that being in this group actually comes with some form of majority privilege… but that story’s for another day!


If this study interested you, also see Jorge Cueto’s similar study on demographics at Stanford! Also, if you are planning to do one of these for your school, feel free to use the hashtag #CSAtMySchool on Facebook and Twitter, and tweet at me @TheWinnieWu so I can link your article from this post :)