GitHub.com Demographics: A story of researching & uncovering blind spots
With every user research study we conduct at GitHub, our team works hard to include a diverse range of participants. It is well known that there are far fewer women in the developer community, so recruiting evenly for studies presents a challenge. This is especially true when our research efforts are focused on smaller communities like those groups who have git workflows with large file storage needs.
When I joined GitHub three years ago to lead our user research program, I faced a number of challenges, with recruiting for studies close to the top. We don’t ask for demographic data upon account creation, so we have to do a lot of manual culling and guesswork in order to successfully source a diverse group.
One of the channels I’ve used for sourcing women participants to be part of qualitative research studies is @paulmillr’s “Most Active GitHub Users” list. While there are problems using this list for recruiting, initially it helped us to find, connect with, and deliberately include women in our research studies. Things were going well enough until two years ago when I received a fascinating response to one such outreach effort:
My goal had been to ensure we were including women in our work, but this particular community member asked a more thoughtful question, one that our team wanted to answer but couldn’t through qualitative inquiry alone:
Why is gender relevant on GitHub.com at all?
What follows is the story of how our research team embarked on a six-month study of the personal demographics of GitHub’s community. We took on this endeavor with great humility, as it required designing for, seeking out, and measuring responses from three different types of users: tenured-active, newly-active, and inactive/dormant.
Each line of inquiry revealed a blind spot, which motivated us to recruit from hard-to-reach places like inactive users, so we could depict a more complete picture of our community.
Why does gender, race/ethnicity, human age, or experience with formal CS education matter at all with a product like GitHub? As a research team, aren’t we concerned with a set of KPIs like # commits, pull requests, issues, etc.? Indeed, we are! However, GitHub has the unique responsibility of being both a private product used by individuals and companies, as well as the largest open source community for software developers.
The GitHub Profile
Product communities, and for that matter the world around us, are increasingly characterized by growing diversity: our social and professional realities are shaped more and more by race, class, gender, human age, and other categories of personal experience. Both experienced developers and people new to programming use GitHub to establish a social and professional profile, organize code, share feedback, and work on projects with others.
It’s important that GitHub cultivates a space representative of and accessible to both experienced and aspiring community members. In the company’s history, we had never tried to empirically study community demographics, so there wasn’t an established baseline. We had to educate our peers on methods and challenge ourselves to begin a research effort to thoughtfully and responsibly collect demographic data.
tl;dr: Similar to StackOverflow’s 2015 Developer Survey, active users on GitHub.com are mostly (68%) white, and overwhelmingly (93%) male. However, we find it exciting that GitHub’s newest signups are increasingly diverse, and the data suggests that the retention rate is similar across demographic groups.
Six months ago (August 2015), our research team began collecting optional demographic information in our surveys of various segments of our users. Our aim was to begin to examine GitHub’s growing community, how these categories of experience shape people’s experiences with the product’s social world, and how they help explain user behavior (both with tools and with other humans). We’ve organized our learnings around these three studies and will share the path we took as we discovered blind spots.
Designing to collect demographics
For context, we asked five demographic questions at the end of three surveys of three different user groups.
Note: All GitHub surveys are optional and may be exited at any time, but we placed an additional reminder before the demographic questions to make it clear that no user was compelled to share this personal information:
Three surveys, three types of users
We sampled three different types of users in our community to build a more complete picture of GitHub.com’s user base. While none of these surveys were designed specifically with a focused charter to measure demographics, by including the same optional questions on each we found some interesting, important, and consistent themes looking at: sex, race/ethnicity, human age, formal computer science education, and proficiency with English.
We had very high response rates to the same set of demographic questions across the three individual survey projects:
- Annual Tools & Workflows: Survey of tenured (3 years) active users; <5% of respondents opted out.
- New Account Creators (NAC): Survey(s) of newly-joined users with ~90 days of experience (two groups: creators who had taken maker actions on the site and explorers who viewed pages on the site); <5% of respondents opted out.
- GitHub 365: Survey of new users with ~365 days of experience who had been inactive for three months prior to our outreach; 16% opted out.
Study 1. Annual Tools & Workflows Survey
Active tenured users
The Tools & Workflows study reached a representative sample of users who visited GitHub.com, whether logged-in or logged-out, during the three-week study period in August 2015.
Active-tenured users represent an important segment of our community. However, they are a group that skews towards established users who joined GitHub in its early days. There is a significant ramp-up period between sign-up and regular activity, so GitHub’s active-tenured users don’t provide a complete picture of the community.
In our quantitative and qualitative studies, we’ve learned that many people who sign up leave the site for some time, and only come back once they’ve had a chance to learn git, learn GitHub, establish a workflow, and have a project that requires their attention.
When we looked at the distribution of account ages represented in our respondents and compared it to the database, we saw that we indeed had a big blind spot: new users.
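A comparison like this one can be sketched in a few lines of Python. This is only an illustration, not the analysis our team ran: the function names, the bucket edges, and the `threshold` parameter are all hypothetical, and the inputs are plain lists of account ages in days.

```python
from collections import Counter

# Hypothetical bucket edges (in days) and labels, chosen only for illustration.
EDGES = (90, 365, 1095)
LABELS = ["<90d", "90d-1y", "1y-3y", "3y+"]


def age_bucket_shares(account_ages_days):
    """Return the share of accounts that falls into each account-age bucket."""
    if not account_ages_days:
        raise ValueError("need at least one account age")
    counts = Counter()
    for age in account_ages_days:
        # The bucket index is the number of edges the age has reached.
        index = sum(age >= edge for edge in EDGES)
        counts[LABELS[index]] += 1
    total = len(account_ages_days)
    return {label: counts[label] / total for label in LABELS}


def coverage_gaps(sample_shares, population_shares, threshold=0.5):
    """List buckets whose sample share is under `threshold` times the population share."""
    return [
        bucket
        for bucket, population_share in population_shares.items()
        if population_share > 0
        and sample_shares.get(bucket, 0.0) < threshold * population_share
    ]
```

Feeding the respondents' account ages and the account ages from the full user base through `age_bucket_shares`, then passing both results to `coverage_gaps`, returns the buckets that are badly under-represented among respondents. In our case, that would have been the newest accounts.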
This led us to our next project, the New Account Creators Longitudinal Study, where we deliberately oversampled new users, asking them the same Tools & Workflows and demographic questions. Instead of delivering a single large survey as we had done with our initial effort, we carefully divided the same questions into a series of shorter surveys that were more accessible to newly-joined users.
Study 2. New Account Creators (NAC)
Newly created accounts
Like tenured-active users, newly-joined users are an important part of GitHub’s growing community. As you can read in the table, the majority of newly-joined respondents self-identified as white. However, what’s striking is that this majority is smaller among newly-joined users than among active users (59% vs. 68%).
When we examine U.S. respondents only, we note that new users are as likely to self-identify as white as are their tenured counterparts (70%).
New users are slightly less likely than active users to report studying computer science formally (57% vs 64%).
However, yet again, when we looked closely at this sample, we saw that we had another big blind spot: inactive/dormant new users.
Study 3. The GitHub “365”
Inactive new users
This led us to our next project, The GitHub 365, where we deliberately oversampled inactive new users. The survey focused on why they stopped using GitHub and what other tools they were using, and it included the same demographic questions.
This third survey sampled user accounts that were between six months and one year old and that had been inactive for at least 90 days. Unlike our other efforts, where we promote surveys to a randomized sample from within the application, this survey was delivered via email, since these users are not active on the site.
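The recruitment criteria above can be expressed as a small eligibility check. The following is a minimal sketch under stated assumptions, not our actual recruitment query: the function name and constants are hypothetical, "six months" is approximated as 182 days, and each account is assumed to have a known creation date and last-activity date.

```python
from datetime import date, timedelta

# Assumed cutoffs: "six months" is approximated as 182 days for illustration.
MIN_ACCOUNT_AGE = timedelta(days=182)
MAX_ACCOUNT_AGE = timedelta(days=365)
MIN_INACTIVITY = timedelta(days=90)


def is_eligible(created_on: date, last_active_on: date, today: date) -> bool:
    """Return True if an account matches the GitHub 365 recruitment criteria.

    An account qualifies when it is between six months and one year old
    and has shown no activity for at least 90 days.
    """
    account_age = today - created_on
    inactive_for = today - last_active_on
    return (
        MIN_ACCOUNT_AGE <= account_age <= MAX_ACCOUNT_AGE
        and inactive_for >= MIN_INACTIVITY
    )
```

For example, with `today = date(2015, 12, 1)`, an account created on March 1, 2015 and last active on June 1, 2015 would qualify, while an account created on October 1, 2015 would be too new.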
Strikingly, we found that the demographic breakdown of inactive year-old users is nearly identical to that of the new users surveyed in the preceding New Account Creator survey:
- 12% of the one year inactive group self-identified as female.
- 60% identify as white or European.
- 58% reported formal computer science training.
- The median age is 31, so this group is slightly older in human age than active users on GitHub.com and the new account creators.
The GitHub 365 specifically looks at people who signed up but did not return (ranging from temporarily dormant to completely inactive accounts). Within the first year of GitHub.com account creation, the rate of attrition appears to be the same across most groups, which suggests that the makeup of people joining GitHub is indeed changing. This is exciting news, but we never would have figured it out if we hadn’t persevered to learn from three different types of users on and off the site.
These data and analyses are offered as our team’s initial contribution towards what should develop into a much broader study of communities on GitHub. There is still a lot of work to do in understanding the experiences of GitHub’s community.