Beginning with Ourselves
Using Data Science to Improve Diversity at Airbnb
In a recent post, we offered some insights into how we scaled Airbnb’s data science team in the context of hyper-growth. We aspired to build a team that was creative and impactful, and we wanted to develop a lasting, positive culture. Much of that depends on the points articulated in that previous post; however, there is another part of the story that deserves its own post, on a topic that has been receiving national attention: diversity.
For us, this challenge came into focus a year ago. We’d had a successful year of hiring in terms of volume, but realized that in our push for growth we were not being as mindful of culture and diversity as we wanted to be. For example, only 10% of our new data scientists were women, which meant both that we were out of sync with our community of guests and hosts and that the existing female data scientists at Airbnb were quickly becoming outnumbered. This was far from intentional, but that was exactly the problem: our hiring efforts did not emphasize a gender-balanced team.
There are, of course, many ways to think about team balance; gender is just one dimension that stood out to us. And there are known structural issues that form a headwind against progress in achieving gender balance (source). So, in a hyper-growth environment where you’re under pressure to build your team, it is easy to recruit and hire a larger proportion of male data scientists.
But this was not the team we wanted to build. Homogeneity brings a narrower range of ideas and gathers momentum toward a vicious cycle, in which it becomes harder to attract and retain talent within a minority group as it becomes increasingly underrepresented. If Airbnb aspires to build a world where people can belong anywhere, we needed to begin with our team.
We worried that some form of unconscious bias had infiltrated our interviews, leading to lower conversion rates for women. But before diving into a solution, we decided to treat this like any problem we work on — begin with research, identify an opportunity, experiment with a solution, and iterate.
Over the year since, the results have been dramatic: 47% of hires were women, doubling the overall ratio of female data scientists on our team from 15% to 30%. The effect this has had on our culture is clear — in a recent internal survey, our team was found to have the highest average employee satisfaction in the company. In addition, 100% of women on our team indicated that they expect to still be here a year from now and felt like they belonged at Airbnb.
Our work is by no means done. There’s still more to learn and other dimensions of diversity to improve, but we feel good enough about our progress to share some insights. We hope that teams at other companies can adopt similar approaches and build a more balanced industry of data scientists.
Addressing the top-of-funnel
When we analyze the experience of a guest or host on Airbnb, we break it into two parts: the top-of-funnel (are there enough guests looking for places to stay and enough hosts with available rooms) and conversion (did we find the right match and did it result in a booking). Analyzing recruiting experiences is quite similar.
And, like any project, our first task was to clean our data. We used the EEOC reporting in Greenhouse (our recruiting tool) to better understand the diversity of our applicants, doing our own internal audit of data quality as well. One issue we faced is that while Greenhouse collects diversity data on applicants who apply directly through the Airbnb jobs page, it does not collect information on the demographics of referrals (candidates who were recommended for the job by current Airbnb employees), who represent a large fraction of hires. Then we combined this with data from an internal audit of our team’s history and from Workday, our HR tool, in order to compare the composition of applicants to the composition of our team.
When we dug in, we found that historically about 30% of our applicants (the top of the funnel) had been women. This told us that there were opportunities for improvement on both fronts. Our proportion of female applicants was twice that of employees, so there was clearly room for improvement in our hiring process (the conversion portion). However, our applicant pool was also short of male/female parity, so broadening it could prove a meaningful lever as well.
In addition, we wanted to ensure that our efforts to diversify our data science team didn’t end with us. Making changes to the top of the funnel — to how many women want to and feel qualified to apply for data science jobs — could help us do that. Our end goal is to create a world where there is diversity across the entire data science field, not just at Airbnb.
We decided that the best way to achieve these goals would be to look beyond our own applicants to inspire and support women in the broader field. One observation was that while there were a multitude of meetups for women who code, and many great communities of women in engineering, we hadn’t seen the same proliferation of events for women in data science.
We decided to create a series of lightning talks featuring women in data, under the umbrella of the broader Airbnb “Taking Flight” initiative. The goals were twofold: to showcase the many contributions of women in the field, and to create a forum for celebrating the contributions of women to data science. At the same time, we wanted to highlight diversity on multiple dimensions. For each lightning talk, we created a panel of women from many different racial and ethnic backgrounds, practicing different types of data science. The talks were open to anyone who supported women in data science.
We came up with the title “Small Talks, Big Data” and started with an event in November 2014 where we served food and created a space and time for mingling. The event sold out, with over 100 RSVPs. Afterward we ran a survey to see what our attendees thought we could improve in subsequent events and turned “Small Talks, Big Data” into a series, all of which have continued to sell out. Given this level of interest, several of the women on our team volunteered to write blog posts about their accomplishments (for example, Lisa’s analysis of NPS and Ariana’s overview of machine learning) in order to circulate their stories beyond San Francisco, and to give talks and interviews (for example, Get to know Data Science Panelist Elena Grewal). Many applicants to our team have cited these talks and posts as inspirations to consider working at Airbnb.
In parallel to these large community events, we put together smaller get-togethers for senior women in the field to meet, support one another, and share best practices. We hosted an initial dinner at Airbnb and were amazed at what wonderful conversations and friendships were sparked by the event. This group has continued to meet informally, with women from other companies taking the lead on hosting events at their companies, further exposing this group to the opportunities in the field.
Alongside our efforts to broaden our applicant pool, we scrutinized our approach to interviewing. As with any conversion funnel, we broke our process down into discrete steps, allowing us to isolate where the drop-off was occurring.
There are essentially three stages to interviewing for a data science role at Airbnb: a take-home challenge used to assess technical skill and attention to detail; an onsite presentation demonstrating communication and analytical rigor; and a set of 1:1 conversations with future colleagues where we evaluate compatibility with our culture and fit for the role itself. Conversion rates for men and women were roughly equal in the third step, but differed markedly in steps one and two.
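A stage-by-stage comparison of this kind can be sketched in a few lines of Python. This is only an illustration of the approach, not our actual analysis: the counts are invented for the example (they are not Airbnb's figures), and the stage labels and the helpers `stage_rates` and `largest_gap_stage` are hypothetical names.

```python
from typing import Dict, List

# Labels for the three interview stages described above (names are assumptions).
STAGES = ["take_home", "onsite_presentation", "one_on_ones"]

def stage_rates(entered: List[int], passed: List[int]) -> List[float]:
    """Pass rate at each stage: candidates who passed / candidates who entered."""
    if len(entered) != len(passed):
        raise ValueError("entered and passed must cover the same stages")
    return [p / e if e else float("nan") for e, p in zip(entered, passed)]

def largest_gap_stage(rates_a: List[float], rates_b: List[float]) -> str:
    """Name of the stage where the two groups' pass rates differ the most."""
    gaps = [abs(a - b) for a, b in zip(rates_a, rates_b)]
    return STAGES[gaps.index(max(gaps))]

# Invented counts, purely illustrative. Within each group, the number who
# passed one stage equals the number who entered the next.
women = stage_rates(entered=[40, 10, 4], passed=[10, 4, 3])
men = stage_rates(entered=[60, 30, 18], passed=[30, 18, 13])

for stage, w, m in zip(STAGES, women, men):
    print(f"{stage:20s} women={w:.2f} men={m:.2f} gap={m - w:+.2f}")

print("largest gap at:", largest_gap_stage(women, men))
```

Breaking the funnel down this way points to the stage that deserves attention first, rather than treating the overall hire rate as a single number.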
We wanted to keep unconscious bias from affecting our grading of take-home challenges, either relating to reviewers being swayed by the name and background of the candidate (via access to their resume) or to subjective views of what constitutes success. To combat this, we removed access to candidate names and implemented a binary scoring system for the challenge, tracking whether candidates did or did not do certain tasks, in an effort to make ratings clearer and more objective. We provided graders with a detailed description of what to look for and how to score, and trained them on past challenges before allowing them to grade candidates in flight. The same challenge would circulate through multiple graders to ensure consistency.
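To make the idea of binary scoring concrete, here is a minimal sketch under stated assumptions. The rubric items are hypothetical (this post does not list the real checklist), as are the helper names `total_score` and `agreement` and the example marks. Each grader records only whether a task was done, and a simple agreement measure shows how consistently two or more graders marked the same submission.

```python
from typing import Dict, List

# Hypothetical rubric items; the real checklist is not described in this post.
RUBRIC = [
    "stated_assumptions",
    "checked_data_quality",
    "answered_the_question",
    "discussed_limitations",
]

def total_score(marks: Dict[str, bool]) -> int:
    """Number of rubric tasks the candidate completed (each task is yes/no)."""
    missing = [task for task in RUBRIC if task not in marks]
    if missing:
        raise ValueError(f"missing marks for: {missing}")
    return sum(1 for task in RUBRIC if marks[task])

def agreement(all_marks: List[Dict[str, bool]]) -> float:
    """Fraction of rubric tasks on which every grader gave the same mark."""
    agreed = sum(
        1 for task in RUBRIC if len({marks[task] for marks in all_marks}) == 1
    )
    return agreed / len(RUBRIC)

# Two graders score the same submission; both sets of marks are invented.
grader_a = {"stated_assumptions": True, "checked_data_quality": True,
            "answered_the_question": True, "discussed_limitations": False}
grader_b = {"stated_assumptions": True, "checked_data_quality": False,
            "answered_the_question": True, "discussed_limitations": False}

print(total_score(grader_a), total_score(grader_b))
print(agreement([grader_a, grader_b]))
```

A low agreement value on a particular task would suggest that its description needs to be tightened before graders score live candidates.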
Our hypothesis for the onsite presentation was that we had created an environment that catered more to men. Often, a candidate would be escorted into a room where there would be a panel of mostly male data scientists who would scrutinize their approach to solving the onsite challenge. The most common critique of unsuccessful candidates was that they were ‘too junior’, stemming from poor communication or a lack of confidence. Our assumption was that this perception was skewed by the fact that they were either nervous or intimidated by the presentation atmosphere we had created.
A few simple changes materially improved this experience. We made it a point to ensure women made up at least half of the interview panel for female candidates. We also began scheduling an informal coffee chat for the candidate and a member of the panel before the presentation, so they would have a familiar face in the room (we did this for both male and female candidates and both said they appreciated this change). And, in our roundup discussions following the presentation, we would focus the conversation on objective traits of the presentation rather than subjective interpretations of overall success.
Taken together, these efforts had a dramatic effect on conversion rates. While our top-of-funnel initiatives increased the relative volume of female candidates, our interviewing initiatives helped create an environment in which female candidates would be just as likely to succeed as any male candidate. Furthermore, these changes to our process didn’t just help with diversity; they improved the candidate experience and effectiveness of hiring data scientists in general.
Why this is important
The steps we took over the last year grew the share of women on our team from 15% to 30%, which has made our team stronger and our work more impactful. How?
First, it makes us smarter (source) by allowing for divergent voices, opinions, and ideas to emerge. As Airbnb scales, it has access to more data and increasingly relies upon the data science team’s creativity and sophistication for making strategic decisions about our future. If we were to maintain a homogeneous team, we would continue to rely upon the same approaches to the challenges we face. Investing in the diversity of data scientists is an investment in the diversity of perspectives and ideas that will help us jump from local to global maxima. Airbnb is a global company, and people from a multitude of backgrounds use Airbnb. We can be smarter about how we understand that data when our team better reflects the different backgrounds of our guests and hosts.
Second, a diverse team allows us to better connect our insights with the company. The impact of a data science team is dependent upon its ability to influence the adoption of its recommendations. It is common for new members of the field to assume that statistical significance speaks for itself; however, colleagues in other fields tend to assume the statistical voodoo of a data scientist’s work is valid and instead focus on the way their ideas are conveyed. Our impact is therefore limited by our ability to connect with our colleagues and convince them of the potential our recommendations hold. Indeed, the pairing of personalities between data scientists and partners is often more impactful than the pairing of skillsets, especially at the leadership level. Increasing diversity is an investment in our ability to influence a broader set of our company’s leadership.
Finally, and perhaps most importantly, increasing our team’s diversity has improved our culture. The women on the data science team feel that they belong and that their careers can grow at Airbnb. As a result, they are more likely to stay with the company and are more invested in helping to build this team, referring people in their networks for open roles. We are not done, but we have reversed course from a vicious to a virtuous cycle. Additionally, the results aren’t restricted to women: the culture of the team as a whole has improved significantly over past years, and in our annual internal survey the data science team scores the highest in employee satisfaction across the company.
Of course, gender is only one dimension of diversity that we aim to balance within the team. In 2015 it was our starting point. As we look to 2016 and beyond, we will use this playbook to enhance diversity in other respects, and we expect this will strengthen our team, our culture, and our company.
We ended up discontinuing the removal of candidate names after a couple of months because of logistical issues with Greenhouse. Greenhouse does not allow us to remove names, so when we switched to using Greenhouse fully to track take-home results, graders were able to see names when they logged in to give scores.