Crunching the Numbers on Diversity in Data Science: Events & Resources to Foster Inclusion

SACNAS
STEM and Culture Chronicle
Aug 27, 2020

By Laramie Paxton, PhD

When I was a graduate student in 2017, I was part of the organizing committee for Data Science Day, a small data science conference held at Washington State University. While the conference was a regional success, as I learned more about the lack of diversity in data science (see below), I began to explore how I could organize a larger conference that provided funding for a diverse group of students to attend from across the U.S.

As a result, the Data Science and Image Analysis Conference of the Pacific Northwest was held in cooperation with the Association for Women in Mathematics at Washington State University in Spring 2020. The goal was to bring students from underrepresented groups in STEM together with experts to collaborate on open problems in data science and gain professional development by giving presentations and attending panels and workshops. The conference featured keynote speaker Emmanuel Yera of Primer AI, who discussed his path to success in Data Science and a recent project on automated image captions.

Dr. Emmanuel Yera

All of the short expert talks on open problems were followed by lengthy audience discussions and brainstorming to provide opportunities for everyone to participate. In support of this, the conference was awarded a National Science Foundation grant of $20,000 to fund travel for women and other underrepresented students in STEM, allowing 40 students nationwide to attend and present their research. It is my hope that this can serve as a model for future STEM conferences, one that combines a strong focus on diversifying STEM, discussion of open problems, audience collaboration, and professional development for students in a single event. As the data below show us, there is much more work to be done to increase diversity in data science and in STEM.

If you’re considering a career in STEM, then you’ve probably come across Data Science as a highly touted option for reasons including high job availability and competitive salaries. In fact, the Harvard Business Review called it “the sexiest job of the 21st century.”

While all of this may be true, one thing also remains true: Data Science has a diversity problem. In fact, of all the various tech fields, Data Science currently ranks the lowest in diversity.

For example, while women comprise one-quarter of the employees in the tech industry, which in itself is already quite low, one recent study found that only 15% of data scientists are women. Moreover, the study states that “As data science professionals advance in their careers, the percentage of women decreases significantly. Among the most advanced individual contributors, 6% of data scientists are female; 10% of executive managers are female.” To make matters worse, women data scientists are paid $10,000 less per year, on average, for the same work.

This trend continues for other members of underrepresented groups in STEM. Data suggest that fewer than 3% of data scientists are women of color, fewer than 5% are Latinx, fewer than 4% are African-American, and fewer than 0.5% are Native American. In addition, while specific figures are not available for data scientists with disabilities or LGBTQ data scientists, we do know that “There are 20% fewer LGBTQ individuals in government STEM-related jobs than should be expected.”

To predict how these trends will look among the next generation of data scientists, researchers have studied enrollment data at institutions offering courses in data science.

Unfortunately, the data do not indicate a rapid shift in the above figures anytime soon. A Forbes study of five months’ worth of enrollment data from the tech school General Assembly supported the finding that Data Science has the lowest diversity of all tech fields: its students were 35% women, 8% Latinx, and 4% African-American, and all of these figures are substantially lower than the averages for the school’s tech courses overall.

Some of the reasons for this lack of diversity have been identified. They include:

1. Lack of Early STEM Education: “Black and Latino high school students are being shortchanged in their access to high-level math and science courses that could prepare them for college” (Gewertz, 2016).

2. Lack of Mentorship: “Due to chronic underrepresentation, Native students do not see themselves in STEM careers because they do not see relevant role models. And in STEM careers, the culture and climate often does not support or welcome Native people” (DeerInWater, 2019).

3. Exclusionary Workplace Culture: “On the business side of the problem, companies can make the data field more inviting to women by putting their female employees front and center. On the personal side, women should be encouraged [to] go after these jobs regardless of apparent stigma…” (Phillips, A., 2019).

Meanwhile, diverse companies have been shown over and over to be more productive, better to work for, and more profitable.

“Decades of research by organizational scientists, psychologists, sociologists, economists and demographers show that socially diverse groups (that is, those with a diversity of race, ethnicity, gender and sexual orientation) are more innovative than homogeneous groups” (Philips, L., 2019).

As an example of what can happen without this diversity, we can look to research by Joy Buolamwini on error rates in facial recognition services. One system had an error rate of 0.8% for light-skinned males, but 34.7% for dark-skinned females. Kavita Sangwan (2019), Director of Technical Programs, AI, and Machine Learning at Intuit, explains, “Having greater diversity in the room increases the chance that such inequities will be flagged and addressed so that the resulting products can serve people of all kinds equally well.”
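For readers who want to see how such a gap comes to light, the short Python sketch below computes error rates separately for each group rather than as a single overall number. It is only an illustration of the general idea of a disaggregated evaluation, not the method used in the research cited above. The function name `error_rate_by_group`, the group names, and the toy records are all hypothetical and chosen for this example.

```python
from collections import defaultdict


def error_rate_by_group(records):
    """Return {group: share of records where the prediction was wrong}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, label, prediction in records:
        totals[group] += 1
        if prediction != label:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Hypothetical toy data, invented for illustration only:
# each record is (group, true label, predicted label).
toy_records = [
    ("group_a", 1, 1),
    ("group_a", 0, 0),
    ("group_a", 0, 0),
    ("group_a", 1, 1),
    ("group_b", 1, 0),
    ("group_b", 1, 1),
    ("group_b", 0, 0),
    ("group_b", 1, 0),
]

rates = error_rate_by_group(toy_records)

# The single overall error rate, which hides the difference between groups.
overall = sum(1 for _, label, pred in toy_records if label != pred) / len(toy_records)

print("Overall error rate:", overall)
for group, rate in rates.items():
    print(group, "error rate:", rate)
```

In this toy data the overall error rate is 25%, yet every error falls on one group. A single aggregate number can look acceptable while one group is served far worse, which is why it helps to break results down by group before a product ships.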

One recent and promising study in AI suggests that, while Data Science is a relatively new field and will take time to respond to diversity initiatives, some of the efforts to increase diversity in other tech fields may be succeeding.

“The gender gap in artificial intelligence and related fields is slowly but surely closing thanks to multiple existing initiatives that spread awareness and education around the globe. Another important factor decreasing this gender gap is that women are currently recognized for their outstanding achievements within the field, whether they are Business owners, researchers, or whatever position they hold in the industry” (Khizou, 2020).

While the efforts currently underway to close the diversity gap in Data Science are too numerous to list here, there are many recent and upcoming events working towards this goal. Over the past several years, numerous conferences and coding events focused on diversity have been developed, and participation has grown rapidly. For example, one online listing includes over 70 Women in Tech events for 2020 alone. The list below includes some of the more well-known events that happen on a regular basis.

Black Engineer of the Year Awards (BEYA) STEM Conference

Black Girls Code Events

Diversity in Tech Panel

Diversity Reboot 2020

Grace Hopper Celebration of Women in Computing

Hispanic Heritage Foundation’s Latinos On Fast Track (LOFT) Coder Summit

Intel’s Native Coders Initiative

LatinX in AI Research at ICML 2020 and at ICLR 2020

NIH Data Science for Science Teachers Bootcamp

Richard Tapia Celebration of Diversity in Computing

Stanford’s Global Women in Data Science Conference

StatFest

Women in Big Data MeetUps

Women in Machine Learning

In addition to the many events aimed at increasing diversity in Data Science and other tech fields, there are many organizations dedicated to providing resources and creating opportunities for members of underrepresented groups to enter the fields of Data Science and Computing. Several prominent ones are listed below for further reference.

AI4All

American Indian Science and Engineering Society (AISES)

Girls Who Code

Hackbright Academy

Harvard’s Women in Computer Science (WiCS) Advocacy Council

Hispanic Heritage Foundation

LatinX in AI Coalition

National Center for Women & Information Technology (NCWIT)

Native American Code Writers Program

NetApp’s Network of Blacks in Tech

Progressive Women’s Leadership

Women in Big Data Forum

Women Techmakers

Looking Forward

Thanks to continued advancements in computational abilities and methods, the field of data science is growing rapidly and occupies a prominent position on the frontier of the mathematical sciences.

As such, many important societal outcomes depend on breakthroughs in data science, and achieving them will require a more diverse group of new and upcoming data scientists.

Through inclusive events and resources, such as those given here, and other opportunities, new researchers from diverse backgrounds will be drawn to the field, which will allow for even greater societal advancements in the years to come.

About the Author

Laramie Paxton, PhD, is an Assistant Professor of Mathematics and Data Science and the Director of the Data Science Program at Marian University–Wisconsin. He also served as Chair of the organizing committee for the recent Data Science and Image Analysis Conference of the Pacific Northwest.

When he is not busy teaching and researching, he enjoys going for long walks in the woods with his dog.
