Last year we took our annual data science survey to the next level by turning over the results to YOU through an open-ended Kernel competition.
We were overwhelmed by the response and by the quality of the kernels submitted. Not only are Kagglers amazing data scientists, but they’re incredible storytellers as well!
Martin Barron was one of those skillful enough to take our data and shape it into something meaningful — not just for Kaggle, but for the data science community at large. We hope you enjoy getting to know him as much as we did.
Congrats, Martin, on your win!
To take a look at Martin’s winning Kernel, visit: The Gender Divide in Data Science
Martin, what can you share about your background?
I am an associate director at the University of Chicago’s Urban Labs, where we work with civic and community leaders to identify promising social programs and public policies. In my current position, I manage a team of 15 talented analysts and data scientists, who do the important work of rigorously evaluating those programs to ensure they are effective and efficient.
Prior to my current position, I worked for a large survey organization doing work quite similar to this challenge. My previous job often involved examining raw output from surveys, extracting key insights from the data, and constructing a coherent narrative from those insights.
What made you decide to enter this admittedly unconventional challenge?
In all honesty, insomnia.
I woke up early one night and stumbled upon the competition while looking for something to keep me occupied. After reading the description, I was immediately attracted to the idea of using the dataset to investigate gender differences in the data science field. It’s a topic I care a lot about, and the Kaggle dataset seemed to present a fairly unique opportunity to investigate the topic.
Frankly, I was also really attracted to the opportunity to once again “get my hands dirty” with some survey data analysis. My current position is largely managerial, and when I do get the opportunity to perform some analysis, it tends to be on much more limited administrative datasets.
Were any methods particularly helpful in doing your analysis?
This competition was, obviously, quite different from other Kaggle challenges because it did not require any machine learning. (Indeed, the fact that the competition didn’t require machine learning is another reason I decided to enter, as it meant I had a chance of placing!)
Although the survey collector removed some spam responses, I noticed that there were other entries I felt warranted deletion. I ultimately removed additional entries where more than 80 percent of questions were unanswered or where respondents spent fewer than 5 minutes answering the questions. Although this resulted in dropping almost 7,000 respondents, I felt the results would be stronger if these (likely) junk responses were removed.
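Editor's note: the two removal rules described above can be sketched in a few lines. This is a minimal illustration in Python, not Martin's actual code (his analysis was done in R). The record layout and the names `answers`, `minutes_spent`, `is_likely_junk`, and `drop_junk` are assumptions made up for the example, and treating `None` or an empty string as "unanswered" is likewise an assumption.

```python
def is_likely_junk(answers, minutes_spent,
                   max_unanswered_share=0.80, min_minutes=5.0):
    """Return True if a response meets either removal rule.

    answers: list of answers, with None or "" marking an unanswered
             question (an assumption about the data layout).
    minutes_spent: time the respondent spent on the survey, in minutes.
    """
    if not answers:
        # A response with no questions recorded is treated as fully unanswered.
        return True
    unanswered = sum(1 for a in answers if a is None or a == "")
    share_unanswered = unanswered / len(answers)
    # Rule 1: more than 80 percent of questions unanswered.
    # Rule 2: fewer than 5 minutes spent answering.
    return share_unanswered > max_unanswered_share or minutes_spent < min_minutes


def drop_junk(responses):
    """Keep only the responses that pass both rules."""
    return [
        r for r in responses
        if not is_likely_junk(r["answers"], r["minutes_spent"])
    ]


# Invented example records, not taken from the survey.
responses = [
    {"id": 1, "answers": ["a", "b", "c", "d", "e"], "minutes_spent": 12},
    {"id": 2, "answers": [None] * 9 + ["x"], "minutes_spent": 20},
    {"id": 3, "answers": ["a", "b", "c", "d", "e"], "minutes_spent": 3},
]
kept_ids = [r["id"] for r in drop_junk(responses)]
```

In this sketch a response is dropped if it fails either rule, which matches the "or" in the description above. Whether a response at exactly 80 percent unanswered should be kept is a judgment call; here it is kept.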
What was your most important insight into the data?
My early drafts were much longer and used many more of the survey questions than my ultimate submission. They were also a lot more boring. So probably the most important insight I had was that there was a coherent story to be told just highlighting a few key points.
Were you surprised by any of your insights?
I know I shouldn’t have been surprised, but I nevertheless was surprised to see the gender differences in reported salaries. It’s one thing to hear that the median salary for women is less than that of men; it’s another thing to actually calculate it on data in your hands and see women earning 86 percent of what men earn.
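Editor's note: the comparison described above is a ratio of two medians. The sketch below shows that calculation in Python using the standard library. It is an illustration only: Martin's analysis was done in R, the function name `median_pay_ratio` is hypothetical, and the salary figures are invented for the example, so the ratio they produce is not the survey's result.

```python
from statistics import median


def median_pay_ratio(salaries_women, salaries_men):
    """Return the median salary for women as a fraction of the median for men."""
    return median(salaries_women) / median(salaries_men)


# Invented salaries for illustration; these are not survey data.
women = [50_000, 80_000, 120_000]
men = [60_000, 100_000, 150_000]

ratio = median_pay_ratio(women, men)
print(f"Women's median salary is {ratio:.0%} of men's")
```

Medians are used here rather than means because a handful of very high salaries can pull a mean upward without reflecting what a typical respondent earns.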
Which tools did you use?
All of my analysis for this project was conducted in R. After some initial exploratory analysis, I worked exclusively in R Markdown using RStudio.
What have you taken away from this competition?
My biggest takeaway is that we, as a discipline, need to do more. As I say in my entry, “Ours is a young discipline. Let us fight now to make it a just and equitable profession not only for its current practitioners, but all of those who are to follow.” One small thing that I’ll be doing is making a donation to two organizations, CoderSpace and App Camp for Girls, working to make computer science (and thus, by extension, data science) more inclusive. They are really great groups that I’d encourage others to support.
Martin Barron is the Associate Director of Data and Analysis in Crime and Education Labs at the University of Chicago Urban Labs. Urban Labs works closely with civic and community leaders to identify, test, and help scale the programs and policies with the greatest potential to improve human lives. Martin received his Ph.D. in Sociology from SUNY Stony Brook. His current research focus is on quality assurance in analysis and data sciences.