I’m the Best Data Scientist You’d Never Hire

Amy Winecoff
Sep 11, 2019


Last Friday one of my coworkers asked me to be an interviewee for a mock coding interview. He wanted to test out the new SQL and Python screen he’d developed for our revamped data science hiring process. Although I find coding interviews to be the tech equivalent of the Trier Social Stress Test, I agreed anyway. After all, his goal was to see how a data scientist without a typical background would do. Lately, I haven’t written a lot of SQL queries, and I’ve only used Python intermittently, so I warned him that I might not be much help. Still, he wanted to do a temperature check. The coding problems were admittedly not that complex, but my performance definitely left something to be desired. In the Python interview, I did a passable job; however, I couldn’t remember enough SQL to even start the first question. The shame of bombing the fake interview for my real job hit me hard. How the hell could I fail a technical interview for the job I do every day?

Before I answer that question, let me provide a little bit of context. I’m a damned good data scientist. Though I’m often the first to call out my myriad weaknesses, anyone on my data science team, and plenty of people in other divisions, would tell you that I am a serious asset to the company. I’ve done my fair share of typical data science projects (developing and evaluating machine learning models, data munging, visualizations), but I have also developed and enabled data science projects that would not have been possible without me. Instead of giving me assignments to remediate my weaknesses with relational databases or sorting algorithms, my supervisors have generally given me assignments that take advantage of my own specific technical and non-technical skills, skills that most of my data science colleagues do not share.

Drawing from my background in experimental psychology, I recently developed an interactive user testing framework in JavaScript that allows our team to build highly customizable experimental user tests that can be deployed through the Amazon Mechanical Turk (MTurk) API. We’ve used this framework to gather high-quality data and answer questions we have never before been able to answer. For each user testing research project, I’ve been in charge of determining the research design, specifying the hypotheses, and determining what statistical tests we should employ. I’ve also written the code for most of the user tasks. This testing framework has become such an instrumental resource that we’re planning on hiring a software engineer to support it.

In other cases, my projects take advantage of my “softer” abilities. In my pre-data science life, I was a professor at a liberal arts college, so I’ve had a ton of experience with teaching and mentoring, public speaking, and written scientific communication. A lot of data professionals would probably not consider these abilities to be technical, but they have nevertheless facilitated some highly technical accomplishments. For example, this year my company’s executives expressed an interest in having the data science team publish research on our technical capabilities. I have a decent publication history for a non-academic, so I was happy to be at the vanguard of our publication directive. I wrote the first scientific paper in the history of the company, which was accepted to a highly selective machine learning conference. Since then, I’ve done a lot of heavy lifting teaching other members of my team (and other technical teams) to write clearly and compellingly about their ideas. In addition to helping my coworkers grow professionally, my guidance on technical writing was instrumental in two additional papers we submitted to other competitive machine learning conferences.

Some people might look at these accomplishments and think, “If you can do all that, why can’t you pass a simple technical interview?” This question reveals the problem with much of data science hiring today: It is precisely because I did not spend time developing computer science or theoretical math expertise that I did develop deep and useful expertise in other areas. My background has allowed me to contribute in ways that complement the strengths of my colleagues, most of whom have more or less the same academic training and capabilities. Notably, academic background is not the only characteristic that these colleagues share: most of them are male and most of them are white. Most of them also had a pretty easy time navigating the data science interview gauntlet. If they went back onto the job market today, they would likely have multiple competing job offers.

I do not believe that these colleagues were given a direct advantage in the hiring process; however, I do believe that indirectly, they had a leg up over me and people like me. Many smart people have written about the “pipeline” problems with women, other gender minorities, and people of color in technology, so I will not expound upon that here. I also will not go into detail about how attributing tech’s diversity issues solely to “leaky pipelines” is in and of itself really problematic. But I will say that even as many tech companies advertise a desire to attract diverse talent, they simultaneously undermine their own diversity efforts by designing hiring processes that require people with different backgrounds to meet a higher standard. Not only must we demonstrate the utility of our unique skillsets, but we also must demonstrate that we have acceptable expertise in the areas in which “traditional” applicants shine brightest. Put bluntly, no matter how good we are at other things, we still have to clear The White Guy Bar.

When I have been actively on the job market, I’ve allocated a lot of my free time and mental energy to ensuring that I can clear or at least come close to clearing The White Guy Bar. While hunting for jobs, I’ve spent countless hours reading data science books and articles, making math flashcards, and doing coding challenges. I’ve memorized how to compute a Fibonacci sequence using recursion and answer questions about the relative merits of particular loss functions. In my day-to-day work, I never need to have this kind of knowledge firmly embedded in my memory. I can usually determine how to solve a code bug with a quick search on Stack Overflow. Likewise, I can figure out the solution to most in-the-weeds machine learning questions by reading the scholarly literature or by asking a colleague. Unlike many other candidates who are underrepresented in tech, when I’ve had a need to do interview cramming, I’ve been able to make the time for it. If I were caring for children or elderly parents, or if I had major responsibilities to my community, or if I had to meet other mental, financial, or emotional demands, I probably would never have cleared The White Guy Bar.

Sometimes I wonder what my company and other companies have missed out on by designing hiring practices that reinforce homogeneity in our talent pools. I also wonder about what opportunities I have missed out on because I failed the initial technical screen in so many job interviews. I won’t ever know the answer to these questions, but I do know that if I had spent my time on the job developing the same skillset my coworkers have, my team would have accomplished a lot less. Tech has a serious diversity problem, and biased hiring practices are a big part of it. I don’t know how to solve these far-reaching problems. But I do know that the next time I’m on the hunt for a new opportunity, I’ll need to set aside the extracurricular projects that build on my strengths. Because in the end, I will still have to convince someone somewhere that I’m one of the (white) boys.

Amy Winecoff

Senior Data Scientist. Interested in machine learning, recommender systems, UX research, and diversity in tech.