
The data science hiring process is broken

Schaun Wheeler
10 min read · Feb 5, 2018
Be honest: this is what it sometimes feels like.

A few days ago, a former colleague who’s about to take on some hiring responsibilities asked to borrow some tips from an exchange he and I once had about hiring technical talent. I said, “Of course — use it however you like. I’m thinking about writing it up for a blog post as well.” Then I dug that conversation out of my Slack archive. It was from several months ago. The last line read “Ok, that’s all I have. I might turn this into a blog post.” So apparently I really do need to write about this. Here goes.

I’ve changed jobs several times over the years, so I’ve gotten a lot of experience applying and interviewing for data scientist jobs. I’ve also hired data scientists, sometimes as one individual contributor on a team and sometimes as the manager, so I’ve thought a lot about how to evaluate data scientist candidates. I’ve come to two conclusions: the process most (of course not all) companies use is really broken, and a non-broken (of course not perfect) process is actually pretty simple.

An example of a relatively non-broken process

If you find yourself in the position to hire analytic talent, I suggest the following:

  • Step 1: HR screen. You’ll be wasting your time if you do the first screen — there are a lot of people out there who just aren’t qualified for the job. Let HR catch those people. That being said, HR should come back to you with a detailed reason for each person they want to reject, and they shouldn’t reject anyone until you give the go-ahead. This helps you calibrate HR to get you better people.
  • Step 2: Phone screen with you (the hiring manager). The explicit purpose of this step should be for the candidate to ask questions about the job so they can decide whether they’re interested. They should know that purpose long before they get on the phone with you. You shouldn’t be doing any evaluating at this point. This is for them to become educated about the role, the team, and the company. This step reduces the power asymmetry that naturally exists in any candidate evaluation process. Let the candidate evaluate you, and you’ll find you learn a whole lot about the candidate.
  • Step 3: Technical interview. Provide the candidate with a list of 1–4 types of things you’ll want them to actually do on the job. They should choose one of those and come prepared with code that they have written for a prior job or project that does that sort of thing. If they can’t come up with a single written-for-the-real-world code sample that illustrates any of the activities you wanted to hear about, that tells you something about their fit for the role. Assuming they have a code sample that works, do a code review with them: have them describe the business purpose of the code, walk you through the main structure/functionality, and explain what they would do differently if they had to do it over again. Maybe someone has worked their entire career in places where all their code is kept in private repositories. (That is, in fact, exactly what I’ve done.) Have they engaged in side projects? Abstracted code from work and gotten approval to post about it on a blog? Compiled an approved private collection of sanitized code from previous production environments that can be showcased on demand? Then you have a basis for this code review.
  • Step 4: Team fit interview. Get a bunch of people from the team — all of them if the team is small enough — and have lunch with the person. Don’t go to this step unless you intend to hire them. The purpose here is not to evaluate whether they will be able to do the job — that was the purpose of the prior steps, including reviewing their resume. The purpose here is to see if they’re a person that everyone wants to work with.

At the end of the process, you’ll still feel like you don’t have enough information to make the decision. That’s not because you’ve left out a step or have missed an important evaluation technique. It’s because you’re a human evaluating another human. Uncertainty comes with the territory. Deal with it.

Technical challenges are too broken to be salvaged

I really dislike technical challenges, particularly live-coding (including whiteboarding) and brain teasers. The ease with which I can find complaints about the hiring pipeline for technical roles (for a few examples: here, here, here, here, here, and here) suggests that I’m not the only person who has been frustrated by these kinds of evaluations. I find these kinds of evaluations always fall into at least one of the following categories:

  • They don’t align with the actual job responsibilities. For example, one challenge I was given in an actual interview: “There’s an island with 100 lions and one sheep. If a lion eats the sheep, it will turn into a sheep. The lions are lined up so only one has the opportunity to eat the sheep at once. Will it eat the sheep?” I’ve never experienced a technical evaluation where I could clearly figure out how my ability to answer the question had much at all to do with what I would be expected to actually do on the job. To the rejoinder that a question like the one above can “help the interviewer see how you think”, I recommend reading up on the extreme difficulty of assessing intelligence with tools that make use of unfamiliar imagery and situations.
  • They pretend to measure competence when they really don’t. I was once asked: “Given a function which generates the integers from 1 to 5, write a function that will generate the numbers from 1 to 7.” I wrote the question down. After the interview, I googled it and found the answer, first hit (it’s the standard rejection-sampling trick; see the sketch after this list). If you can fully figure out a problem through five seconds of googling, then being able to answer that problem through zero seconds of googling isn’t a measure of competence.
  • They filter for people who actually aren’t the people you most want to hire. A technical recruiter once told me about a job at a large company that he was having trouble filling: “The interview process basically encourages potential applicants to research how to ‘pass’ their structured interview process as opposed to genuinely evaluating candidates on an individual basis. …you’re only hiring people who already want to work for you, not the best people who might not want to work for you and have no interest in following your process just to get in. Then you’re no longer attracting talent, you’re just processing the applications of those who are already interested.”
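
For what it’s worth, the googleable answer to that second question is the standard rejection-sampling trick. Here’s a minimal sketch in Python, with rand5 as a stand-in for the generator the interviewer supplies:

```python
import random

def rand5():
    """Stand-in for the given generator: uniform integers 1 through 5."""
    return random.randint(1, 5)

def rand7():
    """Uniform integers 1 through 7, built only from rand5().

    Two rand5() calls give 25 equally likely outcomes. Keep the first
    21 (3 * 7) and reject the rest, so each value in 1..7 is equally likely.
    """
    while True:
        roll = 5 * (rand5() - 1) + rand5()  # uniform on 1..25
        if roll <= 21:
            return (roll - 1) % 7 + 1
```

The point stands: the fact that I can reproduce this from a five-second search is exactly why producing it on demand in an interview measures recall, not competence.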

For data science positions, the technical challenge often comes in the form of a take-home assignment. I used to use these myself. I don’t recommend them anymore. At best, they make an already painful process a little more painful in exchange for a few extra data points about a candidate. At worst, they show a fundamental disrespect for the candidate’s time. The typical challenge requires at least four hours of work. For half a work day, a candidate deserves an extremely clear understanding of why each section of the challenge was included, and how their performance on the challenge is going to be used to evaluate them. Teams who use take-home data challenges should have a clear reason for including every component of the challenge, and clear rules for how performance on each of those sections will be incorporated into the information from the rest of the interview process. I don’t think most teams are that deliberate. For teams and companies who care about employee experience, the opaque nature of take-home challenges should be very concerning.

Look at it this way: suppose you were a manager and your boss told you that, because of a budget shortfall, you needed to fire someone from your team, but that you could take some of the cost savings from that person’s salary and give someone else a modest promotion. So you have two decisions: a termination and a promotion. In what world would you make either of those decisions by bringing all your team members into a room and telling them to live-code a task on which they have no business context, no preparation, and no realistic timeframe for completion? In what world would you feel more justified making that decision just because you gave them a more extended timeframe but retained the lack of context and preparation? It’s absurd. If you wouldn’t make a termination or a promotion decision that way, don’t make a hiring decision that way.

The obvious rejoinder to this position is that you have loads of information about a person when faced with a termination or promotion decision because you’ve worked with that person. What’s wrong with using a technical challenge as a model of the missing information? What’s wrong is that the model is an extremely poor fit. All models are wrong, but some are so wrong as to no longer be useful. At that point, the solution is not to stick with the bad model simply because something is better than nothing, but rather to find another basis for making your decision. I believe that this, exactly, is the case with using technical challenges to evaluate candidates. So what is the alternative basis for making the decision? There are two approaches, not mutually exclusive:

On the one hand, guard against charlatans — people who pose as having the skills you want but don’t actually have them. In these cases, you need a test for “skin in the game” — proof that they have sacrificed time, resources, or something else valuable in order to demonstrate their competency. That’s why I recommend having people bring in their own code. If they can’t find anything viable for the interview, then you’re dealing with a candidate who has never sacrificed the time and effort to make sure they can demonstrate their competency.

On the other hand, make your team more resilient: if you have a way to quickly mentor and up-skill a person who doesn’t perform at the level you need, then you protect yourself against the risk that you’ve hired an insufficiently skilled person. Obviously, if you have a must-have skill (say, you need the person to be able to deploy models over Spark on day one), then you need to filter for that, but even that kind of requirement doesn’t necessitate a technical challenge. It just requires a resume review and a probing conversation.

For job seekers, some advice on how to not let the broken process break you

If you’re looking for a technical role: stay away from companies that don’t have a human-centric hiring process. Over time, I’ve become so skeptical of technical challenges that I actually refuse to do them when I’m interviewing for a job. When an interviewer tells me the next portion of my interview will be a coding exercise, I say or write something like this:

I’ve decided against participating in [insert technical challenge type here]. I’ve found too great a disconnect between the skills those interviews assess and the skills data scientists use every day on the job. Naturally, the team may feel strongly the other way, and is entirely at liberty to evaluate candidates however they want! I’m very interested in this role and think it could be a great mutual fit. I wonder if the team would be open to an alternative?

Have the team come up with one or two situations where they’d need me to perform at a high level on a technical task. For example:

* Configure an ETL process that requires multiple joins and incremental updates

* Build a model based on extremely high-dimensional data

* Fit a model on a data set that can’t fit entirely into memory at one time

For those one or two situations, I will present sanitized production code from my current or past jobs that illustrates how I did those things in a real-stakes environment, and I will happily devote as much time as needed to walk people through that code, explaining both my decisions and what I would do differently going forward. If I can’t find code that matches the tasks, that will still say something about my fit for the position.

I realize this is probably a position you don’t get from candidates too often. I’m happy to explain in more detail and to work out a way forward. I’m committed to getting the team the information they need to make a well-informed decision about my skills. If a technical challenge is a feature of candidate evaluation that you feel you simply can’t negotiate on, then I know from prior experience that it just won’t be a good cultural fit.

That’s my verbose version. Sometimes, depending on the level of rapport I have with the interviewer, I just say something like “look, can I just walk you through some real-world code I’ve written?” No matter how I’ve phrased it, I’ve only had two companies ever accept my offer. I work for one of those companies now.
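
To give a flavor of what that kind of walkthrough covers, here is the shape of the third task in the list above. This is a hypothetical sketch, not code from any actual job: the file name, column names, and model choice are all stand-ins. It streams the data in chunks and updates the model incrementally through scikit-learn’s partial_fit API, so the full data set never has to sit in memory:

```python
import pandas as pd
from sklearn.linear_model import SGDClassifier

# Hypothetical placeholders: "events.csv", the "label" column, and the
# choice of SGDClassifier all stand in for real production details.
model = SGDClassifier()  # any scikit-learn estimator with partial_fit works
classes = [0, 1]         # partial_fit requires the full label set up front

# Stream 100,000 rows at a time; each chunk updates the model in place,
# so peak memory is one chunk rather than the whole file.
for chunk in pd.read_csv("events.csv", chunksize=100_000):
    X = chunk.drop(columns=["label"])
    y = chunk["label"]
    model.partial_fit(X, y, classes=classes)
```

In a code review, the code itself is the least interesting part; what the team learns from is hearing why incremental updates were the right call for the business problem and what I would change if I wrote it again today.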

I know from personal experience that taking this approach with a potential employer, especially in the tech industry, is terrifying. It means passing up a lot of really interesting roles and drastically shrinking the pool of prospective employers. I also know that each time I have decided to enter a non-human-centric hiring process in spite of my better judgement, I have regretted it: hours spent on take-home challenges, feelings of imposter syndrome in front of white boards, and frustration when I am finally spit out of the process. It’s not worth the drain on your energy.

I don’t think the hiring climate in our industry will change until candidates refuse to participate in a broken process. That unfairly places responsibility for change on the most vulnerable participants in the process. That may be the brokenest part of the whole mess. But it’s the way it is. If we want a better system, we’ll have to pay for it.
