How we hire Data Scientists — a 5 step process

Oct 26, 2018

Most companies struggle to hire good data scientists. At James we have developed a 5 step process for hiring, which has served us pretty well over the years, and is responsible for bringing together a bunch of amazing human beings.

The process itself is mostly common sense, and broken into 5 simple steps.

1. Getting applications

The #1 mistake companies make when hiring is to only start working on it when they need someone urgently. Having a healthy pool of strong applicants requires constant “low-intensity” work from everyone in the company, employees and managers alike.

Firstly, there is presence. Employees must be encouraged to attend meetups, industry events, and social events, and to talk openly about their work. More often than not we find that someone applied because they met an existing employee at an event, and decided that these are the kind of people with whom they want to share their professional careers. Being open about our work, and especially our challenges (“we are completely stuck trying to solve problem X” == geek candy!) makes the community at large feel engaged with the company, and therefore want to join as opportunities arise.

Another easy win is having a newsletter to which potential applicants can subscribe when there are no open job offers available. We often find that our website gets visited when we are not hiring, and we must ensure potential candidates find some way to interact with us.

The gold standard is giving back to the community, by either speaking at public events or teaching others. All data scientists at James have at some point spoken at industry events, or taught at the non-profit Lisbon Data Science Academy or at other schools in their free time. Time is made in working hours for data scientists to prepare presentations and talks, to ensure that James is consistently well represented in the community, not just by managers but by technical staff.

Once you have a job opening, the listing must be crystal clear about the requirements, expectations, and timelines. Data scientists (and technical workers in general) have plenty of options, so an ambiguous job description will not get a second chance, and no one is going to send companies an email to ask what they meant.

2. Pre-screening

Once we decide that we are actually going to hire someone and know exactly the requirements, we set up the job listing on the website, and add a public contact for potential applicants to ask anything about the company and position. The job application can never be done via email, but rather via a short form. This form contains two sections:

a) Questions about motivation and qualifications

b) Checklists of requirements

These requirements are also listed in the job application

Surprisingly, these two simple self-assessment tools generally reduce the application pool by about 30%, by filtering out candidates who ignored the requirements listed on the job listing. They also allow us to see whether there is a sub-group of candidates that clearly stands out, so that we avoid sending highly demanding technical challenges to a large number of people.

CVs, while potentially relevant for more senior positions, are for the most part considered secondary, as the technical skills will be examined by us, both at interview and challenge level. We are absolutely fine with self-taught applicants, and have found that passionate enthusiasts often outperform candidates with stronger academic records.

3. Technical test

The candidates that survive the self-assessment are invited to perform a small challenge, which contains some of the characteristics of the work they will be expected to perform. This takes the form of a mini “kaggle” challenge, performed by the candidates remotely. This is an example of a challenge for an entry-level data science position in our Financial Services team. The candidate will have to defend their challenge if they pass to the second stage, rendering cheating pointless.

Throughout the process, transparency is key. Applicants who did not make it through the self-assessment are informed immediately. Applicants who pass to the next stage are told ahead of time how many candidates are still left, and how many will get through.

Transparency helps candidates manage expectations correctly

When examining the results we generally prefer to do it blind to the applicant’s identity, to avoid any unconscious bias (“oh, this is from the candidate from <include University name>, it must be good!”). It is also essential to define the grading criteria in advance, to avoid forgetting one dimension and over-examining others.
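As a rough illustration of blind grading, the sketch below swaps applicant names for opaque IDs before submissions reach the graders. The function name `anonymize` and the shape of the `submissions` dictionary are assumptions made for this example, not a description of the tooling we actually use.

```python
import secrets


def anonymize(submissions):
    """Replace applicant names with opaque IDs before grading.

    `submissions` maps applicant name -> submission (hypothetical shape).
    Returns (blinded, key): graders only see `blinded`, while `key` maps
    each ID back to a name and stays with someone who is not grading.
    """
    blinded, key = {}, {}
    for name, payload in submissions.items():
        token = secrets.token_hex(4)
        while token in key:  # guard against an unlikely ID collision
            token = secrets.token_hex(4)
        key[token] = name
        blinded[token] = payload
    # Sort by ID so the original ordering does not hint at identity.
    return dict(sorted(blinded.items())), key


blinded, key = anonymize({"Alice": "notebook_a.ipynb", "Bob": "notebook_b.ipynb"})
for candidate_id, submission in blinded.items():
    print(candidate_id, submission)
```

The key is only consulted once all scores are final, so a grader never sees a name or a university next to a submission.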

Crucially, we do not score data scientists only on their code, but also on their organization, and in particular their ability to communicate data. Candidates who are strong on only one aspect but weak on others are generally not selected.

Modeling (generally where most traditional interview processes spend their time) gets only about 20% of the attention, because realistically that is the percentage of time data scientists spend choosing and optimizing models in the real world.
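To make the idea of grading criteria fixed in advance concrete, here is a minimal sketch of a weighted rubric. Only the roughly 20% weight for modeling comes from the text above; the other weights, the 0 to 5 scale, the per-dimension floor, and the names `WEIGHTS`, `MIN_PER_DIMENSION` and `grade` are all assumptions chosen for illustration.

```python
# Hypothetical rubric: only the ~20% for modeling is taken from the article.
WEIGHTS = {
    "modeling": 0.20,
    "code": 0.30,
    "organization": 0.20,
    "communication": 0.30,
}
MIN_PER_DIMENSION = 2  # assumed floor on a 0-5 scale


def grade(scores):
    """Return (weighted_total, balanced) for one blinded submission.

    `scores` maps each rubric dimension to a 0-5 mark. `balanced` is False
    when any single dimension falls below the floor, reflecting that being
    strong in one aspect does not compensate for being weak in another.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total = sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)
    balanced = all(scores[dim] >= MIN_PER_DIMENSION for dim in WEIGHTS)
    return round(total, 2), balanced


print(grade({"modeling": 5, "code": 5, "organization": 1, "communication": 5}))
```

Requiring every dimension to be present means a grader cannot forget one criterion, and the floor flags a lopsided submission even when its weighted total looks high.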

4. In person interview

Note: our in-person interview process was heavily inspired by Feedzai’s Miguel Almeida, who has by now interviewed more data scientists than is medically advisable, and generously did an excellent talk on the subject.

Our in-person interviews last between 1 and 2 hours, and search for a number of different skills:

Knowledge: does the candidate know the stuff that’s required?
Reasoning: can the candidate figure out stuff (s)he doesn’t know in advance?
Passion: is the candidate passionate about the field (s)he’s applying for?
Communication: can the candidate communicate complex ideas clearly?

The interview is composed of a conversation with a sequence of increasingly hard questions, designed to cover all of these 4 aspects. There is no big problem in not being able to answer all (it is, in fact, borderline impossible to answer everything), and the experience gives us a decent idea of the candidate’s abilities in multiple dimensions.

As an interviewer, it is important to avoid falling for a couple of traps. One of them is asking only questions that a “trained monkey” would be able to answer. These are generally the more predictable questions that will show up in any “prepare for the interview” guide. Asking the candidate to explain an answer in more depth, or to go through the thought process with a whiteboard will generally solve this part quite well.

The other trap is asking too many questions about one particular area of knowledge (e.g. modeling), which might be an unlucky candidate’s only area of weakness.

If a candidate does not know the answer to one question, tell them it’s ok to not know everything, and switch contexts. Avoid making the candidate feel that the interview is going poorly, as the emotional state may induce bias in how much knowledge they can display.

5. Selection

By this point there should be a clear pool of up to 3 candidates who stand apart from all the rest.

Follow up first with your top candidates, and make sure that they are available to join if selected. You should assume all candidates are interviewing at multiple companies simultaneously, rather than believing that you are “picking” a candidate to join. This is a two way process.

Invite the top 3 candidates to meet the team, and collect feedback. Go even deeper into questions of cultural fit, personal ambition, and of career paths.

Finally (and somewhat counter-intuitively) we attempt to scare away the candidates who interest us the most, by telling them how hard startup life is, how they might make more money working for a hedge fund, and how unstructured their life will be compared with a large corporate. If they survive this “scaring away” period, there will certainly be no unwelcome surprises for them once they join. At this point, we are generally confident of who should be our first option, and make an offer.

Crucially, no candidate can ever, ever, ever go without a final answer. Candidates all get individual, personalized answers as to why they were not selected, and an honest assessment as to whether it is worth them applying again, or whether it is simply not a match. If you treat candidates as discardable resources, you do not deserve to hire top talent.

As mentioned at the start, hiring data scientists is deceptively simple. If you treat your candidates the way you would like to be treated, you will achieve the holy grail of hiring: getting the talent your company needs, while simultaneously being recommended by people you (professionally) turned down.

Applying to a company is a hard, potentially life-changing process, and should be treated as such by leaders making the critical decision.