Case Study — staatslabor x Applied

Published in Finding Needles in Haystacks, Aug 9, 2017

At Applied, we love data, so we’re excited to share some results collected by an Applied beta customer.

staatslabor, a government innovation lab based in Switzerland, used Applied to hire for an Intern role. Two of their founders, Alenka Bonnard and Danny Bürkli, chose Applied because they wanted to move to a more open and fair hiring process, to ditch outdated, time-consuming methods, and to embrace their own organisational values of putting a diverse and innovative team to work on social and civil issues.

How staatslabor used Applied

To help them find the best person for the job, staatslabor used 5 work sample questions to replace their initial CV sift.

1. ‘What motivated you to apply for this position?’

2. ‘What achievement are you most proud of?’

3. ‘Living in Switzerland, what are your expectations of our public administration?’

4. ‘Where do you see being the biggest challenges for the public sector in the next 5 years?’

5. ‘Describe why your skills and experience make you a good fit for this position’

Every candidate had 250 words in which to respond to each question. Those responses were then reviewed by the team.

Of the 5 questions, everybody had their favourite, but which performed best? And how do different people interpret how a candidate responds?

Here we look at two data points:

1. Level of reviewer disagreement
2. Distribution of scores across reviewers

Typically, the lower the level of disagreement, the more objective the question. By that, we mean that the scores candidates receive depend less on who reviewed them.

Higher levels of reviewer disagreement suggest that the question is more subjective, and different reviewers will have a different interpretation of what good looks like.

Generally, you want to include questions that have some reviewer disagreement, with a mix of levels across the set of questions.
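One way to put a number on reviewer disagreement is the average gap between two reviewers' star ratings for the same answer. The sketch below assumes that definition (the article does not say how Applied computes it), and the ratings are invented for illustration; they are not staatslabor's data.

```python
from statistics import mean

# Hypothetical star ratings (1-5), NOT staatslabor's data.
# scores[question][candidate] = (reviewer_a_score, reviewer_b_score)
scores = {
    "Q1": {"c1": (3, 3), "c2": (4, 3), "c3": (2, 2)},
    "Q2": {"c1": (1, 4), "c2": (5, 2), "c3": (3, 3)},
}


def disagreement(pairs):
    """Mean absolute gap between the two reviewers' scores (assumed metric)."""
    return mean(abs(a - b) for a, b in pairs)


# Disagreement per question: a higher value suggests a more subjective question.
by_question = {q: disagreement(c.values()) for q, c in scores.items()}
most_subjective = max(by_question, key=by_question.get)

print(by_question)
print(most_subjective)
```

In this made-up example the second question shows the larger average gap, so its scores depend more on who happened to review the answer.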

The aggregate distribution of scores across reviewers is interesting too. We use this data point because hiring teams should know whether a question is particularly hard or easy. If the share of high scores (4 or 5 stars) is too large, the likelihood is that all candidates are doing well at this question, and it might not be helping you to separate out the top performers.
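As a rough illustration of this check, the share of 4 and 5 star ratings can be computed per question. The function name, the threshold default, and the ratings below are all assumptions made for this sketch, not values from the staatslabor data.

```python
def high_score_share(ratings, threshold=4):
    """Fraction of ratings at or above `threshold` stars."""
    return sum(r >= threshold for r in ratings) / len(ratings)


# Hypothetical ratings for two questions, NOT staatslabor's data.
easy_question = [5, 4, 4, 2, 5, 4, 3, 4]
hard_question = [2, 3, 1, 4, 2, 3, 3, 2]

print(high_score_share(easy_question))  # 0.75
print(high_score_share(hard_question))  # 0.125
```

A question where most answers land at 4 or 5 stars, like the first one here, may not be doing much to separate candidates.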

We found that on average, reviewers scored skills and experience higher than the others, but there was also a bit more diversity in scores there, relative to the other questions. It seems reviewers were least impressed with candidates’ responses to what they were most proud of!

Of course, the distribution of scores will be different between reviewers. Here we see which reviewers were ‘more’ or ‘less’ generous.

Overall we saw that the two reviewers broadly scored on a bell curve, and both were hard to impress: just 5% of the time did a reviewer give 5 stars! As we can see in the distributions below, Danny was slightly tougher than Alenka — his average score was 2.6, whereas Alenka’s was 3.1.

We found that about a third of the time Alenka and Danny came to the same conclusion about a response. But that means that just over 60% of the time they disagreed! In fact, about 15% of the time they disagreed by either 2 or 3 stars (out of 5). Even for the 1 in 2 times that the reviewers disagreed by 1, over several questions these slight differences can add up to mean the difference between being called through to the next round or not. That’s why Applied averages all the scores across reviewers to mitigate some of these idiosyncrasies.
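The breakdown above (same score, one star apart, two or three stars apart) and the averaging step can be sketched in a few lines. The rating pairs are invented for illustration and the variable names are hypothetical; this is not Applied's implementation.

```python
from collections import Counter

# Hypothetical (reviewer_a, reviewer_b) star ratings for six responses,
# NOT staatslabor's data.
pairs = [(3, 3), (2, 3), (4, 2), (3, 4), (1, 1), (2, 5)]

# How often the two reviewers differ by 0, 1, 2 or 3 stars.
gap_counts = Counter(abs(a - b) for a, b in pairs)
gap_share = {gap: n / len(pairs) for gap, n in sorted(gap_counts.items())}

# Averaging across reviewers softens individual leniency or toughness.
averaged = [(a + b) / 2 for a, b in pairs]

print(gap_share)
print(averaged)  # [3.0, 2.5, 3.0, 3.5, 1.0, 3.5]
```

Note that a response scored (4, 2) and one scored (3, 3) end up with the same average, which is the point: no single reviewer's view decides the outcome.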

The big question is: which question helped us find our best candidate, and did any questions pose any threat to diversity?

First up, which questions ended up being better predictors of overall performance? We found that the most highly correlated question was Q3, where candidates were asked to talk about their expectations as a citizen of Switzerland, followed by their motivation and the more work-sample based question about future challenges for the public service. Interestingly, the two more personal questions (what are you most proud of, and skills and experience) were least correlated with candidates’ final scores.
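The article does not say which correlation measure was used. A minimal sketch, assuming a Pearson correlation between each question's averaged score and the candidate's total score, could look like this. The candidate scores are invented and are not staatslabor's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical averaged scores per candidate and question.
candidates = {
    "c1": {"Q1": 3.0, "Q2": 2.0, "Q3": 4.0},
    "c2": {"Q1": 2.5, "Q2": 3.0, "Q3": 2.0},
    "c3": {"Q1": 4.0, "Q2": 2.5, "Q3": 4.5},
    "c4": {"Q1": 3.0, "Q2": 3.5, "Q3": 3.0},
}

# Total score per candidate (assumed here to be the plain sum of questions).
totals = [sum(c.values()) for c in candidates.values()]

# Correlation of each question's scores with the total.
corr = {
    q: pearson([c[q] for c in candidates.values()], totals)
    for q in ["Q1", "Q2", "Q3"]
}
print(corr)
```

One caveat with this approach: each question is part of the total it is compared against, so every correlation is nudged upward, and with only a handful of candidates the ranking of questions can easily change.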

When we look at it from a diversity perspective, it looks like overall, female candidates scored higher than their male counterparts, and to a lesser extent, applicants from more educated households did better. Overall there weren’t significant differences.

OK, the data looks good, but how did it feel to actually use the system? Here are the experiences of Alenka (the recruiter) and Lena (the applicant):

Alenka:

“Not being a professional in the field of HR, I find classical recruiting practices particularly fastidious and outdated: when I receive 50 applications — all more or less copied from models on recruitment sites — or work certificates dating 10 years back, I learn practically nothing about the person who — perhaps — is interested in the described job. Everyone is losing precious time and might lose even more so during the interviews.

Applied really changed everything. It’s impossible to review it in detail here, but from now on I really don’t see myself using any other recruitment tool.

First of all, the recruiter who, like me, shows no great interest for letters of motivation, designs his own questionnaire, targeting the required skills (writing, motivation, knowledge, opinions, etc.). Then, once all the applications have been collected, they are assessed by the team. It is noteworthy that at this stage, Applied does not yet show us the candidates’ profile (age, gender, origin…), nor the attached CV. Once the answers to the questions have been assessed and the scores are visible, we have access to the CVs and profiles. I personally chose to play along and invited candidates for an interview based only on their score, without having seen their CV beforehand. I did have a look at the other CVs, some of which were very good in my opinion. But given the answers to the questions asked, both the candidate and I would have lost time if I had invited them for a discussion.

I interviewed a series of candidates with very interesting and diverse life histories. Of course, informing capable and motivated candidates that they were not chosen for the job is never easy, and in that respect, even Applied cannot help. Lena, our present intern, had one of the two best scores, and I really feel that she fits in very well with staatslabor”.

Lena:

When I first encountered Applied, I hesitated. It asked for a different kind of effort to what I was used to when applying for a new job in Switzerland. But it surprised me and turned out to be an enriching experience. When I applied for the open position at staatslabor I was actually glad they couldn’t see my age and my gender. This is because I don’t have the typical profile for an internship, being older than the typical applicant, with an atypical personal career choice.

In my case I had to answer five questions. When answering those questions I had to think about the position in depth and really explain why I wanted to work for the company. I got a better understanding of what my future responsibilities included, as well as an impression of what culture I could expect from my new employer. Applying for all jobs in this way would avoid submitting unnecessary applications, as I really had to put a different kind of effort into my application. Because staatslabor worked with this unbiased tool it added something special to the reward of being hired. Not only did I get a position I really wanted, I also felt that I was sincerely chosen for my abilities and skills.

If you’d like to try Applied yourself, sign up today at www.beapplied.com

Written by Theo Fellgett