Why all job interviews should include doing *real* work
The most important part of my company, PolicyStat, is our people. The processes, the tools, the software, the legal entity… all of that pales in importance compared to our team. Our team is what makes our customers successful, and making our customers successful is why we exist as an organization. This means that hiring/recruiting is the single most important thing we do.
Traditional Hiring is Broken
Belief in the primacy of team quality is fairly uncontroversial. You’ll find few executives who openly disagree. Despite this lip service, I think the actions of many (most?) organizations in our industry reveal a failure to invest in their hiring process. The standard interview process consists of a series of unstructured conversations that do a poor job of predicting which candidate will best improve the team they would join. These unstructured conversations do little more than allow us to confirm our biases and hire people who remind us of ourselves.
Oh, they went to Big State School and played intramural ultimate frisbee, too!? I’m awesome, and all my ultimate friends are awesome, so they’re probably awesome at [complex, uncorrelated-with-ultimate-frisbee job skill]. — Our brain’s system 1, totally blowing it
Despite the existence of the internet, superior job boards, machine learning, and a world-wide network of talent, most hiring is still done via referrals. My theory is that referrals dominate because hiring managers tacitly realize that their interviewing process is not a great predictor of actual success.
There are many problems with relying mostly on referrals + unstructured interviews for hiring, all of which are well-described in How to Make Tech Interviews a Little Less Awful:
- Referrals exacerbate diversity problems, as we all tend to mostly know only people like us. Remember, three quarters of whites don’t have any non-white friends.
- Unstructured interviews unduly reward confidence, even when over-confidence can be toxic for your organization. In the absence of data, our system 1 engages, triggering heuristics that favor incompetent, over-confident men.
- Unstructured interviews don’t predict success. When studied, they’re shown to be essentially random predictors of success. Books like Thinking, Fast and Slow have taught us that we often use heuristics without being aware of it. When a question is hard, we tend to use a heuristic that substitutes a related but easier question. Predicting a candidate’s future job success is hard, so we mostly substitute the easier question “does this person remind me of myself.”
The Fix: Work Sample Tests + Structured Interviewing
Rather than appealing to the extensive research showing the superiority of work samples and structured interviews (done by Google and others), or to PolicyStat’s experience using this technique across 300+ candidates and 6 unique positions, let’s start by appealing to common sense. Pretend we’re hiring a concert violinist. Which interview method do you guess is better at predicting violinist success?
- You ask your network to refer you to violinists. Then, you have a conversation with each candidate about their experience with a violin. To overachieve, you also ask them how many piano tuners live in Seattle, because you’re pretty sure Google asks that (they don’t).
- You use the breadth of the internet to find candidates. Then, you have each candidate play the same piece from behind a curtain (so you’re blinded to their appearance) to be judged by violin experts on the quality of their play.
Not many folks find option 1 compelling, yet that’s exactly what we’re choosing with the standard interview process. This really isn’t rocket surgery.
When candidates interview at PolicyStat, we give them a specific project that mirrors the actual work as closely as possible. We then score that work sample (aka “job simulation”, aka in-basket test) with an objective rubric we defined ahead of time. We use the result of that rubric as the heaviest single component of our final decision.
Work Samples: Not just for engineers
There’s an increasing amount of buzz from the tech community about this approach to interviewing, from companies like Spreedly and Matasano to Airbnb’s data science team. I’m proud of my tech peers, but the advantages of this approach aren’t limited to engineering/programming positions. We’ve used work samples + structured interviews with great results for these titles:
- UX Design Lead: Take a problem statement, an existing UI, and a user story. Create a design sketch to propose UX improvements.
- Data Conversion Specialist (a specialized data entry role): Follow a set of instructions to actually perform the data entry.
- Growth Hacker: Given a company scenario, build a landing page and ad campaign.
- VP of Demand Generation: Given background information and a specific company scenario, design a growth experiment and implement as much of it as possible.
- Customer Support: Use our support site to prioritize and respond to these 10 support tickets.
- Implementation Consultant: Prioritize and answer these 10 customer emails.
- Software Engineer: Submit a GitHub pull request that adds a specific feature and fixes a bug for a specific Django-based project (the technology we use).
Creating a work sample
The general pattern is to figure out the most frequent, most important things someone will do in that role and then to simulate them. There are various tactics for doing a good job of that (and I’m working on a post about what we’ve learned), but let’s walk through a quick example. Lots of knowledge work involves answering email, and there’s an easy pattern for creating email-answering work samples:
- Get folks who do that job in a room for an hour and have each of them pull a few challenging emails they’ve answered.
- Write up a bit of background for each email so that the candidate has enough information to answer the question.
- Have other folks in your organization take the work sample, and use that information to iterate. Pay close attention to the scope/timing, to instructions, and to providing sufficient background information for someone outside your org/department. If your work sample presumes knowledge of the proper structure of your internal TPS Reports, it has room for improvement.
What Makes a Good Work Sample Test
Creating a good work sample test is not easy, but it is an investment that pays off with each additional interview. Your work sample can be iterated on to make it reliably better at predicting candidate success. Here are the core tenets we’ve landed on for the creation of a predictive work sample:
1. The task is important and representative
The work sample should be as close as possible to an important task (or set of tasks) that the candidate will perform while actually doing the job. In academic circles, this property is sometimes referred to as criterion-related validity and content validity. A work sample to file an expense report is probably not a good idea because it’s not an important task.
2. The context of the task approximates the real world
The closer your work sample matches the context of the actual work, the better it will predict success in the role. For example, if the job is software engineering, asking a candidate to write code on a whiteboard is worse than asking them to write it in a browser, which is worse than letting them use their laptop with their preferred text editor and environment.
3. You uniformly enforce a time limit
Consistently enforcing a time constraint is the best way to respect the candidate’s time. A fixed limit also avoids giving an advantage to candidates who have more flexibility in their schedule or fewer other demands.
4. Scoring is done via a rubric, with a series of yes/no questions
Evaluating complex work is a hard problem, which means our brains substitute easier questions and introduce bias. A biased evaluation of actual work is better than a biased evaluation of a candidate’s resume and ability to mirror you, but we can do better. When in doubt, err towards more granular scoring and validate by having scorers use the rubric independently. Every item where two scorers differ goes on your list for rubric improvement. Our 4-hour engineering work sample has ~90 individual items on the rubric.
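To make the yes/no rubric idea concrete, here is a minimal Python sketch. It is an illustration, not the tooling we actually use: the rubric items, the scorer answers, and the function names `total_score` and `disagreements` are all made up for the example. It counts the "yes" answers for each scorer and lists the items where two independent scorers differ.

```python
from typing import Dict, List

# Hypothetical rubric items, each phrased as a yes/no question about the work.
RUBRIC: List[str] = [
    "Responded to every ticket",
    "Expressed personal empathy",
    "Prioritized the outage ticket first",
    "Gave a concrete next step",
]


def total_score(answers: Dict[str, bool]) -> int:
    """Count the rubric items that a scorer answered 'yes'."""
    return sum(1 for item in RUBRIC if answers[item])


def disagreements(a: Dict[str, bool], b: Dict[str, bool]) -> List[str]:
    """Return the rubric items where two independent scorers differ."""
    return [item for item in RUBRIC if a[item] != b[item]]


# Made-up answers from two people scoring the same submission independently.
scorer_a = {
    "Responded to every ticket": True,
    "Expressed personal empathy": True,
    "Prioritized the outage ticket first": False,
    "Gave a concrete next step": True,
}
scorer_b = {
    "Responded to every ticket": True,
    "Expressed personal empathy": False,
    "Prioritized the outage ticket first": False,
    "Gave a concrete next step": True,
}

print("Scorer A:", total_score(scorer_a))
print("Scorer B:", total_score(scorer_b))
print("Items to clarify in the rubric:", disagreements(scorer_a, scorer_b))
```

In this sketch, the one item the scorers split on ("Expressed personal empathy") is the candidate for rewording or for splitting into more granular questions.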
5. The candidate has an opportunity to prep (no Pop Quizzes)
Because this work sample is timed, candidates should get a prep guide in advance of their scheduled work sample with a list of resources they can use to prepare. It should include tutorials, blog posts, sample projects, and whatever else they would have googled and found if given an unlimited amount of time to learn and complete the project. You want to hire folks who can learn. A work sample that mostly tests how recently someone has used a specific learnable tool/technique is going to cause a lot of preventable false negatives.
6. The candidate receives detailed feedback afterwards
Providing detailed, meaningful feedback is the least we can do after candidates spend 4+ hours interviewing with us. The more detailed, the better the candidate experience.
OK: “Show more empathy”
Better: “You lost points on support tickets 3, 5, and 7 because we’re looking for a non-corporate expression of personal empathy like ‘I’m sorry this bug has wasted so much of your time’”
Bonus Points: This feedback turns the work sample calibration into a useful training exercise for your current team.
Work Sample Limitations and Tradeoffs
Work samples are not a panacea.
- They require up-front work. It’s much easier to just walk into an interview or phone screen and start talking. For roles you’ll hire repeatedly, the up-front work is amortized over more hires, but it can be hard to justify building a work sample for a one-off role.
- If you don’t understand the role enough to define or judge specific work, then you can’t create a work sample. You were always going to struggle to screen for a role without an understanding of the work, but a default towards work sample creation makes this painfully obvious. You should lean on people in your network to learn more.
- Work samples aren’t great at testing for things like “is this person an asshole”. I prefer a topgrading-style structured interview for that purpose.
- Work samples test skill, but not will. Success in a role requires both the ability and willingness/motivation to achieve. The structured interview is where you test the other half of the “skill/will bullseye”. Tip: During your phone screen, before you bias the candidate, ask them about their career goals and then actively listen. If their goals don’t line up with your goals for the role, then don’t waste their time.
- Some candidates won’t take a work sample. Whether through pride, or simply being too busy to take a 4-hour work sample, some folks will drop out of your funnel. Initially, I was very concerned about this, but I have found it to have only a very small effect (~5% of candidates).
Always Be Iterating
Every tiny piece of your hiring process should be subject to many small improvements, from the work sample, to the prep guide, to the job post, to your email communication. You’re not going to eliminate all context bias or rubric bias on the first crack. You’re not going to strike the perfect tone in your email communication. You’re probably going to fall prey to the curse of knowledge and forget to provide important information. That’s OK, because those things were true about your unstructured process, too. The difference is that improvements made to your structured, work sample-based process are durable.
My general rule is that if I need to answer a candidate’s one-off question, then that’s a “bug” in my process. I find a place in the prep guide, job post, or opening explainer email to insert that information. The next candidate then gets a slightly better experience.
If anyone else has experience using work samples for their interviews, I’d love to talk shop. Please get in touch.
Edit: Work Sample Build vs Buy
Two years after publishing this article, I founded a startup called Woven that uses work simulations for developer hiring. If you’d like the magic of work simulations for hiring, but aren’t excited about your engineers spending hours creating and evaluating them, come say hi!