The concept you have to master if you apply for a Data Scientist job

Or you should verify if you’re recruiting for one

Piotr Gabrys
LogicAI
2 min readAug 28, 2018


Recently we have been conducting technical interviews for a Junior Data Scientist opening at LogicAI. It’s not an easy task to check candidates’ Data Science skills in less than half an hour. After a few meetings, we came up with a question that allowed us to evaluate every applicant.

The puzzle is simple:

“How would you assess the quality of your model?”.

It may seem obvious to an experienced Data Scientist. Our candidates struggled with it, though. We asked it in the context of a multiclass classification problem.

How would you answer this question? We wanted to hear something like this: First, define the scoring metric. Second, divide your data into training and testing subsets. Then search for the model’s hyperparameters with stratified cross-validation. Finally, report the quality metric defined in the first step on the testing subset. That’s it!
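In scikit-learn terms, the steps above might look like the sketch below. The dataset, model, metric, and hyperparameter grid are all placeholders of my choosing, not part of the original answer:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Step 1: choose the scoring metric up front (here: macro F1 for multiclass).
# Step 2: split into training and testing subsets, stratified by class.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 3: tune hyperparameters with stratified cross-validation,
# using the training subset only.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    scoring="f1_macro",
    cv=StratifiedKFold(n_splits=5),
)
search.fit(X_train, y_train)

# Step 4: report the metric from step 1 on the held-out test subset.
test_score = f1_score(y_test, search.predict(X_test), average="macro")
print(f"macro F1 on the test set: {test_score:.3f}")
```

The key detail is that the test subset is never touched until the very last line.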

What’s the problem then? We thought about it and came to a conclusion: the validation requires a deep understanding of data modeling. You have to understand what the bias-variance trade-off is. How to fight overfitting? What is an information leak in the context of hyperparameter optimization? Which metric to use to assess the quality of your model?
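To make one of those questions concrete, here is an illustrative example of an information leak (my own, not from the interviews): fitting a preprocessing step such as a scaler on the full dataset before cross-validation lets statistics from the validation folds leak into training. Wrapping the scaler and the model in a scikit-learn Pipeline keeps each fold’s preprocessing fit on its training portion only:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# Leaky version (avoid): the scaler sees every row, including the rows
# that later act as validation folds.
# X_scaled = StandardScaler().fit_transform(X)
# scores = cross_val_score(SVC(), X_scaled, y, cv=5)

# Leak-free version: the scaler is refit inside each training fold.
pipeline = make_pipeline(StandardScaler(), SVC())
scores = cross_val_score(
    pipeline, X, y, scoring="f1_macro", cv=StratifiedKFold(n_splits=5)
)
print(f"mean macro F1 across folds: {scores.mean():.3f}")
```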

To evaluate candidates’ technical skills, we focused on the model validation part. It worked great! We could quickly identify each applicant’s strengths and weaknesses. With proper assessment, we were able to make the best possible decisions and hire great people.

What does this mean for you? If you’re an aspiring Data Scientist, make sure you profoundly understand model validation. Not only will it help you land your dream job, it will also make you a better Data Scientist. If you’re recruiting, try asking this question during your technical interviews. At LogicAI we’re still hiring for entry-level Data Science positions, and I’m sure our candidates will hear the question:

“How do you know your model is good?”

If you want to learn more about model validation, I highly recommend studying this blog series and taking part in Kaggle competitions. The latter quickly puts your model validation to the test and offers resources to improve it.

What do you think about our method? Let me know in the comment section or clap if you liked it!
