Exceptional Resources for Data Science Interview Preparation. Part 1: Live Coding

Artem Ryblov
Feb 22, 2024


Hi! My name is Artem. I work as a Data Scientist at MegaFon (a platform for secure data monetization, OneFactor). We build credit scoring, lead generation and anti-fraud models using telecom data, and we also do geoanalytics.

In January 2022, I created a repository to store links to resources on various topics: interview preparation, computer science, data science, git, mathematics and many others. Since then, the repository has grown so much, both in the number of resources and in the range of topics, that I concluded the information needed to be structured.

So I moved the interview preparation resources into a separate repository and, to make it clear how to use them, decided to write a series of articles. You are reading the first one right now.

A typical interview process through the eyes of DALL-E 3

An interview for a Data Scientist position consists of the following sections (the order may vary):

In this article, we will look at what a live coding interview is and how to prepare for it.

This blog post is aimed primarily at Data Scientists and ML engineers, though some sections, such as Algorithms and Data Structures, are relevant to any IT specialist who has to go through a live coding section.

Remarks

Most of the resources in this article are free, but there are a few paid ones. I recommend buying them only if you are sure that you do not want to, or cannot, spend your own time searching for information.

Table of contents

  1. Preparing for an Algorithmic Interview
  2. Resources
    - Algorithms and Data Structures
    - Programming in Python
    - Solving a Practical Data Science Problem
    - Hybrid
  3. Learning How to Learn
  4. Let’s sum it up
  5. What’s next?

Preparing for an Algorithmic Interview

As part of the live coding section, your knowledge of the following topics can be tested:

  • Algorithms and Data Structures
  • Programming
  • SQL
  • Solutions to Practical Data Science Problems

The algorithmic interview is one of the most common formats of the live coding stage for Big Tech companies, so in this section we will dwell on it in more detail. You can read about other formats in the Resources section.

Introduction

During one algorithmic section, you will be expected to solve 1–2 algorithmic problems (examples of such problems can be found on LeetCode) in 60–90 minutes. Some companies may have several such sections.

If you have never gone through this kind of section or solved algorithmic problems before, I recommend starting with the book Grokking Algorithms: An Illustrated Guide for Programmers and Other Curious People by Aditya Y. Bhargava. This book is a very simple and friendly introduction to algorithms. I wouldn’t rely on it to prepare for an interview, but it’s fine as an introduction to the topic. While reading it, I noticed many errors, so if something looks wrong, it’s worth checking it against the Errata section on the author’s website.

Preparation Plan

Once you understand what algorithms are and how such sections are conducted, you can begin preparing. As I see it, there are three main approaches:

  • Fundamental
    Taking courses and reading books on algorithms and data structures.
  • Practical
    Solving problems on various platforms: LeetCode, Codewars, HackerRank, etc.
  • Hybrid
    Learning theory and immediately putting it into practice.

I prefer the last one, so that’s what we’ll focus on.

Below, we will consider three action plans (roadmaps) for studying algorithms (preparing for an algorithmic interview).

NeetCode’s Preparation Plan

Let’s start with the following ⭐ roadmap:

NeetCode’s roadmap

In the image above, I have highlighted the blocks and prioritized them — this is the order in which the author recommends getting acquainted with these topics.

When we click on one of the blocks in the roadmap, we get a set of topics to study (for the Arrays & Hashing block, these are Dynamic Arrays, Hash Usage, Hash Implementation and Prefix Sums). Below them is a list of practice problems grouped by difficulty, each with a video explanation and a solution in Python. Clicking a problem takes us to the LeetCode website.

Contents of the Arrays & Hashing block
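To give a feel for what the Hash Usage and Prefix Sums topics look like in practice, here is a minimal sketch of a typical problem of this kind: counting the contiguous subarrays that sum to a target. The function name and the choice of problem are my own illustration, not something taken from the roadmap.

```python
def count_subarrays_with_sum(nums, k):
    """Count contiguous subarrays whose elements sum to k in O(n) time."""
    seen = {0: 1}  # prefix sum -> how many times it has occurred (0 = empty prefix)
    prefix = 0
    count = 0
    for x in nums:
        prefix += x
        # A subarray ending here sums to k if an earlier prefix equals prefix - k.
        count += seen.get(prefix - k, 0)
        seen[prefix] = seen.get(prefix, 0) + 1
    return count


print(count_subarrays_with_sum([1, 2, 3], 3))  # 2: the subarrays [1, 2] and [3]
```

The hash map replaces the nested loop of the brute-force solution, which is exactly the kind of trade-off interviewers expect you to explain out loud.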

Training plan from Vladimir Balun

A few months ago, the YouTube algorithm suggested a video [rus] that introduced me to another preparation roadmap. I collected all the necessary information here:

  • To-Do List: Converted the Miro board from the video into a convenient list with hyperlinks.
  • Course program: I have attached the author’s course program, which we will use for preparation.

Training plan from EDA Academy

Another roadmap for learning algorithms is from EDAcademy [rus].

What to do with all this?

I recommend making a file in which you can track your progress and save the necessary information.

I came up with the following structure:

  • The entire file is divided into levels (similar to the blocks from the preparation plan from NeetCode)
  • There are topics within the level
  • Each topic is divided into two sections
    - Theory
    We save the theory so that we can repeat it if necessary.
    - Practice
    We save links to problems and solutions. For each task, we can briefly describe the approach.

As an example, I am attaching a template with the first two blocks, which can be duplicated and used.
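If you prefer to generate such a file rather than copy it, the structure above can be sketched in a few lines. This is only an illustration: the `roadmap` dictionary, the second block name and the placeholder text are assumptions of mine, not the contents of the attached template.

```python
# Hypothetical outline: levels map to topics; adjust the names to your own plan.
roadmap = {
    "Arrays & Hashing": ["Dynamic Arrays", "Hash Usage", "Hash Implementation", "Prefix Sums"],
    "Two Pointers": ["Two Pointers"],  # placeholder second block
}


def render_template(roadmap):
    """Render levels -> topics -> Theory/Practice as an indented text outline."""
    lines = []
    for level, topics in roadmap.items():
        lines.append(level)
        for topic in topics:
            lines.append(f"  {topic}")
            lines.append("    Theory")
            lines.append("      (notes, links to articles and videos)")
            lines.append("    Practice")
            lines.append("      (problem link | solution link | approach in one line)")
    return "\n".join(lines)


template = render_template(roadmap)
print(template)
```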

Preparation Resources

In this section, I will share resources for preparing for different types of live coding interviews.

Algorithms and data structures (eh, classics)

In the previous section, we looked at approaches to studying algorithms and data structures, considered three preparation plans, and proposed a template for creating your own roadmap.

The only thing missing is the resources to fill out a personalized roadmap, which we will consider later in this section.

Online Platforms

Books

Preparation for an interview:

Learning algorithms:

  • Introduction to Algorithms by Thomas H. Cormen et al.
    A classic work on algorithms.
  • Algorithms by S. Dasgupta, C. H. Papadimitriou, and U. V. Vazirani
  • Competitive Programmer’s Handbook by Antti Laaksonen
    This handbook is for people who are already highly proficient with most LeetCode algorithms.
  • Competitive Programming by Steven Halim
    For the most experienced algorithm enthusiasts, this book will cover every niche data structure and algorithm that could possibly be asked in any coding interview. This level of preparation is not generally needed for FAANG type companies, but can show up if you’re considering hedge fund type companies.
  • Others:
    - Algorithms and Data Structures by Niklaus Wirth
    - Analysis of Algorithms by Jeffrey McConnell
    - Algorithms in C++ by Robert Sedgewick / Algorithms in Java by Robert Sedgewick and Kevin Wayne
    - Data Structures and Algorithms by Alfred Aho, John Hopcroft, Jeffrey Ullman

Courses

  • Introduction To Algorithms by MIT
    This is an introductory course covering elementary data structures (dynamic arrays, heaps, balanced binary search trees, and hash tables) and algorithmic approaches to solving classical problems (sorting, graph searching, and dynamic programming). It introduces mathematical modeling of computational problems, as well as the common algorithms, algorithmic paradigms, and data structures used to solve them. The course emphasizes the relationship between algorithms and programming and introduces basic performance measures and analysis techniques for these problems.
  • Algorithms + Data Structures from CS50’s Introduction to Computer Science
  • Algorithms and Data Structures for Beginners (paid) + Advanced Algorithms (paid)
    Courses from the creator of the first preparation plan (NeetCode). He explains algorithms in simple language and reinforces them with practice.

Other

Tips:

Data Structures:

Repositories:

  • LeetCode company-wise questions
    A repository with the lists of questions by company that are otherwise available only in the premium version of LeetCode.
  • The Algorithms
    Repositories for studying data structures and algorithms, with implementations in many programming languages.

Other:

  • Algorithmic concepts
    A cheat sheet with theory on algorithms and data structures.
  • Algorithmic Thinking
    The articles on this site discuss different approaches to studying algorithms. All of them are illustrated using problems from LeetCode, and not only the solutions are given, but also an explanation of WHY the solution works and HOW you can understand it too.

Programming in Python (where would we be without it)

For a Data Scientist, Python (sometimes R) is the main working tool for data analysis, so it is important to understand it well in order to write “clean” and efficient code.

To test this knowledge, you may be asked basic theoretical questions during the interview.

For example:

  • What data types are there in Python?
  • What data structures are there in Python?
  • Differences between data types and data structures
  • Asymptotics of basic operations in Python
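As a quick illustration of the last question, here is a minimal sketch contrasting a few common operations. The complexities in the comments are the standard ones for CPython's built-in `list`, `set` and `collections.deque`; the variable names are my own.

```python
from collections import deque

items = list(range(100_000))
as_set = set(items)

# Membership test: O(n) scan for a list, O(1) on average for a set.
in_list = 99_999 in items   # walks the list element by element
in_set = 99_999 in as_set   # a single hash lookup

# Removing from the front: O(n) for a list (elements shift), O(1) for a deque.
queue_list = [1, 2, 3]
queue_deque = deque([1, 2, 3])
first_from_list = queue_list.pop(0)
first_from_deque = queue_deque.popleft()

# Appending to the end of a list is amortized O(1).
queue_list.append(4)

print(in_list, in_set, first_from_list, first_from_deque, queue_list)
```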

You are also often asked to predict what a code cell or function will output.

For example:

# What will the code output?

D = {}

A = D
B = D.copy()

B['b'] = 2
A['b'] = 3

print(D, B, A)

When answering this question, the interviewee should demonstrate that they know the ways to copy objects in Python and how they differ (plain assignment vs. shallow copy vs. deep copy), as well as the difference between mutable and immutable data types.
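The snippet above prints `{'b': 3} {'b': 2} {'b': 3}`: `A` is just another name for `D`, while `B` is an independent copy. The difference between a shallow and a deep copy only shows up with nested objects, so here is a small sketch of my own (the dictionary contents are illustrative) that makes it visible:

```python
import copy

original = {"scores": [1, 2]}

alias = original                # same object, no copy at all
shallow = original.copy()       # new dict, but the inner list is shared
deep = copy.deepcopy(original)  # new dict and a new inner list

original["scores"].append(3)    # mutate the nested list in place
original["name"] = "x"          # add a top-level key

print(alias)    # {'scores': [1, 2, 3], 'name': 'x'}
print(shallow)  # {'scores': [1, 2, 3]}
print(deep)     # {'scores': [1, 2]}
```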

To prepare for this section, I recommend that you read the resources below.

Clean Code

Theory

Questions

Other

SQL

Proficiency in SQL (Pandas/PySpark) is also often tested during live coding interviews because it is a necessary skill for a data scientist.

As part of this section of the interview, you may be asked theoretical questions, for example:

  • What does UNION do?
  • What is the difference between INNER JOIN and LEFT/RIGHT JOIN?
  • What is a NoSQL database? How do NoSQL databases fundamentally differ from SQL ones?
  • What is a subquery, and what is it for?
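A convenient way to rehearse answers to such questions is to run tiny experiments against an in-memory SQLite database using Python's built-in `sqlite3` module. The sketch below is only an illustration: the `users` and `orders` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER, name TEXT);
    CREATE TABLE orders (user_id INTEGER, amount INTEGER);
    INSERT INTO users  VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 100), (1, 50);
""")

# INNER JOIN keeps only users that have at least one matching order.
inner = conn.execute("""
    SELECT u.name, o.amount FROM users u
    INNER JOIN orders o ON o.user_id = u.id
    ORDER BY u.name, o.amount
""").fetchall()

# LEFT JOIN keeps every user; missing orders show up as NULL (None in Python).
left = conn.execute("""
    SELECT u.name, o.amount FROM users u
    LEFT JOIN orders o ON o.user_id = u.id
    ORDER BY u.name, o.amount
""").fetchall()

# UNION removes duplicate rows, UNION ALL keeps them.
union = conn.execute(
    "SELECT id FROM users UNION SELECT user_id FROM orders ORDER BY 1"
).fetchall()
union_all = conn.execute(
    "SELECT id FROM users UNION ALL SELECT user_id FROM orders ORDER BY 1"
).fetchall()

# A subquery: users that appear in the orders table.
with_orders = conn.execute(
    "SELECT name FROM users WHERE id IN (SELECT user_id FROM orders)"
).fetchall()

print(inner, left, union, union_all, with_orders, sep="\n")
```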

You may also be asked to solve one or two tasks that involve writing a SQL query (or performing similar transformations in Pandas/PySpark/name_your_framework), for example:

Given the table transactions:

| id | date       | income |
|----|------------|--------|
| 1  | 2021-04-01 | 22000  |
| 2  | 2021-04-02 | 11100  |
| 3  | 2021-04-11 | 64000  |
| 4  | 2021-05-04 | 23000  |
| 5  | 2021-06-17 | 20000  |
| 6  | 2021-06-18 | 7900   |
| 7  | 2021-06-19 | 32000  |
| 8  | 2021-07-12 | 17000  |
| 9  | 2021-07-23 | 14600  |
| 10 | 2021-01-12 | 26300  |
| 11 | 2021-08-11 | 10000  |

For the current month, calculate the moving average of income for the previous 3 months.

When answering this question, the interviewee should demonstrate knowledge of grouping data with the GROUP BY clause, the MONTH/SUM/AVG functions, window functions, and the ROWS BETWEEN frame specification.
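One possible solution is sketched below. It makes several assumptions that you should state out loud in a real interview. It uses SQLite through Python's built-in `sqlite3` module (window functions need SQLite 3.25 or newer). It uses `strftime('%Y-%m', ...)` because SQLite has no `MONTH` function. It reads "the previous 3 months" as the three preceding rows of monthly totals, excluding the current month. Months with no transactions (February and March here) are simply absent instead of counting as zero.

```python
import sqlite3

rows = [
    (1, "2021-04-01", 22000), (2, "2021-04-02", 11100), (3, "2021-04-11", 64000),
    (4, "2021-05-04", 23000), (5, "2021-06-17", 20000), (6, "2021-06-18", 7900),
    (7, "2021-06-19", 32000), (8, "2021-07-12", 17000), (9, "2021-07-23", 14600),
    (10, "2021-01-12", 26300), (11, "2021-08-11", 10000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER, date TEXT, income INTEGER)")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)

query = """
WITH monthly AS (                      -- step 1: total income per month
    SELECT strftime('%Y-%m', date) AS month,
           SUM(income)             AS total_income
    FROM transactions
    GROUP BY strftime('%Y-%m', date)
)
SELECT month,
       total_income,
       AVG(total_income) OVER (        -- step 2: average of the 3 previous rows
           ORDER BY month
           ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING
       ) AS prev_3_months_avg
FROM monthly
ORDER BY month
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The first month has no preceding rows, so its average is NULL. If the interviewer wants calendar months rather than rows, the usual fix is to join against a generated calendar of months first so that empty months appear with zero income.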

To prepare for this type of interview, I recommend reading the resources below.

Courses

Practice

Solving a practical Data Science problem

Sometimes, within this section, instead of solving an algorithmic problem or writing a SQL query, a more practical task is proposed, namely solving an applied data analysis problem:

  • End-to-end:
    Everything from EDA (Exploratory Data Analysis) to model building.
    Usually, such a problem is given as a home assignment, because it takes anywhere from several hours to several days to solve.
  • Shortened format:
    This format does not take much time, so it is used during an interview.
    - You are given a piece of code (say, code for building a model) and asked to analyze it, find errors and suggest improvements.
    - You are asked to solve one part of a complete data analysis problem, for example, to carry out EDA or feature engineering.

The only way to prepare for this type of live coding interview is to practice solving Data Science problems at work and/or on your own.

The resources I have listed below will help you with this.

Code Analysis

Practice

For practice, I recommend doing pet projects (you can start with this article: Data Science Pet Projects. FAQ).

Hybrid

During an interview, for example, you may be asked to solve:

  • Several algorithmic problems
  • One algorithmic problem and one SQL problem
  • A take-home assignment related to ML and SQL
  • etc.

Therefore, I recommend that you take into consideration all the points mentioned above.

Learning How to Learn

To remember all this information for a long time and not go crazy, you can use the spaced repetition technique. A convenient program for putting it into practice is ⭐ Anki.

You create a deck, add the information you want to remember, for example, algorithmic problems, and repeat them when the program prompts.
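To make the idea concrete, here is a deliberately simplified sketch of spaced repetition. It is not Anki's actual scheduling algorithm: the `Card` class, the `review` function and the "double the interval on success, reset on failure" rule are assumptions chosen only to show the principle.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Card:
    prompt: str
    interval_days: int = 1
    due: date = date(2024, 1, 1)


def review(card, remembered, today):
    """Reschedule a card: double the interval on success, reset it on failure."""
    card.interval_days = card.interval_days * 2 if remembered else 1
    card.due = today + timedelta(days=card.interval_days)
    return card


card = Card("Binary search: what is the time complexity?")
history = []

review(card, remembered=True, today=date(2024, 1, 1))  # interval grows to 2 days
history.append((card.interval_days, card.due))
review(card, remembered=True, today=card.due)          # interval grows to 4 days
history.append((card.interval_days, card.due))
review(card, remembered=False, today=card.due)         # forgotten: back to 1 day
history.append((card.interval_days, card.due))

print(history)
```

Cards you remember drift further and further apart, while cards you forget come back the next day, which is why the technique scales to hundreds of problems.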

I also recommend books:

These books will be useful to anyone who wants to effectively absorb new knowledge and use it in practice.

Let’s sum it up

The main idea of this article is to use available resources on the Internet (plans and programs, courses, videos, articles, blog posts, etc.) in order to:

  1. Prepare your own plan for studying the topic
    I’ve marked my favorite resources with ⭐
  2. Fill this plan with theory and practice
  3. Achieve your goal (studying algorithms / job search), while saving all resources for the future as a reference (you will still have to repeat it when applying for the next job)

What’s next?

In the next article, we will analyze resources for preparing for the section on classical machine learning.

You can find current resources for this series of articles in the Interview preparation repository, which will be maintained and updated. You can also subscribe to my Telegram channel, Data Science Weekly, in which I share interesting and useful resources every week.

If you know of any cool resources that I didn’t include in this list, please write about them in the comments.
