Cracking the LinkedIn Data Scientist Interview

Published in DataInterview, Jun 8, 2021

Do you have an upcoming interview for a data scientist role at LinkedIn? This guide provides comprehensive details about the interview process, along with preparation tips to help you ace LinkedIn’s data scientist interview.

Hi, I’m Dan, a data scientist previously at PayPal and now at Google. As an interview coach at datainterview.com, I want to help candidates like you ace data science interviews and land a dream role at a top company. Make sure to check out my prep site: datainterview.com

Before we start, note the following caveats about the guide:

The interview process is based on the senior data scientist interview at LinkedIn. Interviews will vary across different roles and levels.
The guide reflects data gathered as of June 2021. LinkedIn’s interview process is always evolving, so future interviews may differ from this guide.

About LinkedIn

LinkedIn is a social network with more than 500 million active users and $8 billion in revenue in 2020. The platform allows professionals to connect, exchange ideas, and find career opportunities. In addition, it provides various B2B and B2C services for individual talent, recruiting agencies, and companies.

Role at LinkedIn

Data scientists at LinkedIn leverage data to deliver key business insights and build models that automate decisions. Here are example expectations and qualifications for the product data scientist role:

Expectations

Present business insights to stakeholders and guide decisions.
Work with a team of data scientists and cross-functional teams to identify business opportunities, improve product performance and devise a go-to-market strategy.
Analyze large structured and unstructured data and extract key insights or build models to provide value to users.
Design and develop business metrics and dashboards for stakeholders.

Qualifications

Bachelor’s, Master’s, or Ph.D. degree in a quantitative discipline (statistics, computer science, applied mathematics, economics) or equivalent work experience.
Proficiency in SQL and programming languages such as Python and R.
Experience with large-scale datasets.
Data storytelling skills.
Experience running AB tests.
Experience in statistical modeling.

Stage 1 — Recruiter Screening

The first call with LinkedIn is a non-technical meeting with a recruiter. The call lasts about 20 to 30 minutes and lets the recruiter screen whether you are a good fit for the role you applied for.

Before the call

The recruiter sees your job application, which includes a resume and an optional cover letter, in the applicant tracking system (ATS). Your application is algorithmically ranked based on how well your candidacy matches the role described in the job post.

During the call

During the call, the recruiter will typically structure the meeting as follows:

Details about the open role and LinkedIn’s mission — The recruiter will describe the role’s expectations and the team in more detail.
Candidate background — This is your chance to sell yourself verbally. The recruiter will ask, “Tell me about yourself.” You can provide a high-level description of your academic and career background. A common follow-up question is: “Why do you want to work for LinkedIn?”
Technical screening with basic SQL questions — As a first line of defense against candidates who lack technical competence, recruiters will ask basic SQL questions such as: Can you explain the difference between INNER, LEFT, and FULL OUTER JOINs? What’s the difference between UNION and UNION ALL?
Logistics — The recruiter usually asks the following: Where are you located? Are you a U.S. citizen? If not, do you need employer sponsorship for your visa? What is your availability for technical interviews?
Follow-Ups — The recruiter will detail the next steps of the interview process. This is your chance to ask as many questions as you can to map out the technical interviews end-to-end. The more information you have, the better you can prepare.
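To brush up on the two SQL screening questions, here is a minimal sketch using Python’s built-in sqlite3 module. The `users` and `applications` tables and their rows are hypothetical. It shows that an INNER JOIN keeps only matching rows, a LEFT JOIN keeps every row of the left table (filling in NULL where nothing matches), and UNION removes duplicate rows while UNION ALL keeps them. A FULL OUTER JOIN would additionally keep unmatched rows from the right table; it is left out here because SQLite only added it in version 3.39.

```python
import sqlite3

# Hypothetical toy tables, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER, name TEXT);
    CREATE TABLE applications (user_id INTEGER, company TEXT);
    INSERT INTO users VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO applications VALUES (1, 'Acme'), (1, 'Globex'), (2, 'Acme');
""")

# INNER JOIN: only users with at least one matching application.
inner = conn.execute("""
    SELECT u.name, a.company
    FROM users u
    INNER JOIN applications a ON u.user_id = a.user_id
    ORDER BY u.name, a.company
""").fetchall()

# LEFT JOIN: every user, with NULL (None in Python) when there is no match.
left = conn.execute("""
    SELECT u.name, a.company
    FROM users u
    LEFT JOIN applications a ON u.user_id = a.user_id
    ORDER BY u.name, a.company
""").fetchall()

# UNION de-duplicates the combined rows; UNION ALL keeps every row.
union = conn.execute(
    "SELECT company FROM applications UNION SELECT company FROM applications"
).fetchall()
union_all = conn.execute(
    "SELECT company FROM applications UNION ALL SELECT company FROM applications"
).fetchall()

print(inner)                       # [('Ana', 'Acme'), ('Ana', 'Globex'), ('Ben', 'Acme')]
print(left)                        # same rows plus ('Cy', None)
print(len(union), len(union_all))  # 2 6
```

Being able to explain why Cy appears only in the LEFT JOIN result, and why UNION ALL returns six rows, covers the substance of both screening questions.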

After the call

After the call, the recruiter will follow up with the hiring manager with notes gathered about the candidate’s background, technical screening, logistics, and culture fit. If the recruiter and hiring manager believe that you have potential, then they will advance you to the first technical round.

Preparation Tips!

To make a strong impression, prepare the following:

Create a short elevator pitch explaining why you want to work for LinkedIn.
Project a friendly and positive impression during the call.
Brush up on SQL basics.
Prepare questions to ask in advance, especially ones that help you gather as much information about the interviews as possible: How many rounds are there? What type is each round? Who are the interviewers? This information can help you design a prep strategy.

Stage 2 — Technical Phone Screening

The first technical screen tests your technical aptitude in 45 minutes. If you pass this round, you will proceed to the onsite stage.

For the product data science role, the screen usually covers two question types:

SQL (15–20 minutes) — You will be given access to a non-executable text editor on CoderPad and asked to solve two to three SQL problems involving table manipulation. Typical problems involve table joins, the DISTINCT clause, window functions, and WHERE conditions.
Product-Sense (15–20 minutes) — You will be given three to five open-ended questions involving metrics, analytics, and AB testing.
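Window functions come up often in the SQL portion. The sketch below again uses sqlite3 with a hypothetical `applications` table (columns and rows invented for illustration): ROW_NUMBER() picks each user’s most recent application, and COUNT(DISTINCT ...) counts unique applicants after a WHERE filter. It assumes the SQLite library bundled with Python is version 3.25 or later, the release that introduced window functions.

```python
import sqlite3

# Hypothetical applications table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE applications (user_id INTEGER, company TEXT, applied_on TEXT);
    INSERT INTO applications VALUES
        (1, 'Acme',    '2021-01-05'),
        (1, 'Globex',  '2021-03-10'),
        (2, 'Acme',    '2021-02-01'),
        (2, 'Initech', '2021-02-20'),
        (3, 'Globex',  '2020-12-30');
""")

# ROW_NUMBER() ranks each user's applications from newest to oldest;
# keeping rank 1 returns the most recent application per user.
latest = conn.execute("""
    SELECT user_id, company, applied_on
    FROM (
        SELECT user_id, company, applied_on,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id ORDER BY applied_on DESC
               ) AS rn
        FROM applications
    ) AS ranked
    WHERE rn = 1
    ORDER BY user_id
""").fetchall()

# COUNT(DISTINCT ...) counts each applicant once, after the WHERE filter.
# ISO-8601 date strings compare correctly as text.
n_2021 = conn.execute("""
    SELECT COUNT(DISTINCT user_id)
    FROM applications
    WHERE applied_on >= '2021-01-01'
""").fetchone()[0]

print(latest)  # [(1, 'Globex', '2021-03-10'), (2, 'Initech', '2021-02-20'), (3, 'Globex', '2020-12-30')]
print(n_2021)  # 2
```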

For sample questions, check out datainterview.com.

Preparation Tips!

In preparation for the interview, do the following:

Interview with as many companies as possible. The more you interview, the less nervous you will feel and the better you will perform over time.
Practice solving SQL problems daily.
Cultivating product sense requires two steps: (1) become familiar with LinkedIn’s core products, and (2) work through practice questions and see how experts would solve them on datainterview.com.

Stage 3 — Onsite Interviews

The onsite round at LinkedIn is perhaps the most rigorous stage of the interview process. This final round consists of five technical and behavioral modules that take at least four hours! The interview will challenge both your data science ability and your stamina. But with an understanding of the interview format and effective preparation, you can increase your chance of success!

The five technical and behavioral modules are the following:

Host Leader (45 Minutes)
Data Manipulation (45 Minutes)
Analytics Case Study (45 Minutes)
Data Storytelling & Strategic Thinking (1 Hour)
Wildcard Module — Statistical Modeling or Applied Statistics (45 Minutes)

Let’s delve into each module in terms of what to expect and how you will be evaluated.

Session #1 — Host Leader

Duration: 45 Minutes | Interviewer: A manager (who could be the hiring manager of the role) | Interview Type: Behavioral

In this interview, the host manager will provide details about LinkedIn’s engineering projects and culture. This is also an opportunity for the interviewer to understand your career objectives, technical background, and culture fit.

As a candidate, you should prepare stories about your career background. If you recently graduated, then you can highlight

You can expect questions such as the following:

Tell me your career background and objective.
Why do you want to work for LinkedIn?
Tell me about a situation when you demonstrated leadership.
How do you convince a stakeholder that your solution works?
Tell me about a project that wasn’t going well but that you managed to turn around.

To respond effectively, use the STAR framework.

Situation — Describe the situation you were in.
Task — Describe the task you had to do.
Action — Describe the action you took.
Results — Describe the outcome of your actions.

Here’s an illustration:

Question: How do you convince a stakeholder that your solution works?

Sample response:

Situation — A stakeholder mentioned that credit card fraud losses were $3 million per quarter, and the manual review team was able to prevent only $1 million of that.
Task — I proposed building a machine learning model to reduce the loss. The stakeholder agreed to the proposal.
Action — I convinced the stakeholder that the solution works across three monthly checkpoints: (1) I demonstrated a prototype that showed promising results on a sample dataset, (2) I provided a user acceptance testing (UAT) build that the stakeholder could test on a subset of live data, and (3) I pushed the model to production and tracked its performance in Q1’20.
Results — In Q1’20, the manual process prevented about $1 million in fraud loss, while the model I introduced prevented $1.5 million. This convinced the business stakeholder that the ML solution was more effective at preventing loss. Impressed, she championed adopting my ML solution in other products and services affected by fraud.

How will you be evaluated?

Communication — Can you express your thoughts clearly and effectively?
Culture-fit — Are you a person who’s easy to work with, positive, and open-minded to new ideas?
Leadership — Do you possess the leadership skills to lead a project forward?
Conflict Resolution — How do you resolve conflicts with colleagues?

Session #2 — Data Manipulation

Duration: 45 Minutes | Interviewer: A Senior or Staff Data Scientist | Interview Type: Technical

A core skill of LinkedIn’s data scientists is the ability to manipulate data for statistical analysis and modeling. The data manipulation round assesses your ability to munge data quickly and accurately.

The data manipulation round is essentially an SQL round consisting of two to three questions. Unlike most other tech companies, LinkedIn will also ask you to manipulate data using Python or R.

The interviewer will give you a set of tables and ask you to solve two- to three-part questions such as the following:

(A) [SQL] How many users applied in the past year to a company they had already applied to before?
(B) [Coding] Can you solve (A) using Python or R?
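The wording of the SQL question leaves room for interpretation, so it is worth clarifying with the interviewer. Under one reading, you count users who applied to a company within the past year and had also applied to that same company before that window. Here is a plain-Python sketch of the coding part under that reading; the `(user_id, company, applied_on)` record layout, the 365-day window, the cutoff date, and the sample rows are all assumptions. Building two sets of `(user, company)` pairs and intersecting them keeps the runtime linear in the number of applications.

```python
from datetime import date, timedelta

# Hypothetical application records: (user_id, company, applied_on).
applications = [
    (1, "Acme",    date(2019, 5, 1)),
    (1, "Acme",    date(2021, 2, 1)),   # repeat: applied to Acme before the window
    (2, "Globex",  date(2020, 1, 15)),
    (2, "Initech", date(2021, 3, 3)),   # new company, not a repeat
    (3, "Acme",    date(2021, 1, 10)),
    (3, "Acme",    date(2021, 4, 2)),   # both inside the window, not counted
]

def count_repeat_applicants(rows, as_of):
    """Count users who, in the year before `as_of`, applied to a company
    they had already applied to before that one-year window."""
    window_start = as_of - timedelta(days=365)
    before, recent = set(), set()
    for user_id, company, applied_on in rows:
        if applied_on < window_start:
            before.add((user_id, company))
        elif applied_on <= as_of:
            recent.add((user_id, company))
    # Users with at least one (user, company) pair present in both periods.
    return len({user_id for user_id, _company in recent & before})

print(count_repeat_applicants(applications, as_of=date(2021, 6, 8)))  # 1
```

If the interviewer instead means "applied to the same company more than once within the past year," user 3 would count too, which is why stating your interpretation up front matters.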

Note that you will be evaluated based on the following criteria:

Solution accuracy — Do you achieve a solution that outputs the correct result?
Solution efficiency — Are the runtime and space complexity of your solution minimized?
Communication — Do you express your thoughts clearly?

Session #3 — Analytics Case Study

Duration: 45 Minutes | Interviewer: A Senior or Staff Data Scientist | Interview Type: Technical

The analytics case study is common in product data scientist and analyst interviews at FAANG companies. It assesses your product and business sense, technical solution, and communication.

Similar to the product case round at Facebook, you can expect three types of questions:

Metrics — How would you measure engagement on LinkedIn’s Groups?
Analysis — The number of applications spiked in October this year. How would you find the root cause?
AB Testing — An experiment showed that, at the global level, a new marketing campaign increased the conversion rate of the career product subscription. However, results vary at the regional level. How would you investigate this? What would you recommend to the PM: launch or no launch?
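For the AB testing question, a reasonable first step is to run the same two-proportion z-test on the pooled data and within each region, then ask whether the regional differences look like real heterogeneity or just noise. The sketch below uses only the standard library; the region names and counts are invented for illustration. With many regions, you would also want to correct for multiple comparisons.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(conv_c, n_c, conv_t, n_t):
    """Pooled two-proportion z-test; returns (lift, z, two-sided p-value)."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    p_pool = (conv_c + conv_t) / (n_c + n_t)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
    z = (p_t - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_t - p_c, z, p_value

# Hypothetical results per region:
# (control conversions, control users, treatment conversions, treatment users)
results = {
    "NA":   (500, 10_000, 620, 10_000),
    "EMEA": (400, 10_000, 420, 10_000),
    "APAC": (300, 10_000, 290, 10_000),
}

# Sum each column across regions to get the global (pooled) counts.
pooled = [sum(counts[i] for counts in results.values()) for i in range(4)]
print("Global", two_proportion_ztest(*pooled))
for region, counts in results.items():
    print(region, two_proportion_ztest(*counts))
```

In this made-up data, the global lift is significant but comes mostly from one region, while another region trends slightly negative. That is the kind of pattern worth raising with the PM before recommending a launch.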

Note that you will be evaluated based on the following criteria:

Product and Business-Sense — Do you demonstrate product knowledge about LinkedIn’s products and services?
Technical Solution — Do you provide a sound statistical approach to the problem?
Communication — Do you express your thoughts clearly?

Session #4 — Data Storytelling & Strategic Thinking

Duration: 1 Hour | Interviewer: A Senior or Staff Data Scientist | Interview Type: Technical

A core function of a data scientist is to extract insights from data and deliver a compelling story to a business stakeholder. The purpose of the data storytelling round is to assess your ability to understand a business problem, analyze a dataset, and provide a presentation in an hour.

The round is conducted in the following manner:

Introduction (5–10 minutes) — Introduce a problem statement and data.
Brainstorming (30 minutes) — Analyze a dataset and construct a presentation on a whiteboard.
Presentation (15 minutes) — Present the results to a target audience.
Q&A (5–10 minutes) — Follow-up questions from the interviewer.

You will be evaluated on the following:

Product and Business-Sense — Do you demonstrate product knowledge about LinkedIn’s products and services?
Technical Solution — Do you provide a sound statistical approach to the problem?
Presentation — Do you convey a compelling story to the target audience?

Session #5 — Wildcard Module — Statistics

Duration: 45 Minutes | Interviewer: A Senior or Staff Data Scientist | Interview Type: Technical

In the wildcard module, you can choose either the statistical modeling or applied statistics round.

Option 1 — Statistical Modeling

The statistical modeling round assesses machine learning theory and application. You will most likely face two types of questions: definition-based and case-based.

Definition questions:

How does cross-validation work?
How would you split your dataset into train, validation, and test?
What can cause a model to overfit?
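For the first two definition questions, here is a plain-Python sketch of one common convention: shuffle once, set aside a test set, and then either tune on a fixed validation set or run k-fold cross-validation on the training portion, where each fold is held out exactly once and the held-out errors are averaged. The 60/20/20 split, k=5, the seed, and the baseline "predict the training mean" model are illustrative assumptions, not a prescribed recipe.

```python
import random

def train_val_test_split(n, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle indices 0..n-1 once and carve out disjoint train/validation/test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test

def k_fold_indices(indices, k=5, seed=0):
    """Yield (train, held_out) index lists; each index is held out exactly once."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def cv_mse_of_mean_predictor(y, indices, k=5):
    """Cross-validated MSE of a baseline model that predicts the training mean."""
    errors = []
    for train, held_out in k_fold_indices(indices, k):
        mean = sum(y[j] for j in train) / len(train)
        errors.extend((y[j] - mean) ** 2 for j in held_out)
    return sum(errors) / len(errors)

# Toy target values, purely illustrative.
y = [float(v) for v in range(100)]
train, val, test = train_val_test_split(len(y))
print(len(train), len(val), len(test))  # 60 20 20

# k-fold CV is an alternative to a single validation split; it runs on the
# training indices only, so the test set stays untouched until the end.
print(cv_mse_of_mean_predictor(y, train, k=5))
```

A large gap between training error and cross-validated error is also a practical signal for the overfitting question.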

Case questions:

How would you build the “People You May Know” feature?
How would you design a model that recommends job posts similar to the ones a user applied to?
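A common baseline answer for “People You May Know” is to generate candidates from friends-of-friends and rank them by the number of mutual connections, then layer a learned model with richer features (shared employers, schools, profile views) on top. The sketch below implements only that baseline on a hypothetical connection graph; it is not a description of LinkedIn’s production system.

```python
from collections import Counter

# Hypothetical undirected connection graph as adjacency sets.
graph = {
    "ana": {"ben", "cy"},
    "ben": {"ana", "cy", "dee"},
    "cy":  {"ana", "ben", "dee", "eve"},
    "dee": {"ben", "cy"},
    "eve": {"cy"},
}

def people_you_may_know(graph, user, top_n=3):
    """Rank second-degree connections by how many mutual connections they share with `user`."""
    mutual = Counter()
    for friend in graph[user]:
        for candidate in graph[friend]:
            # Skip the user themself and people they are already connected to.
            if candidate != user and candidate not in graph[user]:
                mutual[candidate] += 1
    # Sort by mutual-connection count (descending), then by name for a stable order.
    return sorted(mutual.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

print(people_you_may_know(graph, "ana"))  # [('dee', 2), ('eve', 1)]
```

The same candidate-generation-then-ranking shape carries over to the job recommendation question, with applied-to jobs in place of connections.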

Option 2 — Applied Statistics

The applied statistics round assesses your grasp of statistical fundamentals and their application. You should understand concepts such as Simpson’s paradox, the central limit theorem (CLT), hypothesis testing, and type 1 and type 2 error rates. As in the statistical modeling round, you will be asked a combination of definition-based and case-based questions such as the following:

Definition questions:

What’s the difference between the type 1 and type 2 error rates?
If the alpha of a statistical test increases, what happens to the type 1 error rate?
What’s the difference between the t-test and the z-test?
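One way to internalize the type 1 error rate is to simulate A/A tests, where there is no true difference, and count how often a test rejects. The sketch below assumes a known standard deviation of 1, which is the z-test setting; with an unknown standard deviation estimated from small samples you would use a t-test, whose heavier-tailed t distribution gives a larger critical value. The sample size, trial count, and seed are arbitrary choices. Because alpha is the type 1 error rate the test is calibrated to, raising alpha raises the share of false rejections.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def aa_false_positive_rate(alpha, n=50, trials=2000, seed=1):
    """Simulate A/A tests (no true difference) and return the share of rejections,
    which estimates the type 1 error rate of a two-sided z-test with known sigma = 1."""
    rng = random.Random(seed)  # same seed => identical data for every alpha
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        z = (fmean(a) - fmean(b)) / sqrt(2 / n)  # known sigma = 1 in both groups
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

for alpha in (0.01, 0.05, 0.10):
    print(alpha, aa_false_positive_rate(alpha))
```

Since the simulated data is identical across alphas, only the rejection threshold changes, so the estimated false-positive rate can only grow as alpha grows.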

Case question:

An email campaign was launched in two markets. When conversion rates are pooled across the markets, the variation shows an improvement over the control. However, within each individual market, the comparison flips. Why?
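This is Simpson’s paradox. A small worked example with hypothetical counts shows how it can happen: each pooled rate is a traffic-weighted average of the per-market rates, so if the variation received most of its traffic in the market that converts well anyway, the pooled comparison mixes that traffic imbalance with the treatment effect, even though the control wins within each market.

```python
# Hypothetical counts: market -> {arm: (conversions, users)}
data = {
    "Market A": {"control": (25, 100), "variant": (216, 900)},  # high-converting market
    "Market B": {"control": (90, 900), "variant": (9, 100)},    # low-converting market
}

def rate(conversions, users):
    return conversions / users

# Within each market, the control converts better.
for market, arms in data.items():
    print(market, {arm: rate(*counts) for arm, counts in arms.items()})

# Pooled across markets, the variant looks better because most of its
# traffic landed in the high-converting Market A.
pooled = {
    arm: rate(sum(data[m][arm][0] for m in data), sum(data[m][arm][1] for m in data))
    for arm in ("control", "variant")
}
print("Pooled", pooled)  # {'control': 0.115, 'variant': 0.225}
```

A natural follow-up is to check whether traffic was split between arms the same way in every market, since an unbalanced allocation is what drives the flip.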

In either round, you will be assessed based on the following:

Product and Business-Sense — Do you demonstrate product knowledge about LinkedIn’s products and services?
Technical Solution — Do you provide a sound statistical approach to the problem?
Communication — Do you express your thoughts clearly?

Preparation Tips!

To prepare for LinkedIn’s data scientist interview, I recommend the following tips that were helpful for datainterview.com students preparing for the LinkedIn interviews:

1. Cultivate product and business-sense

Become an active user on LinkedIn and deconstruct its products and services. Address the following questions: (1) How does this product work? (2) What are the primary and secondary metrics that measure engagement? (3) What kind of data would be collected behind the feature?

2. Practice data manipulation

Practice writing SQL and code daily in the lead-up to the interview. Type your data manipulation solution in a non-executable text editor, then transfer it to an executable one to see whether you got it right on the first try. Also, practice explaining your solution.

3. Practice case questions

On a blank sheet of paper or a whiteboard, practice solving a case problem with the following structure: (1) list 3–5 clarifying questions, (2) list 1–3 key ideas that help you brainstorm a solution, and (3) provide a final recommendation. Explain your solution out loud, as this helps you practice verbal explanations.

A Practice Question

Let’s review a sample interview question asked in a LinkedIn interview.

Problem — Product-Sense

How would you measure the percentage of LinkedIn users traveling for work?

Candidate Solution

[Candidate] Before I start, I’d like to ask a clarifying question. Can I assume that we have labels indicating whether a user is a business traveler?

[Interviewer] No, you don’t. You will need to devise a way to classify the user as a traveler.

[Candidate] Gotcha. I believe the most obvious technique is to run a survey. A pop-up could appear at the bottom of the LinkedIn page asking users whether they have traveled for business in the past twelve months. The yes-rate among respondents provides an estimate of the percentage of users who travel for work.

[Interviewer] Great, how would you figure it out without running a survey?

[Candidate] Hmm, I can think of two possible solutions here. If we have historical survey data that addresses this question, then we have labels that can help us build a classification model. Based on user profile information (e.g., role, position, industry) and activities (e.g., ad clicks on travel-related posts), we can classify a user as a traveler or not.

For instance, a user in sales is more likely to travel for business than a software engineer, so the classification model would assign higher scores to such users.

If we do not have labels, we need to find “proxy” activities indicating that users travel. For instance, clicks on posts that include travel ads (e.g., airline points, travel kits) may indicate that a user is a traveler. Of course, interest in traveling expressed via ad clicks is not the same as actually traveling, but we can mention this as a caveat when we provide the estimate.

[Interviewer] Great, what do you think would be the business use case for the estimation of LinkedIn users who travel?

[Candidate] I can think of two concrete examples. First, the information could help attract advertisers in the travel industry to publish ads on LinkedIn; the sales team for LinkedIn’s Ads could use it to convince prospective advertisers. Second, it could inform new products and services on LinkedIn, such as a service that provides a more seamless experience for business travelers.
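The candidate’s survey answer boils down to estimating a proportion. As a hedged sketch (the response counts below are invented), the yes-rate and a normal-approximation (Wald) 95% confidence interval could be computed as follows. Note that the interval captures sampling noise only; it says nothing about bias from users who choose not to answer the pop-up.

```python
from math import sqrt
from statistics import NormalDist

def yes_rate_ci(yes, responses, confidence=0.95):
    """Point estimate and normal-approximation (Wald) confidence interval for a survey yes-rate."""
    p = yes / responses
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * sqrt(p * (1 - p) / responses)
    return p, p - half_width, p + half_width

# Hypothetical survey: 1,200 of 8,000 respondents answered "yes".
p, low, high = yes_rate_ci(1_200, 8_000)
print(f"{p:.1%} (95% CI {low:.1%} to {high:.1%})")  # 15.0% (95% CI 14.2% to 15.8%)
```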

Next Step!

Are you ready to advance your career in data science? Check out datainterview.com, created by data scientists who work at FAANG. Get access to courses and a Slack group designed to help you crack data scientist interviews at top companies.

Also, check out the DataInterview YouTube channel for more awesome tips!

https://www.youtube.com/channel/UCQSMCVUX1HgrwxJhO_7VrJQ


Written by Dan Lee