Hiring a Data Scientist

Published in Sequoia Capital Publication, Apr 30, 2019

In earlier posts, we discussed the building blocks of a data-informed company, how to build world-class teams, the evolution and characteristics of a data organization, the value of data science and the role of a data scientist. We now discuss hiring. What are the types of data scientists? What skills does a data scientist need? How should we interview data scientists?

DATA ORGANIZATION

With the increase in value provided by data and the standardization of data, multiple data-related professions have emerged, including data analysts, data engineers, data infrastructure engineers, data architects, and data scientists. The creators, end users, and data products vary across different segments of the stack (see chart below). To scale a data organization, one must carefully consider when to hire each of these functions, what skill sets are required, what ratio of each function is needed, and how the different functions will work with each other. In this document, we will focus on the data science track.

Even within the data scientist track, multiple archetypes of data scientists have emerged.

Product generalists who are generic problem solvers working across product issues you may encounter
Early product analyst to determine product market fit for a nascent product
Growth analyst to move a metric
Core marketplace analyst to ensure healthy liquidity on your platform
Ecosystem analyst to identify competitive threats and strategic opportunities
Machine-learning analyst to ensure healthy operation of the algorithms that power your product

The skills of an early data scientist differ from those of the product generalist. A one-size-fits-all approach to hiring data scientists will not work. One has to be thoughtful about the size and maturity of the data organization, the needs of the product team and the related problems the data scientist will solve. The data organization should evolve with the growth of the product, and thus the needs will change over time. For example, hiring data engineers who specialize in petabyte-scale data is probably not valuable at an early stage but might be as a product gets more use.

SKILLS NEEDED

Data scientists are either new graduates (bachelor's, master's, MBA and PhD) or experienced professionals (product, technology and consulting). They have different skills, and it is valuable to have a good mix of them to solve a variety of problems. Broadly, there are three types of skills that these people provide: scientific rigor, consulting mindset and programming strengths, applied to the different dimensions below.

Problem formulation. Data scientists must be able to formulate and structure problems. This generally requires the consulting mindset as well as the scientific approach to problem solving.
Technical ability. Programming and scientific skills are both required to extract data.
Analytical ability. Analytical skills are required to extract and manipulate data sets, and to extract value from the data in the form of tables, charts, etc. Consulting mindset and scientific approach to problem solving are necessary to make sense of the data.
Synthesis. Data scientists need to interpret the results, simplify and synthesize. A consulting mindset is very valuable for simplification and synthesis.
Influence. Influencing decisions by storytelling is important for creating impact. The ability to influence using data requires a consulting mindset.

Hiring experienced professionals versus new graduates depends on the maturity of the organization and the balance the team desires. It is not wise to be top heavy (too many experienced people) or bottom heavy (too many fresh graduates).

HIRING

As you think about hiring for data science, it is important to note that the entire data field is relatively nascent, and it is rare to find people who have all of these skills right from the beginning of their career or their time with you. High-quality data scientists will scale with your organization, picking up the skills they need as the company grows, and becoming either broadly experienced or deeply specialized.

A good hiring process should be oriented towards evaluating the skills needed. For the generalist, the interview loop should have two analytical decomposition cases, one programming interview, one applied analysis interview and one scientific and quantitative interview.

1. Two Analysis Cases — This is the most important interview, and a candidate who fails it should not be hired. Because so much rides on it, two interviews that test these skills are valuable.

Problem formulation — Are you able to understand how to solve business questions by formulating the problem?
Communication and Clarity — Are you creative, articulate and clear in your thought process?
Raw analytical ability — Are you able to analyse the problem?
Product mindset — Are you able to provide recommendations for the product?
Product success and health — Are you able to define product success and analyze product health to identify issues?

2. Programming — This is useful to test your coding skills. The rubric here should be: do you spend 80% of your time pulling data and 20% on analysis because your coding skills are weak? The bar should be: spend more than 80% of your time on analysis and less than 20% on acquiring data.

Simple Data acquisition — Are you able to write simple programs to acquire data?
Complex Data acquisition — Are you able to join disparate data sets to acquire complex data sets?
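
As an illustration of the kind of task a complex data acquisition question might involve, here is a minimal sketch that joins two data sets and aggregates the result. The table names (`users`, `orders`), their columns, the sample rows and the helper names are all hypothetical; the post does not prescribe a language or schema, so Python's built-in `sqlite3` module is used only to keep the example self-contained.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                             user_id INTEGER, amount REAL);
        INSERT INTO users VALUES (1, 'US'), (2, 'IN'), (3, 'BR');
        INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 5.0), (12, 2, 7.5);
        """
    )
    return conn


def users_with_order_totals(conn):
    """Return one row per user: (user_id, country, order_count, revenue).

    A LEFT JOIN keeps users who have no orders, so they appear with a
    count of 0 instead of silently dropping out of the result.
    """
    query = """
        SELECT u.user_id,
               u.country,
               COUNT(o.order_id)            AS order_count,
               COALESCE(SUM(o.amount), 0.0) AS revenue
        FROM users AS u
        LEFT JOIN orders AS o ON o.user_id = u.user_id
        GROUP BY u.user_id, u.country
        ORDER BY u.user_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in users_with_order_totals(build_demo_db()):
        print(row)
```

A candidate who reaches for an inner join here would lose the user with no orders, which is the sort of detail this interview can surface.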

3. Applied Analysis — This is useful to understand whether you can solve a real problem end-to-end. You would need to formulate the problem, acquire and manipulate data and perform synthesis.

Problem formulation — Are you able to understand how to solve business questions by formulating the problem?
Acquisition — Are you able to write simple queries to acquire data?
Manipulation — Are you able to manipulate the data based on the business problem?
Synthesis — Are you able to simplify the results and provide clarity?

4. Scientific and Quantitative Ability — This interview is valuable to understand the candidate’s scientific and quantitative ability.

Quantitative — Do they have basic quantitative ability especially with respect to math?
Statistics — Do they have strong skills so they can make sound decisions based on statistics?
Scientific — Do they have the scientific skills to analyze complex data? This does not need to be tested but can be gleaned from one’s resume.

Additionally, some roles may require specific interviews.

Machine Learning — The biggest difference between the generalist and ML interview is that the candidate needs to be much stronger on technical and scientific ability. Additionally, the questions need to be tailored to ML.

Analysis Case — same as the generalist
ML Analysis Case — same as the generalist but the questions should be specific to performing root cause analysis on the output.
Scientific and Quantitative Ability — same as the generalist but they need to be much stronger on ML concepts and statistics.
Programming — same as the generalist but they need to clear a much higher bar for programming.

Senior talent — Senior talent needs to be able to drive roadmap and strategy using data and to demonstrate leadership.

Two Analysis Cases — same as the generalist
Programming — same as the generalist
Applied Analysis — same as the generalist
Leadership and Strategy — Is the candidate able to drive and influence strategy using data?

Marketplace — The biggest difference between the generalist and Marketplace interview is that the candidate needs a much stronger understanding of economics and big-picture thinking. The analysis case interviews should be tweaked with specific marketplace-related questions.

Ecosystem — The ecosystem analyst helps drive business and product strategy by analyzing market trends and educating product leaders on their product’s market landscape. The interviews for this role should consist of:

Analysis Case — same as the generalist
Scientific and Quantitative Ability — same as the generalist
Leadership and Strategy — Is the candidate able to drive and influence strategy using data?
Presentation — The candidate should build a business case for a product and present it. They should be evaluated for problem formulation, synthesis and influence.

Some additional considerations for hiring:

Building Organization

Centralized versus Decentralized — The question of whether analytics should be centralized or decentralized is always top of mind as the team grows. Centralized means that all of the analytics sits in one team; decentralized means that analysts are distributed across teams. Generally speaking, both have merits, and the choice chiefly depends on size, maturity, potential growth and leadership depth. Ideally, the organization structure should be a combination of the two that maximizes impact and improves culture.
Size — At a very early stage, there are very few people in the company and the question of centralized or decentralized is irrelevant. As the company starts to grow, it may be better to be centralized, as people can work on multiple problems across the company, gain insights and share knowledge, and scale analytics. When the analytics team gets to (say) 10 or more, one should adopt a combined centralized/decentralized model where people are embedded within the product team but are part of one larger analytical organization. As the team gets even larger, to (say) 50, it may be better to primarily decentralize to (say) 3 leaders reporting into the business units and sharing the same analytics infrastructure. The centralized analytics services would help with hiring, allocating people to teams, career growth, learning and development (training, coaching and mentorship), performance evaluation and building a strong functional identity.
Maturity — While the size of the team would be the most important consideration for a centralized versus decentralized organization structure, the maturity of the organization is another very important criterion. If analytics is not having the highest impact it is capable of, primarily because of the maturity of the people and the organization, it would be wise to stay centralized for a longer period of time.
Leadership depth — Even with increasing size and maturity, if the team does not have enough seniority in its ranks, it will be very hard to scale. The recommendation would be to continue being centralized until the team has a sufficient number of leaders.
Ratio of data scientists — In order to scale an organization, one needs to carefully evaluate the ratios of data scientists to other functions. How many data scientists should one have for every product manager? How many data scientists for every data engineer? What about data infrastructure? One should aim to have the smallest number of data scientists working on the most leveraged problems. As a result, one should invest in more data engineers and data infrastructure people who can help scale the data organization.
Career growth — Career growth opportunities for a data scientist are very important. The leadership needs to come up with a framework for the career progression of both a manager and an individual contributor. We will provide further guidance in a future post.
Use “Data Scientist” as the job title. Despite the rampant overuse of the title, “Data Scientist” is the de facto title for this (and other) roles, and many strong candidates will not respond to inquiries with other titles.
Hire earlier than you think you should. If your company is in need of a data scientist, you’re already likely to be 6–12 months behind on hiring. Between sourcing talent in a competitive market, interviewing candidates, and ramping up a new employee, it takes many months to go from starting a search to having a productive data scientist.
Leverage data science boot camps for entry-level talent. Numerous agencies and boot camps exist to train exceptionally talented academics as entry-level data scientists. These programs can be an excellent source of raw talent but require mature leadership, mentoring, and a sound team structure to make them productive.

Other Hiring Considerations

Don’t take titles at face value. The title of data scientist is broadly used and frequently misused. A skills-based interview, reference check, and a deep-dive on previous experience are especially critical to assess a candidate’s qualifications.
Quantitative experience transcends domain. A strong data scientist should be able to adapt their skills and talents from one domain to another. Fundamentally, a good data scientist is a truth seeker and problem solver who breaks down complicated issues to first principles. This is a skill that is fairly transferable from other industries and academia.
PhDs are overvalued. Although many data scientists have advanced degrees in quantitative sciences and research experience, this is not a necessary qualification in most cases. A candidate with a problem-solving mindset can develop a strong technical and quantitative toolkit solely from industry experience.
Strategic thinking is undervalued. Many impressive technical minds are not strong data scientists because they are unable to determine the most valuable analyses to perform, instead opting for the most interesting or technically challenging ones. Testing for the ability to connect analytical output to actions and the lack of academic proclivities is extremely important in the interview process.
Don’t require knowledge of specific technologies. This is an emerging field, and technologies change massively over relatively short periods. Any A+ hire should be able to translate skills from R to Python to the state-of-the-art technology ten years from now.

MYTHS

Data scientists are data junkies. One of the primary reasons data scientists fail to reach their full potential is that they are treated as a service organization that delivers data to stakeholders. This will definitely limit the influence that data scientists can have. To maximize the value of data scientists, consider embedding them within product teams, making sure they have a seat at the table when big decisions are being made, and engaging them throughout the entire process of product development.
All data scientists do machine learning. While algorithm developers specialize in machine learning (ML), product analysts are largely problem solvers that may use ML as a tool to discover insights. Moreover, the data scientists that help with designing and shipping experiments have strong statistical backgrounds, but are not necessarily strong in ML.
A data-informed approach is always superior to a data-driven one. The kind of approach needed depends entirely on the type of problem you’re trying to solve. If you want to drive goals, roadmap, and strategy for a product, a data-informed approach is key. But if you want to power production systems, a data-driven approach is needed.
Data scientists need to make product managers happy at the expense of honesty. Data scientists are truth seekers and can best support a product by keeping everyone honest. They need to be empowered to call out problems, even if it means telling a product manager something they don’t necessarily want to hear. A good product manager will appreciate the checks and balances provided by data scientists.

TAKEAWAYS

Scientific rigor, consulting mindset and programming strengths applied to problem formulation, technical and analytical ability, synthesis and influence are required for a data science organization to succeed.
Analysis case, applied analysis, programming, quantitative and problem formulation skills need to be assessed during an interview.

This work is a product of Sequoia Capital’s Data Science team. Chandra Narayanan, Hem Wadhar and Ahry Jeon wrote this post. See the full data science series here. Please email data-science@sequoiacap.com with questions, comments and other feedback.