Building A Kickass Data Sciences Team — For Startups

Simpl - Under The Hood
Jun 2, 2016

If you’re starting up on your own with a tech or tech-based product, chances are that you will be dealing with copious amounts of data. If you’re smart, you will already have hired a data engineer to make sure the infrastructure for capturing and storing that data is in place. However, the real challenge arises when you try to make sense of this chaotic and ever-evolving ecosystem of data. To do this, you need a dedicated data science team. The roles of the individuals in this team change as the business evolves, so make sure the data scientists you hire meet your current requirements. We group the data capabilities a startup needs into three broad phases: the Muddy phase, the Silver Lining phase and the All Is Well phase.

The Muddy Phase

This phase covers the initial days of the startup, usually 6–8 months before the launch. Although the need for a business analyst at this stage could be debated, we strongly recommend having one. The requirements at this stage are unclear, ambiguous and muddy. The business doesn’t yet have any data to work with, nor does it know what data it wants to collect. A business analyst would help you figure out a roadmap of projects and would also work with your tech team to ensure that the right data points are captured from the very start. If you are hiring for data science roles at this stage, look for the following candidates.

  1. A data engineer: Although not strictly part of data science, the person in this role is going to be the backbone of the data science team. Don’t depend on the tech team to do the heavy lifting for the data pipeline, since the requirements of a data pipeline may be very different from those of your product’s tech stack, and the two should not be merged unless you are absolutely sure. Do not go for an “expert” at this stage. Go for a learner, someone who can learn things quickly and get shit done. The ideal candidate should have some understanding of popular ETL tools and concepts and should have worked on major databases: Postgres, MySQL, MongoDB and Cassandra, to name a few. A sound understanding of big data systems like Hadoop and Spark would be an added advantage.
  2. A business analyst: We call them Type ‘A’ data scientists. This person would be responsible for coming up with business problems that can be solved with data. After deciding on the business problems, s/he should define a roadmap, determine the requirements and stay completely up to date with how the business is progressing. The aim for this person should be to build as much business acumen as possible, right from the very start. Once the data kicks in, the person would evolve into one of the lead problem solvers for the business with the help of data. The following are the traits you should look for while hiring for this role:
  • Inquisitive and structured thinking: They should be able to ask great questions. A question might sound dumb and still be a great one. There is a difference between dumb and bad: no question is ever truly dumb, but it can be a bad question. A good question is one asked with the expectation that its answer will bring some clarity to the business problem. After all, solving a business problem is all about representing it correctly, and answering the question should help with that representation.
  • Great learner: The person should be a quick learner. Often we see that great learners have varied interests. The ideal hire would be someone who has an interest in a creative field like art, dance, design or music and has taught themselves one of these. These tend to be people who pair creative thinking with logical and mathematical thinking.
  • Amazing storytellers: Give them 6–8 random objects and ask them to create a story around them. A good business analyst should be able to do this. In one of our interviews, we asked the interviewee to create a story with a wine bottle, a polar bear, stairs, a house, a torch and a rope. When you do this, always bring at least one unexpected element into the scenario. In the above example, having a polar bear in the mix ensures that the person has to think “out of the box” to build a story around it.
  • Tech skills: The person should be good with Excel and SQL to begin with, along with R and/or Python. But again, don’t go for tech ‘experts’. Three years ago, SAS was dominating the market. In the last three years, the focus has shifted from SAS to R and now to Python. Very soon it could be Scala or Julia. Remember that data science is still evolving and disrupting itself, so hire people who can learn at a fast pace.
  • BI skills: The person should have built dashboards from scratch, whether a VBA-based Excel dashboard, QlikSense, QlikView, Tableau or any other tool. More than the tech, the person should clearly understand the business requirements while creating the dashboard.

The Silver Lining Phase

This is just after the launch, when you see a decent amount of data flowing in. This is the time to validate your hypotheses with the help of data. You should have the following additions to your team ready when you go live with the product:

  1. Data scientist: We call them Type ‘B’ data scientists. Their primary role is to create statistical and machine learning models for predictive analysis, classification, clustering, test and control, recommender systems and so on, as needed. This person can be the earlier Type A DS as well, provided s/he has gathered enough knowledge to do this (again, the rate of learning is the deciding factor). The goal should be to reach a stage where data scientists can perform both Type A and Type B tasks interchangeably. Apart from all the desired capabilities of a Type A DS, s/he should have some experience in creating statistical and machine learning models and should be well acquainted with R, Python, Julia or SAS. Any language is fine (even MATLAB or Octave), as long as it’s not being deployed in production. Languages are there to help a DS, not to be a barrier.
  2. A tech engineer (optional) who likes working with data products: If your product requires data-driven decisions, you will need a tech engineer to create the necessary data products. A data scientist can create the necessary statistical or machine learning models, but you will need someone to optimise them and make them real-time by deploying them to production. While hiring this person, make sure that s/he understands data and its capabilities and is not just a developer.

The All Is Well Phase (not really!)

If you reach this phase, it means your product is doing well. You now have more data than a five-member data team can handle, so you need to expand.

  1. This is the time for the data science team to evolve into an independent product team. You should have the required number of engineers, designers and data scientists in the team. We use the terms “engineer” and “designer” only to describe people’s backgrounds before they join the team; there should be no distinction afterwards. Once they are part of the data science team, they are all data scientists. A big mistake that many companies make at this stage is to depend on the mainstream product team for tech and design requirements. As a result, data priorities never come first in the project pipeline. It’s very common for a data scientist to come up with a very interesting way to solve a business problem using data, only for it to be ignored because it’s not in the “project pipeline”. This is also the time when the data engineers should start creating tailor-made solutions for the business’s specific use cases and open-sourcing them later.
  2. As for hiring more data scientists, the scaling should happen across disciplines and shouldn’t be confined to data science or statistics. This is the time to hire people with deep knowledge of various disciplines that deal with data. You should consider people from varied backgrounds like psychology, experimental physics, applied mathematics, neurology and DNA sequencing. These are the fields making really awesome progress in analysing and making sense of data. Bring motivated and knowledgeable folks from such backgrounds together and you will have one kickass data team.

Making sense of data happens when science meets art. It has nothing to do with programming in R or Python or Julia. That’s a skill which can be learnt. The prime criteria for hiring a good analyst are inquisitiveness and learning rate. A good analyst (Type A DS) makes an awesome data scientist (Type B DS).

This post was written by Raj Vardhan, our data sciences lead at Simpl.
