What is needed to build a data science team from the ground up?

Original post published on 3Blades — Build and Share Your Data Science Projects

Which roles does a data science team need in order to be successful? Some depend on the organization’s objectives, but the following positions are widely considered key.

  1. Data scientist. This role should be held by someone who can work on large datasets (on Hadoop/Spark) with machine learning algorithms, and who can also create predictive models and interpret and explain model behavior in layman's terms. This position requires excellent knowledge of SQL and an understanding of at least one programming language for predictive data analysis, such as R and/or Python.
  2. Data engineer / Data software developer. This role requires strong knowledge of distributed programming, including infrastructure and architecture. The person hired for this position should be very comfortable installing distributed programming frameworks such as Hadoop MapReduce or Spark clusters, should be able to code in more than one programming language (for example Scala, Python, or Java), and should know Unix scripting and SQL. This role can also evolve into one of two specialized roles, the data solutions architect or the data platform administrator, described next.
  3. Data solutions architect. Essentially a data engineer with a broad range of experience across several technologies and a strong understanding of service-oriented architecture concepts and web applications.
  4. Data platform administrator. This position requires extensive experience managing clusters, including in production environments, and good knowledge of cloud computing.
  5. Designer. This position should be occupied by an expert who has deep knowledge of user experience (UX) and interface design, primarily for web and mobile applications, as well as knowledge of data visualization and ideally some UI coding expertise.
  6. Product manager. This is an optional role, required only for teams focused on building data products. This person defines the product vision, translates business problems into user stories, and helps the development team build data products based on those user stories.

