New Series: The Full Stack Data Scientist
What a data scientist should know to build end-to-end data science solutions
Stack Overflow recently released its 2019 developer survey. It was full of interesting insights into everything from preferred technologies to optimism about the future. It made me think about the role of data science in technology and the skills required to integrate that role into the wider ecosystem. Developers have coined the term ‘full stack’ for a developer who is comfortable working on all aspects of web development. What would be the equivalent for data science?
Most respondents (51.9%) identify their roles as ‘full-stack developers’, with ‘data scientist or machine learning specialist’ taking up 7.9% of responses. Other data-related roles include data or business analyst (7.7%), data engineer (7.2%) and scientist (4.4%).
Since many data scientists don’t have the luxury of the support of large teams of developers, they must be able to build things and perform tasks that aren’t traditionally thought of as part of their role. This could relate to business analysis, data engineering, DevOps, database management and web development. I would consider a data scientist who is capable in all these areas to be a full stack data scientist. It’s not an option in the survey, yet… :)
Being able to build end-to-end solutions is the best way to prepare yourself for any role or project, work with a variety of teams, and ensure your insights bring value to the business. I believe that to do this, you need a good knowledge of each of these areas:
💼 Business analysis. A sound understanding of the requirements, available data and goals of a project.
🏛 Infrastructure. The ability to efficiently design, deploy and work with a wide range of technologies and data management systems.
🚂 ETL. Data scientists should be able to build effective data processing pipelines so that their models and analysis are easily maintained.
💡 Machine learning. Extensive knowledge of techniques to build intelligent systems.
🖥 DevOps. Source control, deployment and monitoring are made easier by tools like Git, Docker and Airflow.
📱 Web app & API development. Building simple web applications and API endpoints will make it easier to integrate insights into other applications.
📊 Data visualisation. The ability to create intuitive visualisations using a variety of tools.
The aim of this series is to cover each of these areas. When a post showcases a particular tool, it will walk through a GitHub repository.
The first part of the series is already live! Check out The Full Stack Data Scientist Part 1: Productionise Your Models with Django APIs.
What would you like to see next? Vote for the next post below!
Applied Data Science is a London-based consultancy that implements end-to-end data science solutions for businesses, delivering measurable value. If you’re looking to do more with your data, please get in touch via our website.