Data Science Study Roadmap — 2023 Update

Evgeniia Komarova
2 min read · Nov 25, 2023


Introduction

Learning data science can be both exciting and confusing. When I started, I struggled to find a clear, step-by-step guide. I wished for a roadmap that would tell me what I needed to know, why it mattered, and where to study it, all without complex terms.

Motivation

I wrote this article because, like many, I faced confusion at the start of my data science journey. The demand for data skills is growing, but the path to becoming a data scientist can be overwhelming. This guide aims to be a light in that darkness, providing a straightforward roadmap that explains not just what to learn, but why.

Key Tools for Data Science:

1. Basic Foundations:

  • Math: Linear Algebra, Calculus, Probability, Statistics
  • Programming: Python, R, SQL (optional)
  • Data Manipulation: NumPy (Python), pandas (Python), dplyr (R)
  • Data Visualisation: Matplotlib (Python), Seaborn (Python), ggplot2 (R), echarts4r (R)

2. Data Exploration and Preprocessing:

  • Exploratory Data Analysis (EDA)
  • Feature Engineering
  • Data Cleaning
  • Handling Missing Data
  • Data Scaling and Normalisation
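
To make two of these steps concrete, here is a minimal sketch of mean imputation (one simple way to handle missing data) followed by standardisation (one common form of scaling). It uses only the Python standard library so that it runs anywhere; the `ages` column and the helper names `impute_mean` and `standardise` are hypothetical, chosen for illustration. In practice, pandas and scikit-learn offer ready-made equivalents.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardise(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical column with one missing value.
ages = [22.0, None, 30.0, 38.0]
filled = impute_mean(ages)    # the gap is filled with the mean of 22, 30 and 38
scaled = standardise(filled)  # values now centred on 0 with unit spread
```

Mean imputation is only one option. Whether it is appropriate depends on why the values are missing, which is part of what exploratory data analysis should uncover.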

3. Machine Learning:

  • Supervised Learning: Regression (Linear, Polynomial), Classification (Logistic Regression, k-Nearest Neighbours, SVM, Decision Trees, Random Forest)
  • Unsupervised Learning: Clustering, Dimensionality Reduction (PCA, t-SNE, LDA)
  • Reinforcement Learning
  • Model Evaluation and Validation: Cross-Validation, Hyperparameter Tuning, Model Selection
  • ML Libraries and Frameworks: Scikit-learn (Python), TensorFlow (Python), Keras (Python), PyTorch (Python), caret (R)
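
As an illustration of the model evaluation item above, here is a rough sketch of k-fold cross-validation applied to a simple linear regression. It is written in plain Python to show the mechanics, not as a replacement for Scikit-learn's utilities. The helper names (`k_fold_indices`, `fit_line`, `cross_val_mse`) and the toy data are assumptions made for this example, and the folds are contiguous and unshuffled to keep the logic easy to follow.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def cross_val_mse(xs, ys, k):
    """Average mean squared error on the held-out fold, across k folds."""
    fold_errors = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        slope, intercept = fit_line(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx]
        )
        squared = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test_idx]
        fold_errors.append(mean(squared))
    return mean(fold_errors)


# Toy data lying exactly on y = 2x + 1, so held-out error should be near zero.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
folds = list(k_fold_indices(len(xs), 3))
score = cross_val_mse(xs, ys, k=3)
```

The key idea is that each sample is used for testing exactly once and the model is always scored on data it did not see during fitting. Real datasets are usually shuffled before splitting.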

4. Deep Learning:

  • Neural Networks: Perceptron, Multi-Layer Perceptron
  • Convolutional Neural Networks (CNNs): Image Classification, Object Detection, Image Segmentation
  • Recurrent Neural Networks (RNNs): Sequence-to-Sequence Models, Text Classification, Sentiment Analysis
  • Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU): Time Series Forecasting, Language Models
  • Generative Adversarial Networks (GANs): Image Synthesis, Style Transfer, Data Augmentation
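
The perceptron, the first item in the list above, is small enough to sketch in a few lines. The example below is a minimal illustration of the classic perceptron learning rule with a step activation, trained on the logical AND function. The function names, learning rate, and epoch count are assumptions chosen for this example; frameworks such as PyTorch or Keras would be used for anything larger.

```python
def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Train a single perceptron with the classic error-driven update rule."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            prediction = 1 if activation > 0 else 0
            error = target - prediction  # 0 when correct, +1 or -1 otherwise
            # Nudge the weights and bias only when the prediction was wrong.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


def predict(weights, bias, x):
    """Step activation: output 1 if the weighted sum is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# Logical AND is linearly separable, so a single perceptron can learn it.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]
weights, bias = train_perceptron(inputs, targets)
predictions = [predict(weights, bias, x) for x in inputs]
```

A single perceptron can only separate classes with a straight line (or hyperplane), which is why functions such as XOR need the multi-layer perceptron listed next to it.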

5. Data Visualisation and Reporting:

  • Dashboard Tools: Tableau, Power BI, Dash (Python), Shiny (R)
  • Storytelling with Data
  • Effective Communication

6. Domain Knowledge and Soft Skills:

  • Industry-Specific Knowledge
  • Problem Solving
  • Communication Skills
  • Time Management
  • Teamwork

This is only an overview of the main disciplines required to become a data scientist. In future articles, I will delve into each section of this data science roadmap, offering in-depth insights into theory, practical applications, and detailed code implementations.

Stay tuned for a comprehensive guide to navigating the intricacies of data science. Your applause fuels my motivation, so let’s continue this journey together. Follow for more as we explore the data landscape step by step 😊

