“Two people in elegant shirts brainstorming over a sheet of paper near two laptops” by Helloquence on Unsplash

Step-by-step Data Science for Developers

Andrew Yip
Published in DevCJeddah
Feb 18, 2018


DevC Jeddah Workshop on Python

The votes are in! The Jedis of the Developer Circle in Jeddah are eager to explore the trending field of Data Science. So off we go to plan this month's workshop. From past events, we learned that:

  1. Hands-on workshops > Technical talks
  2. Understanding of fundamentals > long-winded hand-holding
  3. Self-contained dev env > hair-pulling setups

Based on these observations, here’s a set of design requirements for a palatable first data science experience:

  1. Interesting small problem: participants should feel intrigued by the source material they work with, from start to end.
  2. Concepts on-the-go: they should be able to grasp concepts in bite-size pieces while moving through the data science workflow.
  3. Minimal pain: they should be teleported to the data science work environment in seconds, skipping the emotional roller-coaster that is common before typing the first import statement.

In this workshop, we will explore the famous (or rather infamous) Titanic Survival Dataset in a browser environment through a Kaggle Kernel. The workshop has 2 parts:

Gain insights from Exploratory Data Analysis (EDA) [30 minutes]

We will explore the data structure and distributions of the observations using graphical and quantitative methods.

  • What is Data Science?
  • What is the dataset and our goal?
  • What can we learn through visual and quantitative explorations?
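For the quantitative side of EDA, a good first question is how survival varies across groups, and which columns have missing values. The sketch below shows that idea using only the Python standard library, on a few made-up rows that borrow some of the Titanic column names (Survived, Pclass, Sex, Age, Fare). The values are not real passengers and the helper names are hypothetical; in the kernel itself you would load the competition's train.csv, most likely with pandas.

```python
import csv
import io
from collections import defaultdict

# Made-up rows (NOT real Titanic records) using a subset of the Titanic
# column names. An empty Age cell stands for a missing value.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Sex,Age,Fare
1,0,3,male,22,7.25
2,1,1,female,38,71.28
3,1,3,female,26,7.93
4,1,1,female,35,53.10
5,1,3,male,35,8.05
6,0,3,male,,8.46
7,0,1,male,54,51.86
8,0,3,female,27,11.13
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, one per passenger."""
    return list(csv.DictReader(io.StringIO(text)))

def survival_rate_by(rows, column):
    """Fraction of passengers who survived, grouped by the given column."""
    totals = defaultdict(int)
    survived = defaultdict(int)
    for row in rows:
        key = row[column]
        totals[key] += 1
        survived[key] += int(row["Survived"])
    return {key: survived[key] / totals[key] for key in totals}

def missing_counts(rows):
    """Number of empty cells per column: a quick data-quality check."""
    counts = {column: 0 for column in rows[0]}
    for row in rows:
        for column, value in row.items():
            if value == "":
                counts[column] += 1
    return counts

rows = load_rows(SAMPLE_CSV)
print(survival_rate_by(rows, "Sex"))     # {'male': 0.25, 'female': 0.75}
print(survival_rate_by(rows, "Pclass"))  # survival rate per ticket class
print(missing_counts(rows)["Age"])       # 1
```

The same grouping, plotted as a bar chart, is the visual counterpart of this table of rates.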

Build a Predictive Model using Machine Learning (ML) techniques [20 minutes]

We will then prepare the data (preprocessing), build a model, and evaluate it.

  • How to get the data ready for model building?
  • Which models are we going to use?
  • How do we define success?
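Here is a minimal sketch of those three steps, built on a few assumptions of our own: Sex is encoded as 0/1, missing ages are filled with the median of the known ages, the "model" is a simple rule that predicts survival for every female passenger, and success is measured as accuracy (the share of correct predictions). The passengers are made up and the function names are hypothetical; the workshop kernel may use different features and proper machine learning models (for example from scikit-learn) instead.

```python
import statistics

# Made-up passengers (NOT real Titanic records); None marks a missing Age.
passengers = [
    {"Survived": 0, "Pclass": 3, "Sex": "male", "Age": 22.0},
    {"Survived": 1, "Pclass": 1, "Sex": "female", "Age": 38.0},
    {"Survived": 1, "Pclass": 3, "Sex": "female", "Age": None},
    {"Survived": 0, "Pclass": 1, "Sex": "male", "Age": 54.0},
    {"Survived": 1, "Pclass": 2, "Sex": "male", "Age": 4.0},
    {"Survived": 0, "Pclass": 3, "Sex": "female", "Age": 27.0},
]

def preprocess(rows):
    """Turn raw rows into numeric feature vectors [Pclass, is_female, Age].

    Missing ages are filled with the median of the known ages.
    """
    median_age = statistics.median(r["Age"] for r in rows if r["Age"] is not None)
    features = []
    for r in rows:
        age = r["Age"] if r["Age"] is not None else median_age
        features.append([r["Pclass"], 1 if r["Sex"] == "female" else 0, age])
    return features

def predict_gender_baseline(features):
    """Baseline rule: predict survival (1) for every female passenger."""
    return [x[1] for x in features]

def accuracy(y_true, y_pred):
    """Share of passengers whose survival was predicted correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

X = preprocess(passengers)
y = [r["Survived"] for r in passengers]
print(X[2])  # [3, 1, 27.0]  (missing Age replaced by the median)
print(accuracy(y, predict_gender_baseline(X)))
```

A rule like this makes a useful baseline: any real model should beat it. One caveat: the median here is computed over all rows, whereas a proper evaluation would compute it on the training split only, so that no information leaks in from the validation data.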

Bonus: Submit to a Kaggle Competition [10 minutes]

Now it’s time to see how well our predictions fare in a machine learning competition. The kernel interface on Kaggle makes submitting super simple: once the kernel generates an output file (a .csv in the format the competition specifies), you just hit submit and see your score!
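For the Titanic competition, the submission file has two columns, PassengerId and Survived. Below is a minimal sketch of writing one with the standard csv module. The IDs and predictions are placeholders, and write_submission is a hypothetical helper, not part of any Kaggle API.

```python
import csv
import io

def write_submission(passenger_ids, predictions, out):
    """Write predictions to a file-like object as CSV with a PassengerId,Survived header."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])
    for pid, pred in zip(passenger_ids, predictions):
        writer.writerow([pid, int(pred)])  # Survived must be 0 or 1

# Placeholder IDs and predictions, for illustration only.
buffer = io.StringIO()
write_submission([892, 893, 894], [0, 1, 0], buffer)
print(buffer.getvalue())
```

In a kernel you would pass a real file instead, e.g. `with open("submission.csv", "w", newline="") as f: write_submission(ids, preds, f)`. The file name is our own choice, so check the competition page for what it expects.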

TODO

  1. Make a sample submission on the kernel
  2. Showcase model comparisons
  3. Showcase cross-validation

Link to all-in-one kernel (if you like it please upvote ;) )

Additional Resources


phd student @kaust_news | data science enthusiast | tech community builder