Step-by-step Data Science for Developers
DevC Jeddah Workshop on Python
The votes are in! The Jedis of the Developer Circle in Jeddah are eager to explore the trending field of Data Science. Off we go to the planning of a workshop for this month… From past events, we learned that:
- Hands-on workshops > Technical talks
- Understanding of fundamentals > long-winded hand-holding
- Self-contained dev env > hair-pulling setups
Based on these observations, here’s a set of design requirements for a palatable first data science experience:
- Interesting small problem: the participants should feel intrigued by the source material they deal with, from start to end.
- Concepts on-the-go: they should be able to grasp concepts bite-size while traversing the data science workflow.
- Minimal pain: they should be teleported to the data science work environment in seconds, skipping the emotional roller-coaster that is common before typing the first import statement.
In this workshop, we will explore the famous (or rather infamous) Titanic Survival Dataset in a browser environment through a Kaggle Kernel. The workshop has two parts, followed by an optional bonus:
Gain insights from Exploratory Data Analysis (EDA) [30 minutes]
We will explore the data structure and distributions of the observations using graphical and quantitative methods.
- What is Data Science?
- What is the dataset and our goal?
- What can we learn through visual and quantitative explorations?
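To give a flavor of the quantitative side of EDA, here is a minimal sketch using pandas. The rows below are invented for illustration and are not taken from the real dataset; the column names (Survived, Pclass, Sex, Age) are assumed to match the public Kaggle Titanic CSV, and summarize is a hypothetical helper, not part of the workshop kernel.

```python
import pandas as pd

# Toy rows invented for illustration; NOT real passengers.
# Column names are assumed to follow the Kaggle Titanic CSV.
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass":   [3, 1, 3, 3, 2, 1],
    "Sex":      ["male", "female", "female", "male", "female", "male"],
    "Age":      [22.0, 38.0, None, 35.0, 27.0, 54.0],
})


def summarize(df):
    """Hypothetical helper: collect a few quantitative EDA summaries."""
    return {
        # Number of rows and columns
        "shape": df.shape,
        # Count of missing values per column
        "missing": df.isna().sum().to_dict(),
        # Share of passengers who survived overall
        "survival_rate": df["Survived"].mean(),
        # Share of survivors within each sex
        "survival_by_sex": df.groupby("Sex")["Survived"].mean().to_dict(),
    }


summary = summarize(toy)
print(summary)
```

On the real data, the same calls would run on a DataFrame loaded with pd.read_csv from the competition files, and plots could complement these numbers.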
Build a Predictive Model using Machine Learning (ML) techniques [20 minutes]
We will then prepare the data (preprocessing), build a model, and evaluate it.
- How do we get the data ready for model building?
- Which models are we going to use?
- How do we define success?
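The three questions above can be sketched end to end with scikit-learn. Everything here is an assumption for illustration: the toy rows are invented, logistic regression is just one possible model, accuracy on a held-out split is just one possible definition of success, and preprocess is a hypothetical helper.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy rows invented for illustration; NOT real passengers.
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0],
    "Pclass":   [3, 1, 3, 3, 2, 1, 3, 1, 2, 3, 1, 2],
    "Sex":      ["male", "female", "female", "male", "female", "male",
                 "male", "female", "female", "male", "female", "male"],
    "Age":      [22.0, 38.0, None, 35.0, 27.0, 54.0,
                 2.0, 58.0, 14.0, None, 31.0, 40.0],
})


def preprocess(df):
    """Hypothetical helper: fill missing ages and encode sex as 0/1."""
    out = df.copy()
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    return out[["Pclass", "Sex", "Age"]]


# 1. Get the data ready
X = preprocess(toy)
y = toy["Survived"]

# 2. Hold out a quarter of the rows, keeping the class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 3. Fit a simple model (one choice among many)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# 4. Define success as accuracy on the held-out rows
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Held-out accuracy: {accuracy:.2f}")
```

With so few rows the score itself means little; the point is the shape of the workflow: preprocess, split, fit, evaluate.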
Bonus: Submit to a Kaggle Competition [10 minutes]
Now it’s time to see how well our predictions fare in a machine learning competition. Using the kernel interface on Kaggle makes submitting super simple. Once the kernel has generated an output file (a .csv in the format suggested by the competition), you just need to hit submit and see your score!
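As a rough sketch of producing that output file, the snippet below builds a two-column table and serializes it as CSV. The column names PassengerId and Survived are an assumption about the competition's format, the ids and predictions are placeholders, and make_submission is a hypothetical helper.

```python
import pandas as pd


def make_submission(passenger_ids, predictions):
    """Hypothetical helper: pair each passenger id with a 0/1 prediction."""
    return pd.DataFrame({
        "PassengerId": list(passenger_ids),
        "Survived": [int(p) for p in predictions],
    })


# Placeholder ids and predictions, for illustration only
submission = make_submission([892, 893, 894], [0, 1, 0])

# With no path given, to_csv returns the CSV text instead of writing a file
csv_text = submission.to_csv(index=False)
print(csv_text)

# In a kernel, writing to disk makes the file available as an output:
# submission.to_csv("submission.csv", index=False)
```

Check the competition page for the exact columns and row count it expects before submitting.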
TODO
- Make a sample submission on the kernel
- Showcase model comparisons
- Showcase cross-validation