“Two people in elegant shirts brainstorming over a sheet of paper near two laptops” by Helloquence on Unsplash

Step-by-step Data Science for Developers

Published in

DevCJeddah

2 min read · Feb 18, 2018

DevC Jeddah Workshop on Python

The votes are in! The Jedis of the Developer Circle in Jeddah are eager to explore the trending field of Data Science. Off we go to the planning of a workshop for this month… From past events, we learned that:

Hands-on workshops > Technical talks
Understanding of fundamentals > long-winded hand-holding
Self-contained dev env > hair-pulling setups

Based on these observations, here’s a set of design requirements for a palatable first data science experience:

Interesting small problem: the participants should feel intrigued by the source material they deal with, from start to end.
Concepts on-the-go: they should be able to grasp concepts in bite-size pieces while traversing the data science workflow.
Minimal pain: they should be teleported to the data science work environment in seconds, skipping the emotional roller-coaster that is common before typing the first import statement.

In this workshop, we will explore the famous (or rather infamous) Titanic Survival Dataset in a browser environment through a Kaggle Kernel. The workshop has two parts, plus a bonus:

Gain insights from Exploratory Data Analysis (EDA) [30 minutes]

We will explore the data structure and distributions of the observations using graphical and quantitative methods.

What is Data Science?
What is the dataset and our goal?
What can we learn through visual and quantitative explorations?
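To make the EDA part concrete, here is a minimal sketch of what a first look at the data could be in pandas. The column names follow the Kaggle Titanic train.csv layout, but the rows are toy values invented for illustration, not real passenger records, and the exact steps in the workshop kernel may differ.

```python
import pandas as pd

# Toy rows invented for illustration; NOT real Titanic records.
# Column names mirror the Kaggle Titanic train.csv layout.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 0, 1],
    "Pclass":   [3, 1, 3, 3, 1, 2],
    "Sex":      ["male", "female", "female", "male", "female", "male"],
    "Age":      [22.0, 38.0, 26.0, None, 35.0, 54.0],
})

# Data structure: column types and missing values per column
print(df.dtypes)
missing = df.isna().sum()
print(missing)

# Quantitative summary of the numeric columns
print(df.describe())

# Share of survivors within each group
rate_by_sex = df.groupby("Sex")["Survived"].mean()
print(rate_by_sex)
```

In a notebook with matplotlib available, a call such as df["Age"].hist() would add the graphical side by drawing the age distribution.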

Build a Predictive Model using Machine Learning (ML) techniques [20 minutes]

We will then prepare the data (preprocessing), build a model, and evaluate it.

How to get the data ready for model building?
Which models are we going to use?
How do we define success?
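The three questions above map onto three short steps in code. The sketch below is one possible version, assuming scikit-learn is available (as it is in a Kaggle Kernel): logistic regression is picked here only as a simple example model, accuracy is assumed as the success measure, and the rows are again toy values invented for illustration.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy rows invented for illustration; NOT real Titanic records.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0],
    "Pclass":   [3, 1, 3, 1, 3, 3, 1, 3, 2, 3, 1, 2],
    "Sex":      ["male", "female", "female", "female", "male", "male",
                 "male", "female", "female", "male", "female", "male"],
    "Age":      [22.0, 38.0, 26.0, 35.0, None, 54.0,
                 40.0, 27.0, 14.0, 20.0, 58.0, None],
})

# 1. Preprocessing: encode text as numbers, fill missing ages with the median
X = df[["Pclass", "Sex", "Age"]].copy()
X["Sex"] = X["Sex"].map({"male": 0, "female": 1})
X["Age"] = X["Age"].fillna(X["Age"].median())
y = df["Survived"]

# 2. Model building: hold out a quarter of the rows, fit on the rest
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# 3. Evaluation: fraction of held-out passengers predicted correctly
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy on held-out rows: {accuracy:.2f}")
```

With so few rows the printed number means little; the point is the shape of the workflow, which stays the same on the full dataset.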

Bonus: Submit to a Kaggle Competition [10 minutes]

Now it’s time to see how well our predictions fare in a machine learning competition. Using the kernel interface on Kaggle makes submitting super simple. Once an output file is generated (a .csv in the format required by the competition), you just need to hit submit and see your score!
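For the Titanic competition, the submission file has two columns, PassengerId and Survived. A minimal sketch of writing one with pandas follows; the ids and predictions are hypothetical placeholders, and in practice they would come from the competition's test set and your fitted model.

```python
import pandas as pd

# Hypothetical placeholders: real ids come from the competition's test.csv
# and real predictions from the fitted model.
passenger_ids = [892, 893, 894]
predictions = [0, 1, 0]

submission = pd.DataFrame({
    "PassengerId": passenger_ids,
    "Survived": predictions,
})

# index=False keeps the pandas row index out of the file
submission.to_csv("submission.csv", index=False)

# Same content as a string, handy for a quick look before submitting
csv_text = submission.to_csv(index=False)
print(csv_text)
```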

TODO

Make a sample submission on the kernel
Showcase model comparisons
Showcase cross-validation
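Until those TODO items land in the kernel, here is a rough sketch of what comparing models with cross-validation could look like in scikit-learn. The two models are arbitrary picks for illustration, the rows are toy values invented for the example, and none of this is taken from the workshop kernel.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Toy, already-preprocessed rows invented for illustration (Sex: 0=male, 1=female).
X = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3, 3, 1, 3, 2, 3, 1, 2],
    "Sex":    [0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0],
    "Age":    [22.0, 38.0, 26.0, 35.0, 30.0, 54.0,
               40.0, 27.0, 14.0, 20.0, 58.0, 30.0],
})
y = pd.Series([0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0], name="Survived")

# Example candidates only; any scikit-learn classifier could be listed here.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# 3-fold cross-validation: each model is trained and scored three times,
# each time holding out a different third of the rows.
results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=3, scoring="accuracy")
    results[name] = scores
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```

Averaging over folds gives a steadier estimate than a single train/test split, which is why it is a common basis for choosing between models.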

Link to all-in-one kernel (if you like it please upvote ;) )

Step-by-step on the Sloping Deck

Using data from Titanic: Machine Learning from Disaster

www.kaggle.com

Additional Resources

Learn Data Science - Infographic

After being dubbed by Harvard Business Review as "sexiest job of the 21st Century" in 2012, Glassdoor named it "the…

www.datacamp.com

What is Data Science?

Data science has hit all sectors of industry and academia. From business, education, health care, scientific sector to…

alldatascience.com

Learn R, Python & Data Science Online | DataCamp

Learn Data Science from the comfort of your browser, at your own pace with DataCamp's video tutorials & coding…

datacamp.com