Doing a Kaggle competition from start to end

Muriithi_Kabogo
Published in Nestmetric
4 min read · Dec 1, 2018

For a newcomer to machine learning, doing a Kaggle competition can seem like an uphill task, and sometimes a demotivating one, because you just do not know where to start. That was me three weeks before writing this blog post. The title of this post is exactly what I typed into Google the first time I wanted to do a Kaggle competition. Taking part in this competition has helped me learn fast.

You can get the code on GitHub.

Let us get right into it.

First things first: find a competition to participate in. I went to https://www.kaggle.com/competitions and chose Titanic: Machine Learning from Disaster.

Fire up your Jupyter notebook.

I am currently using Amazon SageMaker for all my machine learning experiments. To get started, you can follow a very easy tutorial at https://docs.aws.amazon.com/sagemaker/latest/dg/gs.html.

Use the Kaggle API to get data

I used the command line to get the data because I found it easier. To access the Kaggle API from the command line, follow the steps at https://github.com/Kaggle/kaggle-api.
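As a minimal sketch, assuming the `kaggle` package is installed, an API token is saved in `~/.kaggle/kaggle.json`, and you have joined the competition on the website, the download can be driven from Python. `kaggle competitions download -c titanic -p data` is the real CLI command; the helper names `download_cmd` and `fetch_competition_data` and the `data` folder are hypothetical.

```python
import subprocess
import zipfile
from pathlib import Path


def download_cmd(competition: str, dest: str) -> list:
    """The command you would otherwise type: kaggle competitions download -c titanic -p data"""
    return ["kaggle", "competitions", "download", "-c", competition, "-p", dest]


def fetch_competition_data(competition: str = "titanic", dest: str = "data") -> list:
    """Download a competition's files with the Kaggle CLI and unzip them if needed.

    Assumes `pip install kaggle` has been run, a token is in ~/.kaggle/kaggle.json,
    and the competition rules have been accepted on kaggle.com.
    """
    Path(dest).mkdir(parents=True, exist_ok=True)
    subprocess.run(download_cmd(competition, dest), check=True)
    # Depending on the CLI version you get separate CSVs or one <competition>.zip archive
    archive = Path(dest) / f"{competition}.zip"
    if archive.exists():
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
    # Names of the CSV files now sitting in the destination folder
    return sorted(p.name for p in Path(dest).glob("*.csv"))
```

Calling `fetch_competition_data()` in a notebook cell should leave `train.csv`, `test.csv` and `gender_submission.csv` in `data/`. Building the argument list separately keeps the exact CLI call easy to inspect before anything touches the network.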

Once you have the training and test data, you are ready to start.

Import the libraries you need for the Titanic classification problem. I am using the fast.ai library.

Load the training data using pandas.
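A minimal loading sketch follows. It uses plain pandas rather than fast.ai, because the 2018-era fast.ai 0.7 helpers for this kind of tabular preprocessing (such as `train_cats` and `proc_df`) were dropped from later fastai releases. The `data/train.csv` default path, the `load_csv` name and the four sample rows are hypothetical; the rows only mimic the column layout of Titanic's `train.csv`.

```python
import io

import pandas as pd


def load_csv(source="data/train.csv"):
    """Read a Kaggle CSV into a DataFrame.

    The default path is an assumption: point it at wherever train.csv was saved.
    """
    return pd.read_csv(source)


# Four made-up rows in the same column layout as Titanic's train.csv,
# so the snippet runs before you have the real file.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Doe, Mr. John",male,22,1,0,A/5 100,7.25,,S
2,1,1,"Roe, Mrs. Jane",female,38,1,0,PC 200,71.28,C85,C
3,1,3,"Smith, Miss. Anna",female,26,0,0,STON 300,7.93,,S
4,0,3,"Brown, Mr. James",male,,0,0,400,8.46,,Q
"""

df_raw = load_csv(io.StringIO(SAMPLE_CSV))
print(df_raw.shape)  # (4, 12)
print(df_raw.head())
```

With the real file, `df_raw = load_csv()` is all you need; the quoted `Name` values are why `read_csv` is preferable to splitting lines on commas yourself.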

Change non-numeric columns to categorical variables.
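One way to do this step in plain pandas, sketched under the assumption that "non-numeric" means any column pandas does not report as numeric: convert each such column to the `category` dtype. The function name `to_categories` and the demo rows are hypothetical.

```python
import pandas as pd


def to_categories(df):
    """Return a copy of df with every non-numeric column converted to the category dtype."""
    df = df.copy()
    for col in df.columns:
        if not pd.api.types.is_numeric_dtype(df[col]):
            # Missing values stay missing; they are not turned into a category of their own
            df[col] = df[col].astype("category")
    return df


# Illustrative rows, not the real data
demo = pd.DataFrame({
    "Sex": ["male", "female", "male"],
    "Age": [22.0, None, 30.0],
    "Embarked": ["S", None, "C"],
})
train_cat = to_categories(demo)
print(train_cat.dtypes)
print(train_cat["Sex"].cat.categories.tolist())  # ['female', 'male']
```

Free-text columns such as `Name` and `Ticket` will end up with roughly one category per passenger, which a random forest can only memorize, so dropping them before this step is a reasonable choice.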

Check for missing values.
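A short pandas sketch for this check: `isnull().sum()` counts the gaps per column, and dividing by the row count gives the share missing. The helper name `missing_report` and the demo rows are hypothetical.

```python
import pandas as pd


def missing_report(df):
    """Number and share of missing values per column, only for columns that have any."""
    counts = df.isnull().sum()
    report = pd.DataFrame({"missing": counts, "share": counts / len(df)})
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Illustrative rows, not the real data
demo = pd.DataFrame({
    "Age": [22.0, None, 26.0, 35.0],
    "Cabin": [None, "C85", None, "C123"],
    "Embarked": ["S", "C", None, "S"],
    "Fare": [7.25, 71.28, 7.93, 53.1],
})
report = missing_report(demo)
print(report)
```

Seeing the share, not just the count, helps decide whether a column is worth filling or is mostly empty.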

Convert all the categorical variables into their integer codes and fill in the missing values.
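A minimal sketch of both parts of this step, assuming median filling for numbers. It roughly follows the approach of fast.ai 0.7's `proc_df` (category codes shifted by one, missing numbers replaced by the median) but skips the extra `_na` indicator columns that helper added. The name `numericalize` and the demo rows are hypothetical.

```python
import pandas as pd


def numericalize(df, medians=None):
    """Return (numeric_df, medians).

    Non-numeric columns become integer codes shifted by +1, so a missing
    category is 0 and real categories are 1, 2, ... Missing numbers are filled
    with the column median. When processing the test set, pass in the medians
    learned on the training data so both sets are filled the same way.
    """
    df = df.copy()
    for col in df.columns:
        if not pd.api.types.is_numeric_dtype(df[col]):
            # .cat.codes is -1 for missing values; +1 turns that into 0
            df[col] = df[col].astype("category").cat.codes + 1
    if medians is None:
        medians = df.median(numeric_only=True).to_dict()
    return df.fillna(medians), medians


# Illustrative rows, not the real data
demo = pd.DataFrame({
    "Sex": ["male", "female", "male"],
    "Age": [22.0, None, 30.0],
    "Embarked": ["S", None, "C"],
})
X_num, train_medians = numericalize(demo)
print(X_num)
print(train_medians["Age"])  # 26.0
```

Returning the medians is a deliberate design choice: the test set should be filled with numbers learned from training, not from itself.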

Define your X (features) and y (target, here Survived) variables.
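A small sketch of the split, assuming the frame has already been made fully numeric. Dropping `PassengerId` as well as the target is an assumption on my part, since it is only a row identifier; the function name `split_xy` and the demo rows are hypothetical.

```python
import pandas as pd


def split_xy(df, target="Survived", drop=("PassengerId",)):
    """Split a processed training frame into features X and labels y.

    Dropping PassengerId is an assumption: it identifies rows but says
    nothing about survival.
    """
    y = df[target]
    X = df.drop(columns=[target]).drop(columns=list(drop), errors="ignore")
    return X, y


# Illustrative, already-numeric rows
demo = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Sex": [2, 1, 1],
})
X, y = split_xy(demo)
print(X.columns.tolist(), y.tolist())  # ['Pclass', 'Sex'] [0, 1, 1]
```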

Run your training data through a random forest classifier and check its score.
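A hedged scikit-learn sketch of this step. Accuracy measured on the rows the model was trained on is optimistic, so this version also holds back 20% of the rows as a validation set; the split size, `n_estimators=100`, the seed and the name `fit_and_score` are my assumptions, and the toy data below is made up so that survival depends only on `Sex`.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def fit_and_score(X, y, seed=42):
    """Fit a random forest and return (model, training accuracy, validation accuracy)."""
    X_train, X_valid, y_train, y_valid = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed, n_jobs=-1)
    model.fit(X_train, y_train)
    # .score() returns mean accuracy for classifiers
    return model, model.score(X_train, y_train), model.score(X_valid, y_valid)


# Toy numeric data (1 = female here) where survival depends only on Sex, purely for illustration
demo_X = pd.DataFrame({"Pclass": [1, 2, 3, 1, 2] * 8, "Sex": [1, 2] * 20})
demo_y = (demo_X["Sex"] == 1).astype(int)
model, train_acc, valid_acc = fit_and_score(demo_X, demo_y)
print(f"train accuracy {train_acc:.2f}, validation accuracy {valid_acc:.2f}")
```

Passing `oob_score=True` to `RandomForestClassifier` and reading `model.oob_score_` after fitting is another way to get an honest estimate without setting rows aside.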

Once your model is trained, load the test data. This is the data your trained model will make predictions on, and those predictions are what you submit to Kaggle.

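Change non-numeric columns to categorical variables, reusing the categories from the training data so that each value maps to the same integer in both sets.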

Check for missing values.

Convert all the categorical variables into their integer codes and fill in the missing values, using fill values computed on the training data.
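Repeating the preprocessing on the test data has one trap: if the categories and fill values are recomputed from the test file, the same value can get a different code than it had in training. Gaps can also differ between files; in the Titanic data, `Fare` has no missing values in `train.csv` but one in `test.csv`. A minimal pandas sketch of doing it consistently follows; `apply_train_categories`, `fill_with_train_medians` and the demo frames are hypothetical.

```python
import pandas as pd


def apply_train_categories(test, train_cat):
    """Copy each categorical column's category list from the training frame onto the test frame.

    Test values never seen in training become missing instead of getting a new code.
    """
    test = test.copy()
    for col in train_cat.columns:
        if col in test.columns and isinstance(train_cat[col].dtype, pd.CategoricalDtype):
            test[col] = pd.Categorical(test[col], categories=train_cat[col].cat.categories)
    return test


def fill_with_train_medians(test, train):
    """Fill test gaps with medians computed on the training data, never on the test data."""
    return test.fillna(train.median(numeric_only=True))


# Illustrative frames, not the real files
train = pd.DataFrame({"Embarked": ["S", "C", "Q", "S"], "Fare": [7.25, 71.28, 8.05, 8.05]})
train["Embarked"] = train["Embarked"].astype("category")  # categories: C, Q, S
test = pd.DataFrame({"Embarked": ["Q", "S", "S"], "Fare": [None, 10.0, 9.0]})

naive = test["Embarked"].astype("category").cat.codes.tolist()
aligned = apply_train_categories(test, train)["Embarked"].cat.codes.tolist()
print(naive, aligned)  # [0, 1, 1] [1, 2, 2]
print(fill_with_train_medians(test, train)["Fare"].tolist())  # [8.05, 10.0, 9.0]
```

The naive codes shift because the test file has no passengers from port C; aligning to the training categories keeps Q and S on the codes the model learned.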

Run your test data through the random forest model you already trained to get your predictions.

Create a data frame with the columns stipulated under Submission File Format on the competition's Evaluation tab (for Titanic: PassengerId and Survived).
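A minimal sketch of building the submission frame with the two Titanic columns, `PassengerId` and `Survived`. Converting both inputs to plain arrays first avoids pandas silently aligning two Series on mismatched indexes. The function name `make_submission` and the ids and predictions in the demo are illustrative only.

```python
import numpy as np
import pandas as pd


def make_submission(passenger_ids, predictions):
    """Build the two-column frame the Titanic Evaluation tab asks for: PassengerId, Survived."""
    # Plain arrays, so no index alignment can reorder or drop rows
    ids = np.asarray(passenger_ids)
    preds = np.asarray(predictions).astype(int)
    if len(ids) != len(preds):
        raise ValueError(f"{len(ids)} ids but {len(preds)} predictions")
    return pd.DataFrame({"PassengerId": ids, "Survived": preds})


# Illustrative values; in practice ids come from test.csv and predictions from the model
submission = make_submission([892, 893, 894], [0, 1, 0])
print(submission)
```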

Turn the data frame into a CSV file and save it in a folder.
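A short sketch of saving the file. The important detail is `index=False`: without it pandas writes an extra unnamed index column that the submission format does not expect. The `submissions` folder, the `submission.csv` name and the `save_submission` helper are assumptions; the demo writes into a temporary directory so it leaves nothing behind.

```python
import os
import tempfile

import pandas as pd


def save_submission(sub, folder="submissions", filename="submission.csv"):
    """Write the submission frame to folder/filename and return the path."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename)
    # index=False keeps the file to exactly the PassengerId and Survived columns
    sub.to_csv(path, index=False)
    return path


# Illustrative frame, written to a throwaway folder
demo = pd.DataFrame({"PassengerId": [892, 893], "Survived": [0, 1]})
with tempfile.TemporaryDirectory() as tmp:
    path = save_submission(demo, folder=tmp)
    with open(path) as f:
        saved_text = f.read()
print(saved_text)
```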

Download the CSV file and submit your predictions on the competition page on Kaggle.
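If you would rather stay in the notebook, the Kaggle CLI can upload the file too. This sketch assumes the same installed and authenticated CLI as for downloading; `kaggle competitions submit -c titanic -f submission.csv -m "message"` is the real command, while `submit_cmd`, `submit` and the message text are hypothetical.

```python
import subprocess


def submit_cmd(csv_path, message, competition="titanic"):
    """The command you would otherwise type:
    kaggle competitions submit -c titanic -f submission.csv -m "message"
    """
    return ["kaggle", "competitions", "submit", "-c", competition, "-f", csv_path, "-m", message]


def submit(csv_path, message, competition="titanic"):
    """Upload a submission; assumes the Kaggle CLI is installed and authenticated."""
    subprocess.run(submit_cmd(csv_path, message, competition), check=True)
```

Usage would look like `submit("submissions/submission.csv", "first random forest")`; a short message per submission makes the leaderboard history easier to read later.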

THAT'S IT!!

If you run into any bugs, feel free to ask; I will respond ASAP.

Happy Coding :-)

Resources:

Complete Beginner: Your First Titanic Submission

fast.ai library

Implementing a random forest classification model in python
