Kaggle “Novice” Badge to “Contributor” in 7 Lines of Code

Alexander S. Augenstein
Published in Analytics Vidhya
4 min read · Sep 28, 2020

Kaggle = Social Media for Data Science Nerds

Ever heard of Kaggle.com? No? That’s fine. I won’t hold it against you, but now that you have, you MUST check it out. This site is too big to ignore (assuming, of course, you have a coding aptitude and a conscience).

You’ll also quickly realize you should have been using this site a long time ago, so it’s critical you shake off that “Novice” badge next to your name! How, you ask? They provide a convenient checklist 📜.

First things first: update your profile. Just pop in a short bio, your location, occupation, and organization (your school is fine if you’re a student). Verify your phone number, cast an upvote, write a comment, ✔✔✔ easy stuff!

Running a Notebook or script is also a requirement, but it may be slightly trickier if you’ve never used Jupyter Notebooks before (if you like R or Python, though, you’ll love these!). Go to the Notebooks tab.

Pick a notebook that interests you.

Click “Copy and Edit”.

This will take you to the code used to generate the analysis for the Notebook in question. Just run it (there’s a button)! Kaggle will count this as meeting the minimum requirement towards your next badge ✔.

Lastly, you have to MAKE A COMPETITION SUBMISSION. dun dun dunnn. This isn’t as difficult as it sounds, and we’re going to do it in 7 lines of code.

If you’ve spent any time on the site whatsoever, you’ll see that Kaggle pushes this Titanic competition, hard. It’s a game where they give you basic info about Titanic passengers, and it becomes our job to guess as accurately as possible whether they lived or died. Wow, that got dark fast. Sign up for the competition, then create a new Notebook (a script, more specifically).

Now we’ll write our “code”, and what we’ll do here is about as simple as it gets. Most people aboard the Titanic didn’t make it (only 31.6% survived 😢), so by submitting code that assumes nobody survived, we should expect roughly 68% accuracy (100% minus 31.6%), give or take some variance.
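To make that 68% figure concrete, here is a minimal Python sketch. Predicting 0 for everyone gets every non-survivor right and every survivor wrong, so accuracy is just the fraction who didn’t survive. The helper names `expected_all_zero_accuracy` and `all_zero_accuracy` are my own (hypothetical), the 31.6% rate is the one quoted above, and the toy labels are made up for illustration.

```python
import numpy as np
import pandas as pd


def expected_all_zero_accuracy(survival_rate: float) -> float:
    """Hypothetical helper: accuracy of guessing 'did not survive' (0) for everyone.

    Every non-survivor is guessed correctly and every survivor is missed,
    so accuracy equals the fraction who did not survive.
    """
    return 1.0 - survival_rate


def all_zero_accuracy(labels: pd.Series) -> float:
    """Hypothetical helper: measure the all-zero baseline on labels you have (1 = survived)."""
    predictions = np.zeros(len(labels), dtype=int)
    return float((predictions == labels.to_numpy()).mean())


# Using the 31.6% survival rate quoted above
print(round(expected_all_zero_accuracy(0.316), 3))  # 0.684

# Toy labels (made up): 3 survivors out of 10 passengers
toy = pd.Series([0, 1, 0, 0, 1, 0, 0, 0, 1, 0])
print(all_zero_accuracy(toy))  # 0.7

# On Kaggle, train.csv includes a Survived column, so you could try:
# train = pd.read_csv('/kaggle/input/titanic/train.csv')
# print(all_zero_accuracy(train['Survived']))
```

Running the baseline against labeled data like this is a cheap way to see what score to expect before you ever hit submit.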

Here’s my initial submission, written in Python: https://www.kaggle.com/alexaugenstein/asa55-titanic-competition?scriptVersionId=43567606

Or you can just copy and paste the secret sauce below:

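import numpy as np
import pandas as pd

# Load the test set Kaggle provides once you join the Titanic competition
data_test = pd.read_csv('/kaggle/input/titanic/test.csv')
# Build the submission: one row per test passenger
data_submission = pd.DataFrame()
data_submission['PassengerId'] = data_test['PassengerId']
# Predict 0 ("did not survive") for every passenger
data_submission['Survived'] = np.zeros(len(data_test), dtype=int)
# Write the CSV (without the row index) to the notebook's working directory
data_submission.to_csv('/kaggle/working/titanic_submission.csv', index=False)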

This code builds a pandas DataFrame with two columns: the first contains the passenger IDs provided by Kaggle, and the second contains our guess as to whether each passenger survived (hence the 0s 😢).
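Before submitting, it can help to sanity-check that two-column format. Below is a rough sketch of such a check. `check_submission` is a hypothetical helper (not a Kaggle or pandas API), and its rules (exactly the columns `PassengerId` and `Survived`, one row per test passenger, values of 0 or 1) are my assumptions based on the format described above. The toy IDs stand in for test.csv.

```python
import pandas as pd


def check_submission(submission: pd.DataFrame, test: pd.DataFrame) -> list:
    """Hypothetical helper: return a list of problems (empty if the basic checks pass)."""
    problems = []
    # Expect exactly the two columns described above, in this order
    if list(submission.columns) != ['PassengerId', 'Survived']:
        problems.append(f"unexpected columns: {list(submission.columns)}")
        return problems
    # One prediction per test passenger, with matching IDs
    if len(submission) != len(test):
        problems.append(f"expected {len(test)} rows, got {len(submission)}")
    if set(submission['PassengerId']) != set(test['PassengerId']):
        problems.append("PassengerId values do not match the test set")
    # Predictions must be binary: 1 = survived, 0 = did not survive
    if not submission['Survived'].isin([0, 1]).all():
        problems.append("Survived must contain only 0 or 1")
    return problems


# Toy stand-in for test.csv (IDs and columns made up for illustration)
test = pd.DataFrame({'PassengerId': [892, 893, 894], 'Pclass': [3, 3, 2]})
submission = pd.DataFrame({'PassengerId': test['PassengerId'], 'Survived': 0})
print(check_submission(submission, test))  # []
```

In a Kaggle notebook you would pass `data_submission` and `data_test` in place of the toy frames.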

As a Kaggle member, you have some storage space available to you in the cloud. The last line writes our DataFrame as a CSV file to your notebook’s working directory, at /kaggle/working/titanic_submission.csv.
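The `index = False` argument matters here: by default pandas also writes the DataFrame’s row index as an extra, unnamed first column, which isn’t part of the two-column format. A small sketch with made-up rows shows the difference. It calls `to_csv()` without a path, which returns the CSV text as a string instead of writing a file.

```python
import pandas as pd

# Made-up rows in the same two-column format as data_submission
submission = pd.DataFrame({'PassengerId': [892, 893, 894], 'Survived': [0, 0, 0]})

# index=False: the CSV holds only the PassengerId and Survived columns
print(submission.to_csv(index=False))

# Default (index=True): pandas prepends the row index as an unnamed column
print(submission.to_csv().splitlines()[0])  # ,PassengerId,Survived
```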

Kaggle of course knows this file exists and associates the output with your active competition. To make the submission, go to Notebooks → Your Work → [whatever you named your Titanic competition submission] and scroll down until you see the data we generated.

Click submit.

The accuracy will be 62.2%. Could be better, but hey, you’ll see on your profile that you’re a certified Contributor now 🎉🎈 🎇! Of course there’s a whole lot more you can do on Kaggle, and this only barely scratches the surface, but you’re already well on your way now that you know what it takes to contribute!
