Can you predict who survived the Titanic?
I thought I couldn’t. At best, I could do a coin flip (~50%).
But those were my pre-machine-learning (well ackshually, supervised learning) days.
Apparently I can predict with ~77% accuracy, which is a lot better than flipping a coin.
What the hell am I talking about? Let me back up a bit. Predicting Titanic survivors is a rolling competition on Kaggle, which I thought was a fantastic way of quickly applying machine learning to a problem set.
What is Kaggle?
Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.
Here are the steps I followed:
- Followed this fantastic DataCamp tutorial from start to finish. Before the title throws you off: I had never heard of Kaggle before I took it, and I had only a bare-bones understanding of Python. It is also free.
- Kept at the course. It took me about 4 hours to complete, and the trick was to finish it in one sitting so that the learning stuck.
- Saved the CSVs. At different points in the course, a couple of CSVs are generated for you. These are the result files you submit to Kaggle later, and they are part of the fun.
- Created an account on Kaggle (again, free).
- Headed to the competition page and joined it.
- Submitted my predictions. There’s a thrill in seeing the accuracy you get from a model that you wrote.
- My eventual top score was ~77%, which was exhilarating.
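To give a feel for what those steps boil down to, here is a minimal sketch in plain Python. It is not the model from the DataCamp course: it is a one-feature majority-vote classifier that I am swapping in because it fits in a few lines. The two CSV snippets are made-up stand-ins for Kaggle's `train.csv` and `test.csv` (the real files have many more rows and columns), and the function names are my own. What it does show is the overall loop: learn from labelled rows, check accuracy, and write a results file with `PassengerId` and `Survived` columns, which is the shape Kaggle expects for this competition.

```python
import csv
import io
from collections import Counter, defaultdict

# Made-up stand-in for train.csv (assumption: the real file is far larger).
TRAIN_CSV = """PassengerId,Survived,Pclass,Sex
1,0,3,male
2,1,1,female
3,1,3,female
4,0,1,male
5,0,3,male
6,1,2,female
7,0,2,male
8,0,3,female
"""

# Made-up stand-in for test.csv: same columns, but no Survived label.
TEST_CSV = """PassengerId,Pclass,Sex
9,3,male
10,1,female
11,2,male
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def fit_one_feature(rows, feature):
    """Learn the most common Survived label for each value of one feature."""
    votes = defaultdict(Counter)
    for row in rows:
        votes[row[feature]][row["Survived"]] += 1
    return {value: counts.most_common(1)[0][0] for value, counts in votes.items()}


def predict(model, rows, feature, default="0"):
    """Look up the learned label; fall back to `default` for unseen values."""
    return [model.get(row[feature], default) for row in rows]


def accuracy(predicted, actual):
    """Fraction of predictions that match the known labels."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


def to_submission_csv(rows, predictions):
    """Build the results file: one PassengerId,Survived line per passenger."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])
    for row, label in zip(rows, predictions):
        writer.writerow([row["PassengerId"], label])
    return out.getvalue()


train_rows = read_rows(TRAIN_CSV)
test_rows = read_rows(TEST_CSV)

model = fit_one_feature(train_rows, "Sex")
train_accuracy = accuracy(
    predict(model, train_rows, "Sex"),
    [row["Survived"] for row in train_rows],
)
submission = to_submission_csv(test_rows, predict(model, test_rows, "Sex"))

print(f"training accuracy: {train_accuracy:.2f}")
print(submission)
```

Keep in mind that accuracy measured on the training rows is usually more flattering than the score Kaggle reports, because Kaggle grades you on passengers whose outcomes you never saw.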
What I struggled with:
- There is a lot of Googling involved even when following instructions in the course. Be open to searching beyond the course — it’s crucial as sometimes you’re left at a dead end without it.
- DataCamp can throw some false negatives: you perform the right operation but, for whatever reason, it doesn’t accept your answer. If you’re getting the right result and DataCamp refuses to agree, just move past it. It is not important. The point is that you understood what you did.
- The DataCamp course itself doesn’t tell you anything about using Kaggle. I had to figure that out myself, though it was fairly intuitive.
What’s next?
The same competition, but I am going to go a little deeper into the weeds by following this code-along on Facebook.
Stuff I read since my last post
- Algorithms to Live By: The Computer Science of Human Decisions (11%)
- Inspired: How to Create Tech Products Customers Love (87%)
- Machine learning — Is the emperor wearing clothes?
Unrelated and Unlabelled
Until next post!