How to ace your first hackathon — Tutorial in Python — II

Simple hacks to accelerate to the top of the hackathon leaderboard

Hello! In the previous article, we learnt how to transform data, generate new features, and build a machine learning model from scratch, all to get ourselves on that leaderboard. The mere thought of getting on the leaderboard sounds exciting, but that must not be our end goal. We must strive for victory.

If you haven’t read my previous article on ‘How to ace your first hackathon’, I’d strongly recommend reading that before you jump into this one.

In this article, let's go through some simple hacks and techniques that will help us advance on the leaderboard. Simply put, let’s predict better!

Some philosophy before we get started!
Problems are inevitable. We will come across many of them in our lifetime. What matters is how we respond to them and what we choose to do about them. And as I often say, what seems impossible now will slowly become possible and eventually be effortless.

So, let's go through the training dataset again and try to get some interesting insights out of it. (link)

Screenshot of the Training data (5961 rows)

Simple Hack 1: Qualification

If we dig deeper into the ‘Qualification’ column, we’ll see that one particular entry is an anomaly in the dataset: ‘Get inspired by remarkable stories of people like you’.

Interesting, right? The doctors with this entry are all profiled as dermatologists and have a default fee of 100 rupees.

Screenshot of the filtered data for Qualification = Get inspired by remarkable stories of people like you

So, we’ll leverage this to slightly modify our predictions. We will look for this entry in the ‘Qualification’ column of the test data and overwrite the fee for those rows to 100, irrespective of the predicted value.

Hardcode fee = 100 for all relevant entries
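A minimal sketch of this override with pandas is shown here. The column name "Qualification", the helper name apply_qualification_hack, and the sample rows are assumptions made for illustration; adjust them to match your own DataFrame and model output.

```python
import pandas as pd

# The anomalous entry found in the training data.
PLACEHOLDER = "Get inspired by remarkable stories of people like you"
DEFAULT_FEE = 100.0


def apply_qualification_hack(test_df, predictions):
    """Return the predictions with the fee forced to 100 on placeholder rows.

    Assumes `test_df` has a "Qualification" column (assumed name) and that
    `predictions` is a list or array in the same row order as `test_df`.
    """
    fees = pd.Series(predictions, index=test_df.index, dtype="float64")
    is_placeholder = test_df["Qualification"].astype(str).str.strip() == PLACEHOLDER
    fees[is_placeholder] = DEFAULT_FEE
    return fees


# Made-up rows, only to show the behaviour.
test_df = pd.DataFrame({"Qualification": ["MBBS", PLACEHOLDER, "BDS"]})
adjusted = apply_qualification_hack(test_df, [350.0, 420.0, 275.0])
print(adjusted.tolist())
```

Only the row carrying the placeholder text is changed; every other prediction passes through untouched.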

Simple Hack 2: Miscellaneous_Info

Navigating through the ‘Miscellaneous_Info’ column, we’ll see that many entries already mention a fee. Let’s verify whether these amounts are the actual fees or just random numbers.

Filtered data

We see that the amount (in rupees) mentioned in ‘Miscellaneous_Info’ matches the actual fee. Hence, where such an amount is present, we can overwrite the predicted fee with the number found in ‘Miscellaneous_Info’.

Caution!

Before executing this step, we must be careful about one exception. For most entries, the amount in the ‘Miscellaneous_Info’ column is a true representation of the actual fee. However, if the amount in ‘Miscellaneous_Info’ is greater than 999, the real fee is 100 (the default fee). Again, we can confirm this by looking at the data.

Filtered data where Misc fee > 999; Actual Fee = 100
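If you would rather check both observations in code than by eyeballing the filtered rows, something along these lines could work. It assumes the training DataFrame has a "Fees" column and a numeric "misc_fee" column holding the amount already pulled out of ‘Miscellaneous_Info’ (NaN where none was found). Both column names and the helper name check_misc_fee are hypothetical, and the rows below are toy data, not real results.

```python
import pandas as pd


def check_misc_fee(train_df):
    """Summarise how well the extracted amount agrees with the actual fee.

    Assumes "Fees" (actual fee) and "misc_fee" (extracted amount, NaN if
    absent) columns; both names are assumptions for this sketch.
    """
    known = train_df.dropna(subset=["misc_fee"])
    small = known[known["misc_fee"] <= 999]
    large = known[known["misc_fee"] > 999]
    return {
        # Share of rows where the extracted amount equals the actual fee.
        "small_match_rate": (small["misc_fee"] == small["Fees"]).mean(),
        # Share of rows with amount > 999 whose actual fee is the default 100.
        "large_default_rate": (large["Fees"] == 100).mean(),
    }


# Toy rows, only to show the shape of the output.
train_df = pd.DataFrame({
    "Fees": [300, 100, 500, 100],
    "misc_fee": [300.0, 1500.0, None, 2000.0],
})
summary = check_misc_fee(train_df)
print(summary)
```

Rates close to 1.0 on your real training data would support both the overwrite rule and its exception.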

So, let’s write a piece of code to do that. In Python, the following line of code can be used to get the ₹ symbol.

Code for the Rupee symbol
Extract fee from ‘Miscellaneous_Info’ column
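Here is one possible way to do both, as a sketch rather than the exact code from the screenshots. The rupee sign is written as the Unicode escape "\u20b9", and a regular expression picks up the first number that follows it. The helper name extract_misc_fee and the sample strings are made up for illustration, and the pattern assumes the amount is written directly after the ₹ sign.

```python
import re

RUPEE = "\u20b9"  # Unicode escape for the Indian rupee sign

# Rupee sign, optional spaces, then digits (commas allowed, e.g. 1,200).
FEE_PATTERN = re.compile(RUPEE + r"\s*([\d,]+)")


def extract_misc_fee(text):
    """Return the first rupee amount found in `text` as a float, else None."""
    if not isinstance(text, str):
        return None  # missing values such as NaN or None
    match = FEE_PATTERN.search(text)
    if match is None:
        return None
    digits = match.group(1).replace(",", "")
    return float(digits) if digits else None


# Made-up strings, only to show the behaviour.
samples = [RUPEE + "250 Available today", "Root canal, braces", None]
print([extract_misc_fee(s) for s in samples])
```

With pandas, the same function can be applied to the whole column, for example test_df["Miscellaneous_Info"].map(extract_misc_fee).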

After you have this, just follow the steps described above.

Step by Step Execution

Step 1: Run your ML model (Explained in Tutorial -1)

Step 2: For Qualification = Get inspired by remarkable….; Fee = 100

Step 3: Overwrite the predicted fee with the fee from ‘Miscellaneous_Info’, if present.

Step 4: If ‘Misc_Info’ fee > 999, then actual fee = 100
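The four steps above can be sketched as a single post-processing function. This assumes the model predictions from Step 1 are already available, and that the test DataFrame has a "Qualification" column plus a numeric "misc_fee" column with the extracted amount (NaN where absent). The names post_process and "misc_fee" are hypothetical, and the rows are made up.

```python
import pandas as pd

PLACEHOLDER = "Get inspired by remarkable stories of people like you"
DEFAULT_FEE = 100.0


def post_process(test_df, predictions):
    """Apply Steps 2 to 4 on top of the model predictions (Step 1)."""
    fees = pd.Series(predictions, index=test_df.index, dtype="float64")

    # Step 2: placeholder qualification gets the default fee.
    fees[test_df["Qualification"] == PLACEHOLDER] = DEFAULT_FEE

    # Step 3: where an amount was extracted, it replaces the prediction.
    misc_fee = test_df["misc_fee"]
    has_misc = misc_fee.notna()
    fees[has_misc] = misc_fee[has_misc]

    # Step 4: extracted amounts above 999 fall back to the default fee.
    fees[has_misc & (misc_fee > 999)] = DEFAULT_FEE
    return fees


# Made-up rows, only to show the order of the overrides.
test_df = pd.DataFrame({
    "Qualification": ["MBBS", PLACEHOLDER, "BDS", "MD"],
    "misc_fee": [None, None, 300.0, 1500.0],
})
final_fees = post_process(test_df, [350.0, 420.0, 275.0, 600.0])
print(final_fees.tolist())
```

Step 4 runs after Step 3 on purpose, so that large extracted amounts written in Step 3 are corrected back to 100.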

This way, you’ve added your own insight on top of the predictions made by your machine learning algorithm. These are simple hacks, but small things like these can take you to the top of the leaderboard.

Finally, I’d like to end by quoting Ronald Coase:

If you torture the data long enough, it’ll confess
Credits — Google