How to ace your first hackathon — Tutorial in Python — II
Simple hacks to accelerate to the top of the hackathon leaderboard
Hello! In the previous article, we learnt how to transform data, generate new features, and build a machine learning model from scratch to get ourselves on the leaderboard. Getting on the leaderboard is exciting, but it should not be our end goal. We must strive for victory.
If you haven’t read my previous article on ‘How to ace your first hackathon’, I’d strongly recommend reading that before you jump into this one.
Machine Hack: Doctor’s Fee Prediction Challenge (medium.com)
In this article, let's go through some simple hacks and techniques that will help us advance on the leaderboard. Simply put, let’s predict better!
Some philosophy before we get started!
Problems are inevitable. We will come across many of them in our lifetime. What matters is how we respond to them and what we choose to do about them. And as I often say, what seems impossible now will slowly become possible and eventually effortless.
So, let's go through the training dataset again and try to get some interesting insights out of it. (link)
Simple Hack 1: Qualification
If we dig deeper into the ‘Qualification’ column, we’ll see that one particular entry is an anomaly in the dataset: ‘Get inspired by remarkable stories of people like you’.
Interesting, right? The rows with this entry are all profiled as dermatologists and have a default fee of 100 rupees.
So, we’ll use this to adjust our predictions slightly. We look for this entry in the ‘Qualification’ column of the test data and rewrite the fee for those rows to 100, irrespective of what the model predicted.
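Here is a minimal sketch of that override using pandas. The column name `Qualification`, the helper name `override_placeholder_fees`, and the toy rows are assumptions made for illustration, so adapt them to your own data frame.

```python
import pandas as pd

# Text of the anomalous entry, as quoted in this article.
PLACEHOLDER = "Get inspired by remarkable stories of people like you"
DEFAULT_FEE = 100

def override_placeholder_fees(test_df, predicted_fees):
    """Return the predictions with the fee forced to 100 for placeholder rows."""
    fees = pd.Series(list(predicted_fees), index=test_df.index, dtype="float64")
    # Missing qualifications are treated as "not the placeholder".
    is_placeholder = test_df["Qualification"].fillna("").str.strip() == PLACEHOLDER
    fees.loc[is_placeholder] = DEFAULT_FEE
    return fees

# Toy rows, made up purely for illustration (not taken from the real dataset).
toy_test = pd.DataFrame({"Qualification": ["MBBS", PLACEHOLDER, "BDS"]})
toy_predictions = [300.0, 250.0, 200.0]

adjusted = override_placeholder_fees(toy_test, toy_predictions)
print(adjusted.tolist())
```

Only the row holding the placeholder text is changed; every other prediction passes through untouched.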
Simple Hack 2: Miscellaneous_Info
Browsing through the ‘Miscellaneous_Info’ column, we’ll see many entries where a fee is already mentioned in the text. Let’s verify whether these amounts are the actual fees or just random numbers.
We see that the amount (in rupees) mentioned in ‘Miscellaneous_Info’ is a true representation of the actual fee. Hence, whenever such an amount is present, we can overwrite the predicted fee with the number found in ‘Miscellaneous_Info’.
Before executing this step, we must be careful about one thing: there’s a small exception to this rule. For most entries, the amount in ‘Miscellaneous_Info’ matches the actual fee. However, if the amount in ‘Miscellaneous_Info’ is greater than 999, its real value is 100 (the default fee). Again, we can confirm this by looking at the data.
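One way to run this check is sketched below. It pulls the number that follows a ₹ sign out of the text and compares it with the known fee in the training data. The column names `Miscellaneous_Info` and `Fees`, the helper name `misc_fee_agreement`, and the toy rows are assumptions for illustration; run it on your real training frame to see the actual agreement.

```python
import pandas as pd

RUPEE = "\u20b9"  # Unicode escape for the rupee sign

def misc_fee_agreement(train_df):
    """Compare the rupee amount quoted in Miscellaneous_Info with the actual fee."""
    # Capture the digits (commas allowed) that follow the first rupee sign.
    pattern = RUPEE + r"\s*([\d,]+)"
    extracted = train_df["Miscellaneous_Info"].str.extract(pattern, expand=False)
    quoted = pd.to_numeric(extracted.str.replace(",", "", regex=False), errors="coerce")
    has_quote = quoted.notna()
    # Keep only the rows that quote a fee, side by side with the actual fee.
    report = train_df.loc[has_quote, ["Fees"]].assign(quoted_fee=quoted[has_quote])
    report["matches"] = report["Fees"] == report["quoted_fee"]
    return report

# Toy rows, made up purely for illustration (not taken from the real dataset).
toy_train = pd.DataFrame({
    "Miscellaneous_Info": [
        RUPEE + "300 consultation",
        "98% 76 Feedback",
        None,
        "Clinic visit " + RUPEE + "1,500",
    ],
    "Fees": [300, 200, 100, 100],
})

report = misc_fee_agreement(toy_train)
print(report)
print("Share of quoted fees that match:", report["matches"].mean())
```

Looking at the rows where `matches` is False is a quick way to spot exceptions to the rule.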
So, let’s write a piece of code to do that. In Python, the following line of code can be used to get the ₹ symbol.
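One way to get the symbol, shown here as a sketch and not necessarily the exact line used originally, is its Unicode escape. The constant name `RUPEE` is just an illustrative choice.

```python
import unicodedata

# U+20B9 is the Unicode code point of the Indian rupee sign.
RUPEE = "\u20b9"

print(RUPEE)
print(unicodedata.name(RUPEE))
```

Using the escape instead of pasting the character keeps the script independent of the file encoding of your editor.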
Once you have the symbol, you can look for it in ‘Miscellaneous_Info’ and follow the steps described above.
Step by Step Execution
Step 1: Run your ML model (explained in Tutorial 1)
Step 2: For Qualification = ‘Get inspired by remarkable…’, set Fee = 100
Step 3: Overwrite the predicted fee with the fee from ‘Miscellaneous_Info’, if present
Step 4: If the ‘Miscellaneous_Info’ fee > 999, then actual fee = 100
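The steps above can be sketched as one small post-processing function applied to each prediction. The names `quoted_fee` and `adjust_fee`, the regular expression, and the toy rows are assumptions for illustration. The rules are applied in the order listed, so a fee quoted in ‘Miscellaneous_Info’ takes precedence if both rules match the same row.

```python
import re

RUPEE = "\u20b9"
PLACEHOLDER = "Get inspired by remarkable stories of people like you"
DEFAULT_FEE = 100
# A rupee sign followed by digits, with optional thousands separators.
FEE_PATTERN = re.compile(RUPEE + r"\s*(\d[\d,]*)")

def quoted_fee(misc_info):
    """Return the rupee amount quoted in the text, or None if there is none."""
    if not isinstance(misc_info, str):
        return None  # missing values (None, NaN) carry no fee
    match = FEE_PATTERN.search(misc_info)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))

def adjust_fee(predicted, qualification, misc_info):
    """Apply steps 2 to 4 to a single model prediction from step 1."""
    fee = predicted
    # Step 2: the placeholder qualification always means the default fee.
    if isinstance(qualification, str) and qualification.strip() == PLACEHOLDER:
        fee = DEFAULT_FEE
    # Step 3: a fee quoted in Miscellaneous_Info overrides the prediction.
    quoted = quoted_fee(misc_info)
    if quoted is not None:
        # Step 4: quoted amounts above 999 actually mean the default fee.
        fee = DEFAULT_FEE if quoted > 999 else quoted
    return fee

# Toy rows (prediction, qualification, misc info), made up for illustration.
rows = [
    (250.0, "MBBS", "98% 12 Feedback"),
    (250.0, PLACEHOLDER, None),
    (250.0, "BDS", RUPEE + "300 Available today"),
    (250.0, "MBBS", "Fee " + RUPEE + "1,500"),
]
final = [adjust_fee(*row) for row in rows]
print(final)
```

With a pandas data frame, the same function could be applied row by row, for example with a list comprehension over `zip` of the predictions and the two columns.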
This way, you’ve added your own insight on top of the predictions made by your machine learning algorithm. It’s a simple hack, but small things like these can take you to the top of the leaderboard.
Finally, I’d like to end by quoting Ronald Coase:
If you torture the data long enough, it’ll confess