One week of Machine Learning madness with HackerRank: Part 2
In case you’ve missed it, check out the first part here.
With the classification challenge behind me (at least code wise), all that was left was the recommender system challenge.
Before I start, let me say that prior to this competition I had no idea what a recommender system was. Actually, I did have some idea, as I had studied it briefly in Coursera's Machine Learning course, but it was a distant memory, and even at the time it was the one concept I wasn't completely sure about.
Were it a Kaggle competition, with one or two months to go, it would have been fine, but with only four days left to wrap up, it seemed impossible to come up with anything above a baseline-level submission knowing virtually nothing about the subject. After panicking a bit, I decided to carry on, as I was already halfway through the contest anyway.
A recommender system is a model that tries to predict the preferences of a user towards a set of items. Recommender systems are part of our everyday life: they are embedded in social networks, online stores, and search engines. In short, in a whole lot of applications.
The challenge requires that one understands how HackerRank works, so I'll start with that and only then move on to the actual problem.
HackerRank is a platform that hosts coding contests. These contests are organized by domain (e.g. Artificial Intelligence) and subdomain (e.g. Machine Learning). Each contest is composed of a set of challenges that must be solved by the user. A challenge is considered solved when a user submits a solution that achieves a predefined threshold score. Usually, a user can submit to a challenge as many times as he desires. I thought about making a diagram, but came to the conclusion that it would be even more convoluted than the written explanation. Visiting HackerRank might make things clearer.
After that poor attempt at explaining HackerRank’s system, let’s delve into the challenge. The problem is to predict a set of challenges that a user will attempt and possibly solve. The evaluation formula is a little overcomplicated, but can be summarized as:
- Recommendations of challenges that a user tries and successfully solves are worth 3 points
- Recommendations of challenges that a user tries but does not solve are worth 1 point
In addition, recommendations must be part of a specific contest and challenges that a user already solved cannot be recommended. There is also a limit of 10 recommendations for each user.
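As a quick sketch, the summarized evaluation could look like the function below. The function and variable names are mine, not HackerRank's, and this ignores whatever extra terms the real "overcomplicated" formula contains:

```python
def score_user(recommendations, tried, solved, limit=10):
    """Score one user's recommendation list: 3 points per recommended
    challenge the user solves, 1 point per challenge tried but not solved."""
    points = 0
    for challenge in recommendations[:limit]:  # at most 10 recommendations count
        if challenge in solved:
            points += 3
        elif challenge in tried:
            points += 1
    return points

# "A" was solved (3 pts), "B" was tried but not solved (1 pt), "C" untouched.
total = score_user(["A", "B", "C"], tried={"A", "B"}, solved={"A"})  # -> 4
```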
Data Exploration and Preprocessing
There were two datasets available for this competition. The first one contained information about the users' submissions, such as whether the submission's score solved the challenge, which contest it was part of, what programming language was used, and some other things, as can be seen in the table below:
The second dataset had information about the challenges, including features about the challenge itself, like domain, subdomain, and difficulty, and some aggregate stats, like the number of submissions, solved and unsolved.
Regarding how to approach the problem, as I discovered in my quick research, recommender systems are designed using one (or both) of two methods:
- Collaborative filtering: Users are grouped by similarity of interests. In the context of the problem, if users A and B solve challenge X and user A also solves challenge Y, then user B is more likely to solve challenge Y than a random challenge.
- Content-based filtering: Items, in this case challenges, are grouped by similarity of characteristics. If challenges X and Y are both part of the same subdomain and user A solves challenge X, then he is more likely to solve challenge Y than a random challenge.
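The collaborative filtering intuition above can be illustrated with a toy user-item matrix. This is purely for illustration, not part of my solution: user B is similar to user A (both solved X), so A's extra solved challenge Y becomes B's best candidate.

```python
import numpy as np

# Toy user-challenge matrix (rows: users A, B; columns: challenges X, Y, Z).
# Both A and B solved X; only A has also solved Y.
R = np.array([
    [1, 1, 0],  # user A
    [1, 0, 0],  # user B
])

def cosine(u, v):
    """Cosine similarity between two interaction vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

sim = cosine(R[0], R[1])            # A and B are fairly similar (~0.71)
# Weight A's interactions by the similarity, masking challenges B has seen.
scores = sim * R[0] * (R[1] == 0)   # highest score lands on challenge Y
```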
To apply either of the above, the data needs some “massaging” first. Beginning with the submissions dataset, we first need to convert it into a user-challenge matrix, where each cell holds the number of times a given user has submitted to a specific challenge. This serves as an interaction strength, which is important for a collaborative filtering approach.
There were some issues with this approach, however. First, the matrix was very sparse (99.3% of cells were empty), because a user can have many entries for the same challenge but is unlikely to try many different challenges. Second, we cannot recommend a challenge that the user has already solved, which is the case for many of the submission entries, so the collaborative filtering solution is very limited.
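A minimal sketch of the conversion on toy data. Column names like `hacker_id` and `challenge_id` are my guesses at the schema, not necessarily the dataset's:

```python
import pandas as pd

# Hypothetical submissions log: one row per submission.
submissions = pd.DataFrame({
    "hacker_id":    [1, 1, 1, 2, 2, 3],
    "challenge_id": ["A", "A", "B", "A", "C", "C"],
})

# Count submissions per (user, challenge) pair, then pivot into a
# user-challenge matrix whose cells hold the interaction strength.
interactions = (
    submissions.groupby(["hacker_id", "challenge_id"])
    .size()
    .unstack(fill_value=0)
)

# Sparsity: the fraction of cells with no interaction at all.
sparsity = (interactions == 0).to_numpy().mean()
```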
Regarding the challenges dataset, the main issue was the missing values. Many of the domains and subdomains were missing. The good news is that all of the missing data came from challenges that were not listed under the target contest, and many of them were duplicates. So all I had to do was merge the numerical data (difficulty, number of submissions) and remove the duplicates that had missing values. Some of the challenges, however, did not have duplicates, and for these I used a slightly more complicated method that I describe next.
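The merge-and-deduplicate step might look roughly like this on made-up data (column names are illustrative):

```python
import pandas as pd

# Hypothetical challenge table: duplicates share a challenge_id, but some
# rows are missing domain/subdomain while carrying the numeric stats.
challenges = pd.DataFrame({
    "challenge_id":      ["X", "X", "Y"],
    "domain":            ["AI", None, "Algorithms"],
    "subdomain":         ["ML", None, "Sorting"],
    "total_submissions": [0, 120, 45],
})

# Aggregate the numeric columns across duplicates and keep the first
# non-null categorical value, collapsing each challenge to a single row.
merged = challenges.groupby("challenge_id", as_index=False).agg({
    "domain": "first",           # GroupBy.first skips nulls
    "subdomain": "first",
    "total_submissions": "sum",  # combine the numeric stats
})
```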
To impute the remaining values, I resorted to the submissions dataset once again. One of its features is the programming language used for the submission. My hypothesis was that the programming language used depends strongly on the domain/subdomain of the challenge, which can be interpreted as P(language|domain), or P(domain|language) if we look at it the other way around.
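A toy sketch of this kind of imputation, with hypothetical column names: learn the most likely domain per language from labeled challenges, then assign each unlabeled challenge the domain implied by its dominant submission language.

```python
import pandas as pd

# Hypothetical data: submissions record the language used, and
# challenge "Z" is missing its domain label.
subs = pd.DataFrame({
    "challenge_id": ["X", "X", "Y", "Z", "Z", "Z"],
    "language":     ["python", "python", "cpp", "python", "python", "cpp"],
})
challenges = pd.DataFrame({
    "challenge_id": ["X", "Y", "Z"],
    "domain":       ["AI", "Algorithms", None],
})

# Estimate the modal domain per language from challenges with known domains,
# a crude stand-in for argmax P(domain | language).
labeled = subs.merge(challenges.dropna(subset=["domain"]), on="challenge_id")
domain_given_lang = (
    labeled.groupby("language")["domain"].agg(lambda s: s.mode().iloc[0])
)

# Impute each unlabeled challenge from its most common submission language.
modal_lang = subs.groupby("challenge_id")["language"].agg(lambda s: s.mode().iloc[0])
missing = challenges["domain"].isna()
challenges.loc[missing, "domain"] = (
    challenges.loc[missing, "challenge_id"].map(modal_lang).map(domain_given_lang)
)
```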
“Model Training” and Results
It would be a Machine Learning sacrilege to say that I “trained” a model. What I did was just a somewhat smart use of the features available to make a baseline kind of model.
As stated in the beginning, I didn’t know how to develop a recommender system and time was short, so I decided to code a very simple ranking model and went with the first set of reasonable hypotheses that I could think of, which are:
- A user is likely to retry a challenge. This is very obvious from what we see in the submissions dataset, so this is a pretty safe bet. The problems are that there aren’t many unsolved challenges in the submissions dataset and this hypothesis does not take into account the likelihood of solving a challenge.
- A user is likely to try challenges from the same subdomains that he already tried. I assumed that a user has a narrow set of interests and thus would try mostly challenges from the same subdomains. Unlike the first hypothesis, this one had many results, so I ranked the recommendations by number of solved submissions. This serves as an aggregate of popularity and solution likelihood.
- A user is likely to try challenges from the same domains that he already tried. Same as above, just with a broader scope.
- If you don’t know anything about the user, just recommend the most popular challenges.
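The hypotheses above form an ordered set of fallbacks, which a simple ranking model can walk through until the 10-recommendation budget is filled. A minimal sketch, with illustrative names (this is not the actual competition code):

```python
def recommend(solved_by_user, candidate_tiers, k=10):
    """Fill up to k recommendations from ordered candidate tiers,
    skipping challenges the user already solved and duplicates."""
    recs = []
    for tier in candidate_tiers:  # retries, same-subdomain, same-domain, popular
        for challenge in tier:
            if challenge in solved_by_user or challenge in recs:
                continue
            recs.append(challenge)
            if len(recs) == k:
                return recs
    return recs

# Toy example: the user already solved "B"; each tier is pre-ranked.
tiers = [
    ["A"],            # 1. unsolved challenges the user already tried
    ["B", "C"],       # 2. same-subdomain, ranked by solved submissions
    ["C", "D"],       # 3. same-domain
    ["E", "A", "F"],  # 4. globally most popular (cold-start fallback)
]
top = recommend({"B"}, tiers, k=4)  # -> ['A', 'C', 'D', 'E']
```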
With this set of hypotheses, all that was left to do was write the code to extract the required information and order the results accordingly. To my surprise, this solution actually did well, with a final score of 0.28, placing 26th out of 410!
This competition was one hell of a ride! Far from being as complex as a Kaggle competition, but not so trivial that one can blaze through it in a day, it was a very demanding experience overall. With a tight one-week time frame, I doubt I will enter another one any time soon. I came to the realization that I prefer working on ML problems at a more self-directed pace.
Regarding the competition itself, the challenges were well designed, but some major flaws in the leaderboards ended up hindering my experience. First, the scoring scales were changed midway without any kind of warning, which led to confusion. On top of that, the metric for the recommender challenge had a critical exploit: it actually scored duplicate recommendations, so you could just find a handful of solved submissions and replicate them to improve your score. This was fixed only in the last couple of days before the end, resulting in a large shift in the leaderboard.
This is it, hope you’ve enjoyed this short series. Till the next Machine Learning competition!