Building An Internship Recommendation System — I (Introduction)

Ishan Nangia
Mar 17, 2019


Image source: http://www.dataastronaut.com/challenges-and-limitations-in-deploying-recommendation-systems/

Welcome to my mini-series on building a recommendation system for internships. This is also the capstone project for my Data Science Nanodegree. I have written a post on each of the following topics to explain the entire project:

  1. Introduction
  2. Scraping
  3. Cleaning
  4. Recommendations

This is the first post. It covers how I began, the problem definition and the approaches I considered, giving a basic overview of the entire project.

Also, here is the GitHub repository for this project: https://github.com/beatboxerish/Internship-Recommendation-System

Problem Definition

After completing the projects for all the modules of my DSND (Data Science Nanodegree), only the capstone project was left before I could get the degree. I was given a lot of options for it, including a project on data provided by Starbucks, an unsupervised learning project on messy data provided by Arvato Bertelsmann and a CNN application project, among other really cool options. However, I was also given the choice of doing something entirely on my own, and that really excited me because I would get to build something from start to finish.

I already had some experience using BeautifulSoup to scrape data in Python, and I was looking for summer internships at the time, which made me think: why not combine the two in my project? As I had been interested in recommendation systems from the start, I got the idea of building one on data scraped from an internship listing site.

It is important to define the problem right at the start so as not to lose sight of it and go off-track, so I defined it as:

Making an internship recommendation system where the data is scraped from an internship portal/website and the end deliverable is a command-line app/interface.

I had already used argparse, a Python standard-library module for building command-line interfaces. I had also thought about making a web app using Flask and Bootstrap, but that idea later turned out to be infeasible in the time I had.
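As a rough illustration of what an argparse-based interface for such a recommender could look like, here is a minimal sketch. The argument names (`internship_id`, `--location`, `--min-stipend`, `-n/--num`) are hypothetical and are not taken from the project's repository.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser for a hypothetical internship recommender script."""
    parser = argparse.ArgumentParser(
        description="Recommend internships similar to a given one."
    )
    # Required positional argument: the internship the user looked at (hypothetical).
    parser.add_argument("internship_id", help="ID of the internship to base recommendations on")
    # Optional, knowledge-based filters the user can supply (hypothetical).
    parser.add_argument("--location", default=None, help="only recommend internships in this city")
    parser.add_argument("--min-stipend", type=int, default=0, help="minimum monthly stipend")
    parser.add_argument("-n", "--num", type=int, default=5, help="number of recommendations to show")
    return parser


# Parse an explicit argument list for demonstration; a real script would call
# parse_args() with no arguments so that argparse reads sys.argv.
args = build_parser().parse_args(["42", "--location", "Delhi", "-n", "3"])
print(args.internship_id, args.location, args.min_stipend, args.num)
```

argparse converts `--min-stipend` into the attribute `args.min_stipend` and generates a `--help` message from the `help` strings automatically, which is much of why it suits a small command-line deliverable.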

Potential Models

Next, I had to consider how to build the recommendation system and which approaches could be used, since the approach I chose would determine what data I collected and how I prepared it. The main types of recommendation systems are the following:

Different Types of Recommendation Systems
  1. Knowledge-based: This type of recommendation system takes information provided by the user and uses their stated preferences to suggest items. Such systems are very simple to implement but rely on the user entering their choices. E.g., selecting price and item-category filters on eBay or choosing a music genre on Spotify.
[Figure: Knowledge-based recommendations]

  2. Collaborative filtering: These models look at which users interact with which items and then find user-user or item-item pairs that are very similar, in order to recommend similar items to users. They use information on how users and items interact (e.g., ratings given by users to items) to compute a metric (e.g., correlation or Euclidean distance) that measures the similarity between users or items.

[Figure: Collaborative and content-based recommendations]

  3. Content-based: These models use information about the items or users themselves to find similarity between pairs of users or items, and then recommend similar items to a user. The main difference between collaborative filtering and content-based models is that collaborative filtering needs data on how users have interacted with items, whereas content-based models do not; instead, they need background information on the users or the items.
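To make the similarity metrics mentioned under collaborative filtering concrete, the sketch below computes Euclidean distance and Pearson correlation between users from a tiny, made-up ratings table. The users, items and ratings are hypothetical; a real system would work on a much larger, usually sparse, user-item matrix.

```python
import math

# Hypothetical user -> {item: rating} table, for illustration only.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 2, "C": 5},
    "carol": {"A": 1, "B": 5, "C": 2},
}


def common_items(u, v):
    """Items rated by both users."""
    return sorted(set(ratings[u]) & set(ratings[v]))


def euclidean_distance(u, v):
    """Distance over co-rated items; smaller means more similar ratings."""
    items = common_items(u, v)
    return math.sqrt(sum((ratings[u][i] - ratings[v][i]) ** 2 for i in items))


def pearson_correlation(u, v):
    """Correlation over co-rated items: +1 means same trend, -1 opposite trend."""
    items = common_items(u, v)
    if len(items) < 2:
        return 0.0
    xs = [ratings[u][i] for i in items]
    ys = [ratings[v][i] for i in items]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # correlation is undefined for constant ratings
    return cov / (sd_x * sd_y)


def most_similar_user(user):
    """The other user whose ratings correlate most strongly with `user`."""
    others = [o for o in ratings if o != user]
    return max(others, key=lambda o: pearson_correlation(user, o))


print(most_similar_user("alice"))  # bob
```

A user-user collaborative filter would then recommend to alice the items that her most similar users rated highly but she has not rated yet. This is exactly the interaction data that, as the next paragraphs explain, was not available for internships.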

Going through these three categories and considering the information that would be available,

I decided to go with a hybrid system that would use knowledge-based and content-based recommendations.

Since I had no way of knowing how users interacted with different internships, collaborative filtering was ruled out. On the other hand, using the information I would collect about each internship, I could apply NLP to convert those details into matrices and compute the similarity between different internships.
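A minimal sketch of such a hybrid is shown below, assuming each listing has a location field and a free-text description. It applies a knowledge-based filter (the user's preferred location) and then ranks the remaining listings by TF-IDF cosine similarity to the internship the user looked at. TF-IDF is an assumption here, just one common way of turning text into vectors; the listings, field names and the `recommend` helper are hypothetical and not the project's actual code.

```python
import math
from collections import Counter

# Hypothetical listings; field names and values are illustrative only.
internships = [
    {"id": 0, "location": "Delhi", "text": "python data analysis pandas"},
    {"id": 1, "location": "Delhi", "text": "data science machine learning python"},
    {"id": 2, "location": "Mumbai", "text": "machine learning python deep learning"},
    {"id": 3, "location": "Delhi", "text": "content writing blog marketing"},
]


def tfidf_vectors(docs):
    """Return one sparse {term: weight} TF-IDF vector per document (smoothed IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens) or 1
        vectors.append({
            term: (count / length) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(target_id, location=None, k=3):
    """Hybrid: knowledge-based location filter, then content-based ranking."""
    vectors = tfidf_vectors([item["text"] for item in internships])
    by_id = {item["id"]: vec for item, vec in zip(internships, vectors)}
    candidates = [
        item["id"] for item in internships
        if item["id"] != target_id and (location is None or item["location"] == location)
    ]
    candidates.sort(key=lambda i: cosine(by_id[target_id], by_id[i]), reverse=True)
    return candidates[:k]


print(recommend(0))                    # [1, 2, 3]
print(recommend(0, location="Delhi"))  # [1, 3]
```

The IDF formula used here, ln((1 + n) / (1 + df)) + 1, is the smoothed variant that scikit-learn's `TfidfVectorizer` uses by default; in practice that class (plus `sklearn.metrics.pairwise.cosine_similarity`) would replace the hand-written helpers.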

Evaluation Metric

To evaluate the recommendations, I had to decide in advance what kind of metric to use. This was a big issue for me, as I had no data on how users interacted with internships, nor any way of getting it. So even if I had wanted to evaluate my recommendations offline, I couldn't.

So I decided to evaluate my recommendations by eye, i.e., by judging how relevant and how serendipitous they were.

This way, I could keep improving my model until I had one I was satisfied with.

However, if I were to deploy the model online, there would be two possible ways to judge its performance:

  1. Run an A/B test where the control group is shown randomly chosen internships and the treatment group is shown recommendations from the model. Then compare the number of clicks on the recommended internships between the two groups.
  2. Show a pop-up when a user clicks on a recommendation, asking the user to rate how relevant the recommendation was on a scale of 1 to 5.
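For the A/B test in option 1, one standard way to check whether a difference in click-through rate is more than noise is a two-proportion z-test. The sketch below assumes you have click and view counts for each group; the function name and the numbers in the example are hypothetical.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for a difference in click-through rate (B minus A)."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Pooled rate under the null hypothesis that both groups share one CTR.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: control (random internships) vs treatment (model recommendations).
z, p_value = two_proportion_z_test(clicks_a=100, views_a=2000, clicks_b=150, views_b=2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With these made-up counts (5% vs 7.5% click-through), z comes out at about 3.3 and the p-value is well below 0.05, which would suggest the model's recommendations attract more clicks than random ones. Comparing rates rather than raw click counts also keeps the comparison fair if the groups end up with different numbers of views.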

Outline of the Project

To give a high-level overview, here is the flow of the project with a short summary of each step:

  1. Collecting Data: Data was to be collected by scraping https://www.letsintern.com with Python's BeautifulSoup library.
  2. Cleaning Data: The collected data was to be cleaned and prepared for building the recommendation model, as well as for some basic data analysis.
  3. Analyzing Data: The cleaned data was to be used to answer some questions that give an interesting picture of internship trends. (There is no post for this step, but an entire notebook dedicated to it is available in the GitHub repository for this project.)
  4. Making Recommendations: A recommendation model was to be built and improved so that the recommended internships would be not only relevant to the internship a user looked at, but also unique and serendipitous. A command-line app was also to be built to make the scripts easy to use.
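Step 1 can be sketched roughly as follows. The project used BeautifulSoup; to keep this sketch free of third-party dependencies, it swaps in Python's built-in `html.parser` module and parses a hard-coded HTML snippet instead of fetching the live site. The markup and the class names (`internship-card`, `title`, `location`) are hypothetical and are not letsintern.com's real page structure.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a downloaded listings page.
SAMPLE_HTML = """
<div class="internship-card">
  <h3 class="title">Data Science Intern</h3>
  <span class="location">Delhi</span>
</div>
<div class="internship-card">
  <h3 class="title">Content Writing Intern</h3>
  <span class="location">Mumbai</span>
</div>
"""


class ListingParser(HTMLParser):
    """Collect one {field: text} record per card, using hypothetical class names."""

    FIELDS = {"title", "location"}

    def __init__(self):
        super().__init__()
        self.records = []
        self._current_field = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "internship-card" in classes:
            self.records.append({})  # start a new listing record
        for cls in classes:
            if cls in self.FIELDS:
                self._current_field = cls  # the next text belongs to this field

    def handle_endtag(self, tag):
        self._current_field = None

    def handle_data(self, data):
        if self._current_field and self.records and data.strip():
            self.records[-1][self._current_field] = data.strip()


parser = ListingParser()
parser.feed(SAMPLE_HTML)
for record in parser.records:
    print(record)
```

With BeautifulSoup, the same idea would typically be written as `soup.find_all("div", class_="internship-card")`, reading each field's text with `get_text(strip=True)`, which is more forgiving of messy real-world markup than this hand-written parser.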

Well, that’s all for the introduction. I hope you enjoyed reading it. Links to the other parts are given below:
