In part 1 of this two-part series, we learnt how to import and prepare data for a recommendation engine. In this article, we’ll look at:
- How the data is used and handled
- How a recommendation is produced
- How you can use that recommendation in your project.
As before, you can refer to my project GitHub page, Zero to Hero, and code along.
Let's get started…
1. Import the necessary library modules.
Surprise (Simple Python RecommendatIon System Engine; see its documentation) offers a wide variety of prediction algorithms. I experimented with SVD and the various Neighborhood algorithms on offer, and found that a Basic-KNN performed best for my project. Use a grid search or random search to fine-tune your hyperparameters, and use cross-validation with an error measure such as RMSE or MAE (both provided by Surprise's accuracy module) to evaluate.
2. Reading in the data.
First, we instantiate a `Reader()`, passing in `rating_scale` as an argument with `(1, 5)` representing, you guessed it… a rating scale of 1 to 5. This allows Surprise to make sense of our data.
Next, we assign a Surprise `Dataset` to a variable, `train_data`, passing in the columns we want to use along with the reader created above.
Lastly, we use `train_test_split` to get training and test sets with the usual 75/25 split (75% for training, 25% for testing).
3. It's time to choose a model and fit it to our trainset.
You have a number of options for which model to choose. For this Baseline K-Nearest Neighbour model, I chose cosine similarity (set through `sim_options`) and had the similarities calculated between items (heroes, in this case).
A quick note about user-based vs item-based.
Choosing `user_based = True` calculates the cosine similarities between users, which can take a long time if you have many users. My data contained 60k users, so I experimented with 20k, 40k, and 60k, all of which took over 40 minutes to compute, if they finished at all. Using fewer than 15k users gave me a relatively quick result. However, compared with the `user_based = False` (item-to-item similarity) calculation, there was not a substantial enough increase in model performance to warrant the longer computation time.
4. Getting the recommendation to the target audience
One way to provide a recommendation as a service is to code a rating system in which users are shown a sample of the items and asked to rate them. This lets you overcome the cold start problem for a user who has never rated anything before. Once they have rated a selection of sample items, they can be given a recommendation.
In my user rating function, I use the input() function to ask users to rate a sample of five heroes.
The samples are drawn in that same function; in the project code, note the lines under the comment Obtain a random data frame entry.
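A minimal sketch of what such a rating function could look like is shown here. It is not the project's actual code: the function name `collect_ratings`, the `ask` and `seed` parameters, and the `hero_id` / `hero_name` columns are all hypothetical. The `ask` parameter defaults to `input`, and the demo swaps in scripted answers so the example runs without a keyboard.

```python
import pandas as pd


def collect_ratings(heroes_df, user_id, n_samples=5, ask=input, seed=None):
    """Ask one user to rate randomly sampled heroes; return the new ratings."""
    new_ratings = []
    # Obtain random data frame entries (sampled without repeats).
    sample = heroes_df.sample(n=n_samples, random_state=seed)
    for hero in sample.itertuples(index=False):
        # Keep asking until the answer is a whole number from 1 to 5.
        while True:
            answer = ask(f"Rate {hero.hero_name} from 1 to 5: ").strip()
            if answer in {"1", "2", "3", "4", "5"}:
                break
            print("Please enter a whole number from 1 to 5.")
        new_ratings.append(
            {"user_id": user_id, "hero_id": hero.hero_id, "rating": int(answer)}
        )
    return pd.DataFrame(new_ratings)


# Hypothetical hero catalogue.
heroes_df = pd.DataFrame({
    "hero_id": [f"hero_{i}" for i in range(6)],
    "hero_name": [f"Hero {i}" for i in range(6)],
})

# Scripted answers stand in for keyboard input; "six" is rejected and re-asked.
scripted = iter(["4", "six", "2", "5", "1", "3"])
new_ratings = collect_ratings(
    heroes_df, "new_user", ask=lambda prompt: next(scripted), seed=1
)
print(new_ratings)
```

In an interactive session you would leave `ask` at its default, for example `collect_ratings(heroes_df, "new_user")`.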
We’ve reached the final stretch.
5. Now that you have new ratings, you can use them to make predictions for this new user.
Making a prediction:
algo.predict(user, item, clip=False)[3]
Here `algo` is the variable holding the fitted algorithm, `user` is a user id and `item` is an item id. `clip` tells Surprise whether to fit the predicted rating into the scale provided to the reader. The `[3]` takes the element at index 3 of the prediction output, which is the estimated rating (the `est` field).
The process from here is straightforward:
- Add the new ratings to the original rating DataFrame.
- Read it into a Surprise dataset.
- Train a new model using the updated DataFrame.
- Make predictions for the user and order them from highest rated to lowest rated. You can then return the top n recommendations with the text name of each item.
…and that's how to build a simple recommendation system with scikit-surprise!