Build a recommendation engine in 10 minutes using Recombee in Python
We’ll build a recommendation engine based on a Coursera dataset to recommend courses to users, using a third-party service called Recombee.
Recombee is a platform that makes it easy for developers to create a recommendation engine within minutes. After the 30-day trial is over it has a “forever free” mode, as long as you stay within the limits (20K monthly active users, 100K monthly recommendations), and it can be used from many programming languages, so it’s a great way to add recommendations to your business.
What is a recommendation engine?
A picture speaks a thousand words:
In its simplest form, a recommendation engine is what you see in the picture above, and for this getting-started guide that’s enough.
Customers can be similar when they are from the same city, go to the same school or are in the same age bracket, and there can be many other factors.
Products can be similar when they have similar properties. For example, a purse can come in a lot of types, and when a person is looking for bags online it’s better to recommend bags in all price ranges first and then adjust the results based on what the user views from those recommendations.
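To make the idea of “similar properties” concrete, here is a toy sketch that scores two items by how many property values they share (Jaccard similarity). The items and the jaccard_similarity helper are made up for illustration; this is not a description of how Recombee computes similarity internally.

```python
def jaccard_similarity(a: dict, b: dict) -> float:
    """Share of (property, value) pairs that two items have in common."""
    pairs_a, pairs_b = set(a.items()), set(b.items())
    union = pairs_a | pairs_b
    if not union:
        return 0.0
    return len(pairs_a & pairs_b) / len(union)


# Hypothetical items, each described by a few properties
tote = {"category": "bag", "material": "leather", "price_range": "high"}
clutch = {"category": "bag", "material": "leather", "price_range": "low"}
sneaker = {"category": "shoe", "material": "canvas", "price_range": "low"}

print(jaccard_similarity(tote, clutch))   # two of the four distinct pairs match
print(jaccard_similarity(tote, sneaker))  # no pairs in common
```

The two bags score higher than the bag and the shoe, which is the intuition behind recommending bags to someone who is browsing bags.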
How does Recombee work?
You can fit the above model to any situation, for example Netflix’s recommendation engine or YouTube recommending videos. Every scenario ultimately has users and items, and in between there are different ways of interacting with the items. By default, the interaction types provided by Recombee are:
- Detail Views (User has viewed details of an item)
- Cart Addition (User has added an item in the cart)
- Rating (User has rated an item)
- Bookmarks (User has saved/bookmarked an item)
- Purchases (User has purchased an item)
You can choose which interactions to add based on the scenario you are following. You’ll understand this better once we start coding.
Getting Started with Recombee
Go to https://admin.recombee.com/ and create a new account. Once you have successfully done that, you will see a screen similar to this.
Install the Recombee client on your machine. Open a terminal and type:
$ pip install recombee-api-client
Open a Python editor. I’m using Jupyter Notebook, but you can use any editor where you can run Python code line by line.
We’ll start by importing the required libraries:
from recombee_api_client.api_client import RecombeeClient
from recombee_api_client.api_requests import *
import random
import pandas as pd
Next we’ll create a Recombee client:
client = RecombeeClient('ADD_YOUR_API_IDENTIFIER', 'ADD_YOUR_PRIVATE_TOKEN')
You can find both of these by going into settings:
Dataset
The datasets I’m using are a Coursera course dataset and another dataset that has some user data in it. Download both files from the folder link given below.
Coursera dataset is taken from: https://www.kaggle.com/siddharthm1698/coursera-course-dataset
So yes, we’ll be making a recommendation engine that recommends courses to users. One file is coursea_data.csv and the other is cleaned_user_data.xlsx, just so that you can see how to work with different formats.
Let’s load the datasets. Put both files in the same folder as your Python file; otherwise update the file paths in the code below accordingly:
course_df = pd.read_csv("coursea_data.csv", index_col=0)
user_df = pd.read_excel("cleaned_user_data.xlsx")
If you select Items in your Recombee dashboard, you’ll see that there are no properties yet.
You can add properties from the dashboard, but we’ll add them from our Python code:
client.send(AddItemProperty('course_title', 'string'))
client.send(AddItemProperty('course_organization', 'string'))
client.send(AddItemProperty('course_certificate_type', 'string'))
client.send(AddItemProperty('course_rating', 'double'))
client.send(AddItemProperty('course_difficulty', 'string'))
client.send(AddItemProperty('course_students_enrolled', 'string'))
If the above code runs successfully you’ll get an ok in return. Now, if you go back to Items and refresh the page you can see:
Notice we didn’t add an itemId property, but Recombee added it automatically. It’s a default property, and we’ll use it to reference an item when we’re adding interactions.
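If you have many columns, picking the type string for each property by hand gets tedious. Below is a minimal sketch that derives one of the type names used above ('string', 'double', plus 'int' and 'boolean') from a sample Python value. The recombee_type helper and the sample course values are hypothetical and only for illustration.

```python
def recombee_type(value) -> str:
    """Pick a property type name for a sample Python value."""
    # bool must be checked before int, because bool is a subclass of int
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    return "string"


# Hypothetical sample row, only used to infer the types
sample_course = {
    "course_title": "Some course title",
    "course_rating": 4.7,
    "course_students_enrolled": "5.3k",
}

property_types = {name: recombee_type(value) for name, value in sample_course.items()}
print(property_types)
```

Each (name, type) pair could then be passed to AddItemProperty(name, type) in a loop, exactly like the hand-written calls above.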
So you have added the properties of your items. Now we’ll add the properties of our users:
client.send(AddUserProperty('citizenship', 'string'))
client.send(AddUserProperty('email', 'string'))
client.send(AddUserProperty('full_name', 'string'))
client.send(AddUserProperty('gender', 'string'))
As with items, Recombee added userId itself, and we’ll use it to define which userId interacted with which itemId.
Now we’ll populate the items using our course_df:
# Build one SetItemValues request per course; the DataFrame index is the itemId
requests = [
    SetItemValues(
        str(item_id),  # itemId
        # values:
        {
            "course_title": row['course_title'],
            "course_organization": row['course_organization'],
            "course_certificate_type": row['course_Certificate_type'],
            "course_rating": row['course_rating'],
            "course_difficulty": row['course_difficulty'],
            "course_students_enrolled": row['course_students_enrolled']
        },
        cascade_create=True  # create the item with this itemId if it doesn't exist
    )
    # iterrows() keeps each itemId paired with the values of its own row
    for item_id, row in course_df.iterrows()
]

# Send catalog to the recommender system
client.send(Batch(requests))
You’ll get an ok status for every entry that is added to the Recombee items. If you look at Items you can see it’s populated:
Now we’ll populate users:
# Build one SetUserValues request per row of the user sheet
user_requests = [
    SetUserValues(
        str(row['id']),  # userId
        # values:
        {
            "citizenship": row['citizenship'],
            "email": row['email'],
            "full_name": row['first_name'],
            "gender": row['gender']
        },
        cascade_create=True  # create the user with this userId if it doesn't exist
    )
    for idx, row in user_df.iterrows()
]

# Send the users to the recommender system
client.send(Batch(user_requests))
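As with items, you’ll get an ok status for every entry. The main difference from the items code is that we’re using SetUserValues instead of SetItemValues. The source file format (.xlsx rather than .csv) only matters when loading the data; once it is in a DataFrame, the rows are handled the same way.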
Refresh the Users page and it’ll also be populated now:
Now we have finished two parts. The third and final part before we can get recommendations is to add interactions. In a real application, snippets like these would run whenever a user takes a particular action.
Let’s add some interactions. For demonstration purposes we’ll take five users: let’s pick the users with ids 11, 13, 14, 15 and 17 (picked randomly from the above screenshot).
Let’s assume the following interactions, written as user{id} and item{id}:
user11 has viewed item0, item104, item107 and rated item110.
user13 has viewed item109, item110, item0, item103, item101, item104.
user14 has viewed item10, item11 and purchased item115.
user15 has viewed item113, item111, item105, item102.
user17 has viewed item140, item142, item15, item151, item144 and purchased item15.
Above are our five user cases, where we have collected data from user actions on our website. Here is how we add these interactions to our Recombee engine:
# user11 has viewed item0, item104, item107 and rated item110.
client.send(AddDetailView('11', '0', cascade_create=True))
client.send(AddDetailView('11', '104', cascade_create=True))
client.send(AddDetailView('11', '107', cascade_create=True))

# Rating is rescaled to the interval [-1.0, 1.0],
# where -1.0 means the worst rating possible, 0.0 means neutral, and 1.0 means absolutely positive rating.
# For example, in the case of 5-star evaluations, the formula rating = (numStars-3)/2 may be used for the conversion.
# So here the user rated a course 4/5: (4-3)/2 = 0.5
# Note: this rating has no effect on the rating of item110 that is available in the dataset.
client.send(AddRating('11', '110', 0.5, cascade_create=True))
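The conversion formula from the comments can be wrapped in a small helper so you don’t have to do the arithmetic by hand. This is a minimal sketch; the stars_to_rating name and the generalization to scales other than 5 stars are my own assumptions, not part of the Recombee client.

```python
def stars_to_rating(num_stars: float, max_stars: int = 5) -> float:
    """Rescale a star rating in 1..max_stars to the interval [-1.0, 1.0]."""
    if max_stars < 2:
        raise ValueError("max_stars must be at least 2")
    if not 1 <= num_stars <= max_stars:
        raise ValueError(f"num_stars must be between 1 and {max_stars}")
    midpoint = (max_stars + 1) / 2  # 3 on a 5-star scale
    # On a 5-star scale this is exactly (num_stars - 3) / 2
    return (num_stars - midpoint) / (midpoint - 1)


print(stars_to_rating(4))  # (4 - 3) / 2 = 0.5, the value sent for user11
```

The result can be passed directly as the third argument of AddRating.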
And now the rest of the users:
#user13 has viewed item109, item110, item0, item103, item101, item104.
client.send(AddDetailView('13','109', cascade_create=True))
client.send(AddDetailView('13','110', cascade_create=True))
client.send(AddDetailView('13','0', cascade_create=True))
client.send(AddDetailView('13','103', cascade_create=True))
client.send(AddDetailView('13','101', cascade_create=True))
client.send(AddDetailView('13','104', cascade_create=True))

#user14 has viewed item10, item11 and purchased item115.
client.send(AddDetailView('14','10', cascade_create=True))
client.send(AddDetailView('14','11', cascade_create=True))
client.send(AddPurchase('14','115', cascade_create=True))

#user15 has viewed item113, item111, item105, item102.
client.send(AddDetailView('15','113', cascade_create=True))
client.send(AddDetailView('15','111', cascade_create=True))
client.send(AddDetailView('15','105', cascade_create=True))
client.send(AddDetailView('15','102', cascade_create=True))

#user17 has viewed item140, item142, item15, item151, item144 and purchased item15.
client.send(AddDetailView('17','140', cascade_create=True))
client.send(AddDetailView('17','142', cascade_create=True))
client.send(AddDetailView('17','15', cascade_create=True))
client.send(AddDetailView('17','151', cascade_create=True))
client.send(AddDetailView('17','144', cascade_create=True))
client.send(AddPurchase('17','15', cascade_create=True))
You’ll get an ok status after these interactions are successfully published. We’ve added the interactions one by one; we could also have put them in an Excel sheet and uploaded them with a for-loop. The reason for doing it this way is to give you the idea that, in an application, you’ll send a Recombee client request whenever an action occurs.
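For the spreadsheet-style alternative, the rows can be turned into request objects in a loop. The sketch below assumes a simple (user_id, item_id, kind) row layout and a build_requests helper, both of which are my own invention. The request classes are passed in as a dictionary, and stand-in classes are used here so the example runs without a Recombee account.

```python
from collections import namedtuple


def build_requests(events, request_types):
    """Turn (user_id, item_id, kind) rows into interaction request objects."""
    built = []
    for user_id, item_id, kind in events:
        make_request = request_types[kind]  # raises KeyError for an unknown kind
        built.append(make_request(user_id, item_id, cascade_create=True))
    return built


# Rows as they might come from a spreadsheet or an event log (user17's case)
events = [
    ("17", "140", "view"),
    ("17", "15", "view"),
    ("17", "15", "purchase"),
]

# Stand-ins for the real request classes, so the sketch runs offline
FakeView = namedtuple("FakeView", "user_id item_id cascade_create")
FakePurchase = namedtuple("FakePurchase", "user_id item_id cascade_create")

interaction_requests = build_requests(
    events, {"view": FakeView, "purchase": FakePurchase}
)
print(interaction_requests[-1])
```

With the imports from this tutorial you would pass {"view": AddDetailView, "purchase": AddPurchase} instead and send the result with client.send(Batch(interaction_requests)). Ratings are left out because AddRating needs an extra rating argument.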
Now if you go back to the dashboard, you’ll see some numbers instead of all blanks.
Now, let’s get recommendations of courses for a user. I’m taking recommendations for user11.
recommended = client.send(RecommendItemsToUser('11',5))
print(recommended)
So we got the following recommendations:
{'recommId': 'e30ae1d99a1dae19155a6643f1214d0c',
'recomms': [
{'id': '200'},
{'id': '807'},
{'id': '565'},
{'id': '871'},
{'id': '664'}
]}
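The response is a plain dictionary, so pulling out the recommended item ids takes one line. The sketch below uses the response shown above; the recommended_ids helper name is my own.

```python
# The response printed above, copied here so the example is self-contained
recommended = {
    'recommId': 'e30ae1d99a1dae19155a6643f1214d0c',
    'recomms': [
        {'id': '200'},
        {'id': '807'},
        {'id': '565'},
        {'id': '871'},
        {'id': '664'},
    ],
}


def recommended_ids(response: dict) -> list:
    """Return the item ids from a recommendation response, in ranked order."""
    return [entry['id'] for entry in response.get('recomms', [])]


ids = recommended_ids(recommended)
print(ids)  # ['200', '807', '565', '871', '664']
```

Assuming course_df still has its integer index, something like course_df.loc[[int(i) for i in ids]] would then show the details of the recommended courses.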
To understand why Recombee recommends these courses, I’ve created this image:
If you compare the interactions with the recommendations you’ll get some observations:
- The user only viewed beginner or intermediate level courses and recommendations are all within that.
- The ratings of the recommended courses are between 4.6 and 4.8, in line with the four interactions.
- The certificate types the user interacted with were only course and specialization, and the recommendations stay within those types.
- The courses the user viewed fall into two general categories, Python and Design, and the recommendations we got are in those categories.
Based on the above result I’m quite impressed as these are some great suggestions for user11 based on what user11 has been browsing on my website.
You can read more about recommend items to user here: https://docs.recombee.com/api.html#recommend-items-to-user
There are other types of recommendations as well: users to a user (users who have similar interests), items to an item (items that have similar attributes) and users to an item (which users are most likely to be interested in a given item).
Conclusion:
Using Recombee is very simple. They have structured the UX of their platform well and cover almost all domains. The algorithms behind the platform are among the best ones used for recommendations by big companies like Facebook, YouTube and even Netflix.
Recombee depends on keywords. Even if you add an essay about a user, Recombee will use its knowledge-base algorithms to extract information, so make sure you feed it keyword-oriented data; this applies to items as well. You can keep adding users and items, but try to avoid adding new properties too often, and populate as many interactions as you can. (These are my own observations.)