Today I want to do something exciting related to data science, one of the hottest topics in technology right now. One thing I have benefited from tremendously in daily life, whether for something as serious as researching study materials, project ideas, or coding algorithms, or as casual as searching for a song, a potential next car, or a movie, is search recommendations, especially those from Google and YouTube. Today we are going to dive into the backend, get familiar with the algorithm that runs behind a recommendation engine, and build a simple movie recommender in Ruby.
At the very beginning, I searched for existing recommendation engine libraries (gems) in Ruby. There are a few, but unfortunately they require either Redis or a particular database setup such as Mongoid (MongoDB) or Postgres. What I wanted was one that works with a basic Ruby on Rails setup (with the default SQLite3), so I could demonstrate how the algorithm works without installing many other gems (in fact, the only extra gem I use is faker, to generate names for the seed data). After doing some research, I decided to build one myself, and it turns out that it works like a charm! I know the idea of “building a recommendation engine” may sound super advanced and complicated, but trust me, building a simple one is not that hard. Let’s start by introducing one of the fundamental algorithms behind a recommendation engine:
Collaborative filtering
What is collaborative filtering?
- Let’s say Person A likes Movies A, B, and C
- There are other people who also like Movies A, B, and C
- Those other people also like other movies that Person A doesn’t know about or hasn’t watched
- Since Person A and these other people all like Movies A, B, and C, chances are they have very similar taste in movies, so Person A may also want to watch the other movies those people liked
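To make the idea concrete, here is a minimal plain-Ruby sketch using made-up people and movie titles (every name below is hypothetical, and this is not yet the weighted algorithm we build later). It finds everyone who likes all of Person A’s movies and collects the movies they like that Person A hasn’t liked yet.

```ruby
# Hypothetical data: each person maps to the list of movies they like.
likes = {
  "Person A" => ["Movie A", "Movie B", "Movie C"],
  "Person B" => ["Movie A", "Movie B", "Movie C", "Movie D"],
  "Person C" => ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"],
  "Person D" => ["Movie F"]
}

mine = likes["Person A"]

# People (other than Person A) who like every movie Person A likes.
similar_people = likes.select do |person, movies|
  person != "Person A" && (mine - movies).empty?
end

# Movies those people like that Person A has not liked yet.
suggestions = similar_people.values.flatten.uniq - mine

p similar_people.keys
p suggestions
```

Requiring a full overlap is stricter than what a real engine needs; it is only meant to illustrate the intuition.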
Now we have a better understanding of how recommendation engines work. There are many ways to implement collaborative filtering in code, from basic to advanced. The algorithm we’ll use is:
- For every other user, find the movies that both he/she and Person A like
- Count the liked movies that Person A and the other user share. The more movies Person A shares with another user, the more weight we give to that user’s recommendations.
- The weight is calculated by dividing the number of shared movies by the total number of movies the other user liked. We do it this way to filter out noise and edge cases, such as a user who has simply liked all or almost all movies for no reason. Without this division, such a user would recommend everything with too much weight and pollute our scores.
- Associate each movie with its weight in a hash. When different users like the same movie, we accumulate the weights for that movie. The most recommended movie ends up with the highest weight (recommendation rating)!
- Sort the hash by weight in descending order and we are done!
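Before moving to Rails, the steps above can be sketched in plain Ruby. This is only an illustration under stated assumptions: the user names, the likes hash, and the recommend_for helper are all hypothetical stand-ins for the database-backed models we create next.

```ruby
# Hypothetical likes, keyed by user name (no Rails or database involved).
likes = {
  "alice" => ["Avatar", "Titanic"],
  "bob"   => ["Avatar", "Titanic", "Toy Story 3", "Wonder Woman"],
  "carol" => ["Avatar", "Toy Story 3"],
  "dave"  => ["Finding Dory"]
}

# Returns [[movie, weight], ...] sorted by weight, highest first.
def recommend_for(name, likes)
  mine = likes.fetch(name)
  recommended = Hash.new(0) # unseen keys start at 0

  likes.each do |other, theirs|
    next if other == name || theirs.empty?

    common = theirs & mine
    # shared movies divided by everything the other user liked
    weight = common.size.to_f / theirs.size

    # accumulate the weight on each movie we have not liked yet
    (theirs - common).each { |movie| recommended[movie] += weight }
  end

  recommended.sort_by { |_movie, weight| -weight }
end

result = recommend_for("alice", likes)
result.each { |movie, weight| puts "#{movie}: #{weight.round(2)}" }
```

Here bob shares 2 of his 4 movies with alice (weight 0.5) and carol shares 1 of her 2 (weight 0.5), so “Toy Story 3”, liked by both, accumulates 1.0. Note that dave shares nothing with alice, so his movie still shows up, but with a weight of 0.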
Excited? Let’s get started:
First, let’s run rails new movie-recommender to create the basic folders and files we need, then run rails g resource User name, rails g model Movie name, and rails g model LikedMovie user_id:integer movie_id:integer. As you can see, our users table and movies table have a many-to-many relationship, with liked_movies as the join table. A user having a particular set of movies simply means that this user “liked” those movies. User and Movie each have a name column/attribute. We only need the resource generator for users, since our web application only has pages for users. So for models we have:
class User < ApplicationRecord
  has_many :liked_movies
  has_many :movies, through: :liked_movies
end

class Movie < ApplicationRecord
  has_many :liked_movies
  has_many :users, through: :liked_movies
end

class LikedMovie < ApplicationRecord
  belongs_to :user
  belongs_to :movie
end
For migrations, we have:
create_table :users do |t|
  t.string :name
end

create_table :movies do |t|
  t.string :name
end

create_table :liked_movies do |t|
  t.integer :user_id
  t.integer :movie_id
end
We also want our recommendation engine to live in a module, so let’s create a recommendation.rb file in the lib folder and include it in our User model:
# /app/models/user.rb
require './lib/recommendation.rb'

class User < ApplicationRecord
  has_many :liked_movies
  has_many :movies, through: :liked_movies

  include Recommendation
end
Now let’s start the official process of building a recommendation engine!
# /lib/recommendation.rb
module Recommendation
  # recommend movies to a user
  def recommend_movies
    # find all other users, equivalent to .where('id != ?', self.id)
    other_users = self.class.all.where.not(id: self.id)
    # instantiate a new hash, set default value for any keys to 0
    recommended = Hash.new(0)
    # for each user of all other users
    other_users.each do |user|
      # find the movies this user and another user both liked
      common_movies = user.movies & self.movies
      # calculate the weight (recommendation rating)
      weight = common_movies.size.to_f / user.movies.size
      # add the extra movies the other user liked
      (user.movies - common_movies).each do |movie|
        # put the movie along with the cumulative weight into hash
        recommended[movie] += weight
      end
    end
    # sort by weight in descending order
    sorted_recommended = recommended.sort_by { |key, value| value }.reverse
  end
end
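If you only want the strongest few suggestions, the sorted result can be trimmed. Below is a small plain-Ruby sketch, with a made-up hash of weights standing in for the recommended hash built by recommend_movies. Dropping zero weights and keeping the top two are my own assumptions, not part of the module above; sorting on the negated weight is another way to get descending order.

```ruby
# Hypothetical accumulated weights, standing in for the `recommended` hash.
recommended = {
  "Avatar" => 0.5,
  "Titanic" => 1.25,
  "Toy Story 3" => 0.75,
  "Finding Dory" => 0.0
}

# Drop movies that only came from users with nothing in common (weight 0),
# sort by weight in descending order, and keep the top two.
top_two = recommended
  .reject { |_movie, weight| weight.zero? }
  .sort_by { |_movie, weight| -weight }
  .first(2)

p top_two
```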
So we have finished building the recommendation engine algorithm. For our simple web application, we only need the index and show pages from the users resource. Below is the code for both views:
# /app/views/users/index.html.erb
<h2>Select a user:</h2>
<ul>
  <% @users.each do |user| %>
    <li><%= link_to user.name, user %></li>
  <% end %>
</ul>
# /app/views/users/show.html.erb
<h2>Recommended Movies for <%= @user.name %>:</h2>
<table>
  <tr>
    <th>Movie</th>
    <th>Rating</th>
  </tr>
  <% @user.recommend_movies.each do |key, value| %>
    <tr>
      <td><%= key.name %></td>
      <td><%= value.round(2) %></td>
    </tr>
  <% end %>
</table>
One last important step! Let’s seed our database:
# /db/seeds.rb
require 'faker'

# clear the join rows first so no orphaned likes remain after reseeding
LikedMovie.destroy_all
User.destroy_all
Movie.destroy_all

# using faker gem to create unique names to create users
30.times { User.create(name: Faker::Name.unique.name) }

# 15 movies
movies = ["Avengers: Infinity War", "Star Wars: The Force Awakens", "Avatar", "Titanic", "Jurassic World", "Black Panther", "Marvel's The Avengers", "Star Wars: The Last Jedi", "The Dark Knight", "Beauty and the Beast", "Finding Dory", "Pirates of the Caribbean: Dead Man's Chest", "Toy Story 3", "Wonder Woman", "Iron Man 3"]

# create movies
movies.each { |name| Movie.create(name: name) }

# randomly associate movies with users, where no user has the same movie more than once
100.times do
  user = User.all[rand(0...30)]
  movie = Movie.all[rand(0...15)]
  # skip this round if the user already likes the movie
  next if user.movies.include?(movie)

  user.movies << movie
end
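One thing to note about the loop above: whenever it picks a pair that already exists, it skips that iteration, so the database can end up with fewer than 100 likes. As an alternative sketch (not what the seed file above does), you could build every possible user/movie pair and sample 100 distinct ones. The example uses plain integer ids as hypothetical stand-ins for records so that it runs without Rails.

```ruby
# Stand-in ids for 30 users and 15 movies (hypothetical, no database needed).
user_ids  = (1..30).to_a
movie_ids = (1..15).to_a

# All 450 possible [user_id, movie_id] pairs, then 100 distinct ones at random.
liked_pairs = user_ids.product(movie_ids).sample(100)

puts liked_pairs.size
puts liked_pairs.uniq.size
```

In a real seed file, each sampled pair would then need to be turned into a like, for example by looking up the two records and appending the movie to user.movies.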
And that’s it! Let’s run the Rails server and check the result!
We have successfully built a recommendation engine! Awesome!