CineBee’s Graph Foundation for Recommendation, Prediction and Tribe connections.

Salil Agarwal
Oct 6, 2018 · 4 min read

The k-nearest neighbours (k-NN) algorithm is among the most popular algorithms in correlation-based recommendation. In the CineBee platform, distances are calculated between users based on their activities, which include movie reviews, actor reviews and affinity, and a few more signals. Many distance and similarity metrics are available for this purpose. A user’s k nearest neighbours are the k closest users according to the chosen distance or similarity metric.
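The neighbour lookup described above can be sketched in a few lines of Python. This is only an illustration, not CineBee’s implementation: the user names, the three-number taste vectors and the function name `k_nearest_users` are all made up for the example, and Euclidean distance is used simply as a default that can be swapped for any other metric.

```python
import heapq
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length rating vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest_users(target, profiles, k, distance=euclidean):
    """Return the ids of the k users closest to `target` under `distance`."""
    scored = (
        (distance(profiles[target], vector), user_id)
        for user_id, vector in profiles.items()
        if user_id != target
    )
    # nsmallest keeps only the k smallest (distance, user_id) pairs.
    return [user_id for _, user_id in heapq.nsmallest(k, scored)]


# Hypothetical taste vectors (assumed data, not real CineBee signals).
profiles = {
    "alice": [4.5, 4.0, 2.0],
    "bob": [4.0, 4.5, 2.5],
    "carol": [1.0, 2.0, 5.0],
    "dave": [4.5, 3.5, 2.0],
}

print(k_nearest_users("alice", profiles, k=2))  # ['dave', 'bob']
```

Passing a different `distance` function changes which users count as neighbours, which is why the choice of metric matters so much in the sections that follow.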

Due to CineBee’s unique, deep and multi-dimensional user review system, any movie or TV review yields more than 10 signal elements. This signal set forms the seed for building a user’s taste profile and determining how they are connected to a Tribe. CineBee can achieve this with fewer reviews than any other existing system since each review consists of 10 or more data points about the movie or TV content.

Each review goes through a few machine learning steps and some intelligent data filtering to find the optimal data points for understanding the user’s taste, which is then used to connect them with people of similar taste, called a Tribe.

Now we will look at the distance-based similarity algorithms that are generally used in recommendation and prediction systems, and how we have adapted them inside the CineBee platform.

Different similarity formulas give different types of results and help us implement different functions for users based on their taste and interests. To keep the calculations simple and easy to understand, we will consider only one data point from each user’s review on the CineBee platform.

Cosine Similarity

This is a very common similarity formula that can be used to build a distance metric. Let’s look at an example that shows why cosine similarity cannot be used to group similar users based on a taste profile built from ratings.

Cosine Similarity Formula. Image Src: Wikipedia

Let’s say we have 2 users who have each reviewed 2 movies (to keep the calculations simple). Here is how the users have rated the 2 movies, Inception and Matrix:

Users’ ratings of Matrix and Inception

Clearly, User1 is not a big fan of either movie. From this data it is quite clear that User1 and User2 do not share the same taste. But if you calculate the cosine similarity between these 2 users, it will be 1 (meaning perfectly similar), because the angle between the two user vectors is 0. See the figure below for a better understanding:

So cosine similarity is more helpful in finding patterns in users’ behaviour. Using cosine similarity, we can build clusters of users with similar watching patterns.
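To make the point concrete, here is a minimal sketch of the calculation, cos(θ) = (A · B) / (‖A‖ ‖B‖). The ratings are assumed values chosen to match the description above (one user rating both movies low, the other rating both high); they are not the numbers from the original table, and `cosine_similarity` is just an illustrative helper name.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Assumed [Inception, Matrix] ratings on a 1-5 scale (illustrative only).
user1 = [1.0, 1.0]  # dislikes both movies
user2 = [5.0, 5.0]  # loves both movies

similarity = cosine_similarity(user1, user2)
# The vectors point in the same direction, so the score is 1
# (up to floating-point rounding) despite the opposite opinions.
print(round(similarity, 6))
```

Because only the direction of the vectors matters, scaling every rating of a user up or down leaves the score unchanged. That is exactly why it captures rating patterns rather than agreement on how good a movie is.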

Euclidean Distance

The Euclidean distance between points p and q is the length of the straight line segment connecting them.

Euclidean Distance Formula. Image src: Wikipedia

This distance formula is very well suited to calculating similarity based on users’ exact ratings, both the overall rating and other attribute ratings. We can use it to calculate similarity between users based on their reviews and to recommend users who are very similar to them.
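A small sketch shows how Euclidean distance, d(p, q) = sqrt(Σ (pᵢ − qᵢ)²), separates users that a direction-only measure would treat as identical. The three users and their ratings are assumptions made up for this example, and `euclidean_distance` is an illustrative helper rather than a CineBee function.

```python
import math


def euclidean_distance(p, q):
    """Length of the straight line between two equal-length rating vectors."""
    if len(p) != len(q):
        raise ValueError("rating vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Assumed [Inception, Matrix] ratings on a 1-5 scale (illustrative only).
ratings = {
    "user1": [1.0, 1.0],  # dislikes both
    "user2": [5.0, 5.0],  # loves both
    "user3": [4.5, 5.0],  # almost the same opinion as user2
}

d12 = euclidean_distance(ratings["user1"], ratings["user2"])
d23 = euclidean_distance(ratings["user2"], ratings["user3"])

print(f"user1-user2: {d12:.3f}")  # sqrt(32), about 5.657
print(f"user2-user3: {d23:.3f}")  # 0.500
```

A smaller distance means closer taste, so here user3 would be suggested to user2 long before user1 would.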

Jaccard Similarity

If we take just the movies and webTV objects reviewed by a user as a set, without any ratings associated with them, we can calculate the Jaccard index, also known as the Jaccard similarity coefficient. The Jaccard coefficient measures similarity between finite sets, and is defined as the size of the intersection divided by the size of the union of the sets:

Jaccard Similarity Formula. Image src: Wikipedia

This concept can be used to discover new movies and webTV objects for users. Because this formula does not take ratings into account, we can find users who may be dissimilar in taste but are watching the same content. So it becomes a good metric for discovering new content outside of a user’s taste.

CineBee uses the above similarity metrics to build different functions for users. This helps users find new content to watch based on their past taste and also helps them discover new content that they might not otherwise see. This helps users build their taste over time. It also helps the CineBee platform evolve over time as users’ tastes evolve.
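As a rough illustration of the set-based approach, the sketch below computes J(A, B) = |A ∩ B| / |A ∪ B| for two made-up watch lists and then lists the titles one user could discover from the other. The titles and the helper name `jaccard_similarity` are assumptions for the example only.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


# Assumed watch lists, with ratings deliberately left out.
watched_by_user1 = {"Inception", "Matrix", "Interstellar"}
watched_by_user2 = {"Inception", "Matrix", "Memento", "Tenet"}

score = jaccard_similarity(watched_by_user1, watched_by_user2)
print(score)  # 2 shared titles out of 5 distinct titles -> 0.4

# Titles user2 has seen that user1 has not: discovery candidates.
print(sorted(watched_by_user2 - watched_by_user1))  # ['Memento', 'Tenet']
```

Since no ratings are involved, a high score only says that two users watch the same things, not that they agree about them.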

Storing review data as graphs helps CineBee run recommendation queries in real time. Calculating different similarity scores across users also becomes easy, and as the number of multi-dimensional attributes increases, the response time is not affected. So, using the above-mentioned algorithms, we create a hybrid model to develop a very compelling prediction, recommendation and discovery platform around movies and TV. Recommendations that are based only on binary signals (say, thumbs-up or thumbs-down) just don’t have the depth to provide rich user engagement. Recommendations that are based on averaging just create strong herding, and that is not what the user would enjoy. The CineBee platform solves these problems by taking in-depth reviews from users and then applying state-of-the-art technology to provide multi-level engagement solutions.
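One way to picture review data stored as a graph is a two-sided structure with users on one side, titles on the other, and a rated review as the edge between them. The sketch below is an in-memory stand-in written for illustration; the class `ReviewGraph`, its methods and the sample reviews are hypothetical and say nothing about CineBee’s actual schema or graph database.

```python
from collections import defaultdict


class ReviewGraph:
    """Toy user-title graph: each review is an edge carrying a rating."""

    def __init__(self):
        self.user_to_titles = defaultdict(dict)  # user -> {title: rating}
        self.title_to_users = defaultdict(set)   # title -> {users}

    def add_review(self, user, title, rating):
        self.user_to_titles[user][title] = rating
        self.title_to_users[title].add(user)

    def co_reviewers(self, user):
        """Users reachable in two hops: user -> title -> other user."""
        found = set()
        for title in self.user_to_titles.get(user, {}):
            found |= self.title_to_users[title]
        found.discard(user)
        return found


# Assumed sample reviews (illustrative only).
graph = ReviewGraph()
graph.add_review("u1", "Inception", 4.5)
graph.add_review("u2", "Inception", 2.0)
graph.add_review("u3", "Matrix", 5.0)

print(graph.co_reviewers("u1"))  # {'u2'}
```

Following edges like this narrows the candidate set to users who share at least one title before any similarity score is computed, which is one reason graph storage suits this kind of query.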
