Recommendation system at Jellysmack: from business concept to evaluation.

Jeremie Bureau · Published in JellysmackLabs · Mar 31, 2023

Olga Kurnosova

Introduction

Jellysmack, the global creator company, started its story in 2016. It has since developed advanced AI technology that analyzes YouTube libraries and the individual performance of creators and videos.

Since its creation in 2016, Jellysmack has achieved exponential growth and quickly established itself as the global creator company that detects and develops the world's most talented video creators. To do so, Jellysmack has an entire sourcing team dedicated to discovering potential creators, where every sourcer is expected to bring in at least 10 creators per month. To help our sourcers with this, the Jellysmack AI team decided to develop a recommendation engine for new potential creators. In the following article, we'll walk through the main steps of building the AI model behind the recommendation system at Jellysmack, from business insights to evaluation.

Once developed, the recommendation engine will be integrated into the user interface of Jellypulse, the internal Jellysmack platform used by sourcers to analyze millions of YouTube data points. In this article, we'll therefore refer to our sourcers as Jellypulse users, or simply users.

Initial business insights and requirements for the future recommendation engine

Only publicly available YouTube data was used to build the recommendation engine.

Our first intuition for the recommendation engine: analyze what kind of channels each user is looking for and propose the best creator analogs. For example, for a user looking in the Beauty category for creators producing makeup videos, we should propose not only creators specialized in beauty but creators with relevant videos. Moreover, we want our future recommendation engine to scale to all YouTube channels, in any language, with a number of followers and library size acceptable for Jellysmack. Namely, if a user finds an over-performing French-speaking channel about wood crafting, we want to see similar channels all over YouTube, whatever the language.

Motivation for choosing a content-based approach

As you possibly already know, there are two global approaches to recommendation systems: collaborative filtering and content-based [1]. In collaborative filtering, we analyze user behavior to predict whether one user will appreciate channels liked by another user, but what if we want to propose a channel no user has seen yet? At Jellysmack, we have a limited number of sourcers (around 20), and each of them can view only a limited number of channels (around 20 per day), which is very little compared to the universe of YouTube creators, hence collaborative filtering is not the preferred method. In addition, at Jellysmack we don't log user data, so building up a database of user interactions would take considerable time.

The solution is to replace collaborative filtering with a content-based approach that works like a search: as soon as we find the list of channels similar to an already sourced channel, we can propose them to our user.

Model

Since we want our solution to scale to all YouTube channels, all models were tested on a dataset containing every YouTube creator with more than 10,000 followers, which is a bit more than 500,000 creators.

Our input dataset contains channel statistics, such as each creator's number of followers and library size, together with the textual information of each creator's videos: titles and tags. All of this data is accessible via the YouTube public Data API [2]. Consequently, our algorithm relies on natural language processing approaches.
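As a rough illustration, here is a minimal sketch of pulling a video's title and tags from the YouTube Data API v3 (the REST endpoint behind the wrapper in [2]). The API key, video ID, and helper name are placeholders of ours, not from the actual pipeline:

```python
# Minimal sketch: fetch a video's title and tags via the public
# YouTube Data API v3. Error handling and pagination are omitted.
import requests

API_KEY = "YOUR_API_KEY"  # hypothetical placeholder
API_URL = "https://www.googleapis.com/youtube/v3/videos"

def fetch_title_and_tags(video_id: str) -> dict:
    """Return the title and tags from a single video's snippet."""
    params = {"part": "snippet", "id": video_id, "key": API_KEY}
    snippet = requests.get(API_URL, params=params).json()["items"][0]["snippet"]
    return {"title": snippet["title"], "tags": snippet.get("tags", [])}

# One text document per video, built from its title and tags.
doc = fetch_title_and_tags("dQw4w9WgXcQ")
text = doc["title"] + " " + " ".join(doc["tags"])
```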

How can we ensure that the information we take from our text data is useful and doesn't contain useless tags? To preprocess our text data, we performed keyword extraction.

The best tool for this is KeyBERT [3]. We tested it on a dataset of 1,000 videos, comparing its output with manual keyword extraction performed by annotators; for the annotation, Jellysmack collaborated with Kili Technology [4].

Example (figure): keywords extracted by annotators and by KeyBERT.

The results were pretty satisfactory, with a cosine similarity of 0.6 between the manually extracted and KeyBERT-extracted keyword sets.
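For readers who want to try it, keyword extraction with KeyBERT looks roughly like this; the parameter values and the sample text below are illustrative, not the settings used in our study:

```python
# Keyword extraction with KeyBERT; parameters shown are illustrative.
from keybert import KeyBERT

kw_model = KeyBERT()  # defaults to a small sentence-transformers backbone

video_text = "easy everyday makeup tutorial for beginners, drugstore products"
keywords = kw_model.extract_keywords(
    video_text,
    keyphrase_ngram_range=(1, 2),  # allow single words and bigrams
    stop_words="english",
    top_n=5,
)
print(keywords)  # list of (keyphrase, similarity score) pairs
```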

Once the features were extracted, we tested pretrained embedding models [5]. The models used were 'all-mpnet-base-v2' from the Sentence Transformers library [6][7] and LASER from Facebook AI Research [8]. Both models were chosen for their multilingual capabilities and high performance in semantic search. For each channel, we took the mean of the embeddings of all its videos, computed both directly from the unprocessed tags and titles and from the keywords extracted from them. Now that we have several embedding models for our dataset, we need to run a similarity search and evaluate the results.
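A minimal sketch of the channel-embedding step with Sentence Transformers follows; the helper name and the example texts are ours:

```python
# Embed every video text with 'all-mpnet-base-v2' and average the
# vectors into a single channel-level vector.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")  # 768-dim embeddings

def channel_embedding(video_texts: list[str]) -> np.ndarray:
    """Mean of per-video embeddings (titles + tags, raw or keywords)."""
    video_vectors = model.encode(video_texts)  # shape: (n_videos, 768)
    return video_vectors.mean(axis=0)

channel_vec = channel_embedding([
    "easy everyday makeup tutorial",
    "5-minute smokey eye look",
])
```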

For efficient similarity search, Faiss from the Facebook AI team is the perfect solution [9]. We chose Faiss because it is an optimized library for similarity search and clustering of dense vector embeddings [10]. To speed up calculations, Faiss splits our vectors into Voronoi cells and looks for similar vectors only in the N nearest centroid cells [11]. The advantage of Faiss is that it can work with high-dimensional vectors, as in our case (768 dimensions for the SBERT embeddings and 1024 for LASER), and can be accelerated by GPU [12].
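The sketch below shows what such an IVF (Voronoi-cell) index can look like in Faiss. Vectors are L2-normalized so that inner product equals cosine similarity; the dataset is random stand-in data, and the nlist/nprobe values are illustrative rather than our production settings:

```python
# Sketch of a Faiss IVF index over channel embeddings.
import faiss
import numpy as np

d = 768                                   # SBERT embedding dimension
xb = np.random.rand(20_000, d).astype("float32")  # stand-in for channel vectors
faiss.normalize_L2(xb)                    # normalize: inner product = cosine

nlist = 256                               # number of Voronoi cells
quantizer = faiss.IndexFlatIP(d)          # coarse quantizer for centroids
index = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_INNER_PRODUCT)
index.train(xb)                           # learn the Voronoi partition
index.add(xb)

index.nprobe = 16                         # search only the 16 nearest cells
query = xb[:1]                            # e.g. a freshly signed channel
scores, neighbor_ids = index.search(query, 15)  # 15 recommendations
```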

Now that we have the embeddings, we can quickly run a similarity search and, for each embedding set, obtain recommended channels ranked by cosine similarity [13]. The key questions are: how do we evaluate the performance of our algorithm, and did we manage to find similar channels?

Evaluation

The evaluation of content-based recommendation systems can be tricky. To evaluate our models, we built a dataset of 200 channels recently signed by Jellysmack and ran several models over all YouTube channels to produce 15 recommendations for each of them. We then asked our sourcing team to grade each recommendation, judging similarity based only on content, regardless of channel size or number of followers. Grades ranged from 1 (not similar at all) to 5 (highest similarity).

Once the dataset was complete, the most interesting part began: running several models through our evaluation set. To compare models with each other, we used the mean cumulative gain [14] of the three best recommendations of each model; a minimal sketch of this metric is shown below.
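This is the metric as we understand it: the sum of the 1-5 grades of the top-3 ranked recommendations, averaged over the 200 evaluation channels. The grades in the example are made up:

```python
# Cumulative gain of the top-k recommendations, averaged over channels.
def cumulative_gain(grades: list[int], k: int = 3) -> int:
    """Sum of the relevance grades of the first k ranked recommendations."""
    return sum(grades[:k])

def mean_cumulative_gain(grades_per_channel: list[list[int]], k: int = 3) -> float:
    return sum(cumulative_gain(g, k) for g in grades_per_channel) / len(grades_per_channel)

# Two hypothetical channels, grades listed in recommendation rank order.
print(mean_cumulative_gain([[5, 4, 4, 2], [3, 3, 2, 1]]))  # -> 10.5
```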

Below are the results of the 8 best models: the distribution of the similarity grades given to the three best recommendations.

Model 1, which used SBERT embeddings computed directly on titles and tags, achieved the best results in our study.

Conclusion

We have described how the recommendation system project went from business concept to the modeling and evaluation stages. The trickiest part in building a recommendation system remains its evaluation: relying on ML metrics alone to judge algorithm performance is not enough; only real user feedback matters. Therefore, when building your own recommendation system, don't hesitate to dedicate a bit more time to understanding the final product and business needs, and talk to everyday users about what they expect from the product and how you can simplify their daily routine. Good luck!

References

[1] F.O. Isinkaye, Y.O. Folajimi, B.A. Ojokoh, Recommendation systems: Principles, methods and evaluation, Egyptian Informatics Journal, Volume 16, Issue 3, 2015, Pages 261–273, ISSN 1110-8665, https://doi.org/10.1016/j.eij.2015.06.005

[2] https://youtube-data-api.readthedocs.io/en/latest/

[3] https://github.com/MaartenGr/KeyBERT

[4] https://kili-technology.com/company

[5] https://en.wikipedia.org/wiki/Word_embedding

[6] https://www.sbert.net/

[7] https://www.sbert.net/docs/pretrained_models.html

[8] https://ai.facebook.com/blog/laser-multilingual-sentence-embeddings/

[9] https://ai.facebook.com/tools/faiss/

[10] https://www.pinecone.io/learn/dense-vector-embeddings-nlp/

[11] https://www.lftechnology.com/blog/ai/faiss-advanced-concepts/

[12] https://www.pinecone.io/learn/faiss-tutorial/

[13] https://studymachinelearning.com/cosine-similarity-text-similarity-metric/

[14] https://en.wikipedia.org/wiki/Discounted_cumulative_gain
