How we built a reversible recommendation system using ElasticSearch

Pierre Cavalet
Jul 15, 2019 · 5 min read
Image for post
Image for post

For a french version of this article, click here.

Recommendation engines are everywhere. There are a lot of complicated techniques to recommend content, but sometimes, what you need is just a recommendation system to propose specific content to your users using simple rules.

A bit of context

We are building content related features where we need to be able to explain everything that is displayed to a specific user. For exemple, if you go to your home page, and an article called My thoughts on the last season of Game of Thrones is displayed, we need to be able to explain why.

When your recommendation algorithm is basically ordering content based on publication date, this is pretty simple to do. But when it uses user information and stuff like this, that’s when it becomes harder to explain results.

How it works

The algorithm is pretty simple. We define a list of criteria that a content could match, and then we order them by priority. Each criterion must be a boolean.

Let’s say we are recommending articles. In order of priority:

  1. the user has not read the article
  2. at least an article tag matches the user tags
  3. the article is flagged important

Note: This is just an example, and you can define a way bigger list, as long as you order them. The order is purely arbitrary for example purposes.

Now we want to assign a score to each criterion. The scores we will use are powers of 2. We will start assigning from bottom to top.

Let’s consider a user called John.

John has not read My thoughts on the last season of Game of Thrones, but the article tags do not match any of John’s tags. Finally, the article is flagged important by the content team.

Our algorithm will sum the score of each criterion that matches to give a final score to the article for the calculation of John’s home page: 8+ 2= 10.

Repeating this process for each article, we can rank articles based on their scores. The highest articles will be displayed on John’s home page.

If you are wondering why we use powers of 2, we will come to that later in the article.

ElasticSearch implementation

Let’s say I have the following documents in my index:

ElasticSearch gives us a wonderful tool called function score query. It allows us to modify the score of the documents retrieved by a query, based on filter queries.

For example, I can do a first query that will retrieve all the articles, and then use a filter query on those results to change the score of the articles that John has not read. Let’s say that John has read the article with the ID articleId2.

Note: If your query filters documents instead of a match_all, you can use constant_score so that your query score does not alter the final score.

Now we can just add the other functions for the tags and the important flag and we are all set !

Here is the result returned by ElasticSearch:

As you can see, ElasticSearch returns the score of each documents in its query, which is perfect for what is next: Reversing ElasticSearch score. So let’s answer this question:

Why the hell are we using powers of 2 ?

We did not choose to use powers of 2 just because they are cool (but they are). We chose them because they are ideal to store boolean information. 0 stands for false, 1 stands for true. Storing information with powers of 2 is called a bit field (or a bit flag).

Additionally, this technique will produce a one-to-one relation between the input (boolean information) and the output (flag value). It means that we are able to re-create the input data from the output, which is also known as reversible computing.

Note: We are skipping the first byte, because the default score when ElasticSearch matches nothing is 1, so the first power of two: 2⁰ = 1 is already taken.

This is how it works for our article (the red one is skipped):

Image for post
Image for post

So … if I convert my score (10) to its binary value (1010), then I can see which boolean information were set to true ?

Yes. But there is a better trick. You can use bitwise operator to extract the information. More specifically the bitwise-AND. What it does is basically what its name means. It applies a AND operation between every bits of two given number. For example 7 & 6 = 6:

Image for post
Image for post

Now if we only use one specific bit with the bitwise-AND, we can determine if the flag contains this information because all the other bits will evaluate to 0.

Image for post
Image for post

Here we only extract one single information. Because 10 & 2 ≠ 0, the article is tagged important.

Image for post
Image for post

Because 10 & 4 = 0, the user tags does not match the article tags.

Using this technique we can easily write a script that would decode a given score, which is exactly what we wanted !

This is a small example of how we could decode the score in javascript, but it is easily reproductible in a different language:

Recap

Kaliop

A series of articles made by techs for techs.

Thanks to Clement “Snowji” D and Samuel Bouic

Pierre Cavalet

Written by

A web developer passionate about education. Technical Expert @ Kaliop.

Kaliop

Kaliop

A series of articles made by techs for techs. Writers share their experience and best practices for a more beautiful digital world.

Pierre Cavalet

Written by

A web developer passionate about education. Technical Expert @ Kaliop.

Kaliop

Kaliop

A series of articles made by techs for techs. Writers share their experience and best practices for a more beautiful digital world.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store