Bloom filter in Ads Recommendation

Nupur Bansal
Jan 23 · 7 min read

I work in a team that designs algorithms for recommending ads on Tokopedia. We measure the performance of our algorithms daily. During one such performance measurement exercise, we came across a prominent problem in our system — we noticed that the same ads were being recommended to our buyers repeatedly, despite their bad performance. How did we find this? Well, the count of impressions for ads remained almost the same every day which means we were showing almost the same ads to the users on their every visit.

Source: Pixabay

To understand the problem better, let’s talk about Jenny. Jenny is a budding entrepreneur who also happens to be a shopaholic. Jenny goes to Aura, a famous clothing store. The store had a good collection on display and she tried some of those dresses. After a week, she felt like shopping again and found the same collection. She tried fewer dresses this time. Not satisfied with her last visit but still optimistic 🤞🏼 to get her hands on something new, she went to her favorite store again after a month. To her disappointment, she found the same collection. Obviously, she tried nothing and returned empty-handed. The designs at the store have become monotonous and not even the shopaholics were trying those, let alone purchasing anything.

Source: Giphy

If we apply the same analogy to ads, going to see the collection is equivalent to the viewing of the ads (or impression of ads) on the website. Trying the dresses is equivalent to clicking on those ads (or Click Through Rate — CTR), and purchasing those dresses, is the same as buying the product whose ad was clicked (or Conversion Rate — CVR). Since we were showing the same ads, those were getting the views but very few clicks. This led to a drop in the CTR of our ads.

CTR is one of the most important and determining factors for calibrating the performance of any ads recommendation system. To tackle this situation, we must find a method to refresh the ads shown, periodically. And that might lead to better chances of ads being clicked and hence a better CTR. Why? Because newer or different ads every time may have a much better chance of getting those clicks.

If Aura displays a fresh and unique collection each time Jenny visits that store, do you think she (being a true shopaholic!) would be able to stop herself from trying all those new dresses?

That’s what our system needed. A new range of ads displayed for our users each time they visit Tokopedia. So that they are unable to stop themselves from clicking those tempting ad recommendations. We wanted to maintain the impression and click counts of all those ads in a way that ads with a very high impression count but very low CTR, have lower priority while displaying ads next time.

Now the problem at hand was how could we verify if an ad had crossed its display threshold and should not be displayed the next time rather should be replaced with a new one. This would give the user, fresh ad recommendation. We needed a way that would tell us about the presence of an ad because it would have to analyze real-time traffic and give the response accordingly. And then — Bloom Filter comes to the rescue.

What is a Bloom Filter?

  • It is a prob­abi­listic data structure that can be used to test if an element is part of a set or not.

How is a Bloom Filter designed?

  • A Bloom filter is a bit array of m bits.

Add() — Adding an element to a Bloom

  • Each of the k different hash functions is executed on the element to get k bit positions.

IsExists() — Checking if an element is in the Bloom

  • Each of the k different hash functions is executed on the element to get k bit positions.

Example

We have a Bloom filter of size 15 bits.

Empty Bloom Filter (size=15)

We add 3 elements to it:
- A: Bits — 0,3,8
- B: Bits — 1,4,13
- C: Bits — 6,8,11

Bloom Filter after adding A, B, C

We want to check if some elements are present in bloom or not:
- C: Bits — 6,8,11
- D: Bits — 1,2,5
- E: Bits — 4,8,13

Corresponding bits for C, D, E in Bloom Filter

From the above example, we can see —
- C has all bits marked 1, so it may be present in the bloom (True Positive).
- D has 2 bits marked 0, so it is definitely not present in the bloom (True Negative).
- E has all bits marked 1, so it may be present in the bloom (False Positive).

How did we Implement Bloom filter for ads?

First, we decided on the parameters which would help us decide if the ads need to be filtered out. Something like —

Ads which exceed ‘x’ number of impressions but have CTR lower than ‘y’ %

Source: Giphy

Next, we implemented a simple method to insert the ads in the bloom filter. Our bloom filter is created from a hash map stored in Redis which is read periodically into memory. This table consists of all ad ids which meet the above set condition.

  • Every time an ad has an impression, its impression count is incremented.

The Redis hash map has an expiry set for 24hrs. So after 24hrs, the table gets deleted and all values are reset. The count for impressions and clicks starts from 0 again. So this gives the ads a duration of 24hrs where they are not shown if they aren’t performing well. After this period, they can again start getting impressions.

Creating the Bloom filter

At fixed periods, the Redis hash table is read into the code. All the ad ids present inside the table are fed to the k hash functions and inserted into the Bloom filter array by using the Add() function.

Filtering out ads

When we get a request for ads, we fetch ads from our database. For these ads, we check if an ad is present inside the bloom filter by using the IsExists() function. If the result is true, then the ad is removed from the response. Else the ad remains in the response.

Conclusion

Using a bloom filter has helped us provide a better variety of responses to a buyer. It helps us shuffle our ads and improve penetration into the inventory. It led to an increase in our CTR. Our impressions and clicks increased by 30%. We now get up to 140 Million impressions per day.

Source: gif-finder.com

The only con of using a bloom filter is that we may get some false positives. But these can be very much reduced by setting a good combination of the size of the bloom filter array and its k hash functions. Blooms are effective if some false positives do not negatively impact the system. Another wide application in the real-world for bloom filters is by Medium which uses it to determine if a user has read an article or not.

Having this implementation in place, I am sure if Jenny visited Tokopedia, she would have been a very happy and satisfied customer — thanks to our bloom-filter backed recommendation system!

Source: Pixabay

References

  1. Probabilistic Data structures: Bloom filter

Tokopedia Engineering

Story from people who build Tokopedia

Nupur Bansal

Written by

Golang Developer at Tokopedia. Tech Enthusiast. Mail me at nupur8121991@gmail.com

Tokopedia Engineering

Story from people who build Tokopedia

More From Medium

More from Tokopedia Engineering

More from Tokopedia Engineering

More from Tokopedia Engineering

Deciphering the Art of Estimation

More from Tokopedia Engineering

More from Tokopedia Engineering

Backpressure in Reactive Streams

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade