How we brought down storage costs for the Ads platform: Part 1

Pulkit Chaudhary
Oct 28

At ShareChat, our Ads platform lets advertisers run all their marketing campaigns on our app, at scale. A common requirement is per-user frequency capping for an ad: limiting the number of times any single user is shown a given ad.

To enforce this, our code must compare, in real time, the number of times a user has already seen an ad against the configured frequency cap, which means we need to store how many times each user has seen each ad. On the face of it, the problem looks simple enough, but at the scale of 160 million MAU it gets trickier.
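
To make the hot path concrete, here is a minimal sketch of that check in Go. The CounterStore interface and every name in it are our own illustration, not an API from the post; the rest of this series is essentially about what sits behind such an interface.

```go
package adserving

import "context"

// CounterStore abstracts wherever the per-user, per-ad view counts live.
// Picking this storage is exactly what the rest of the series is about.
type CounterStore interface {
	Get(ctx context.Context, userID, adID string) (int64, error)
	Incr(ctx context.Context, userID, adID string) error
}

// ShouldServe reports whether the user is still under the ad's frequency cap.
// It has to be fast: it sits on the hot path of every ad request.
func ShouldServe(ctx context.Context, store CounterStore, userID, adID string, maxViews int64) (bool, error) {
	seen, err := store.Get(ctx, userID, adID)
	if err != nil {
		return false, err
	}
	return seen < maxViews, nil
}
```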

We could always throw more servers and storage at any solution to scale it, but innovation happens when you try to be frugal.

In this two-part series, we will explore the various solutions to this problem, the mistakes we made, and the final solution we implemented:

  1. Part 1: The different solutions we considered for this problem, along with the pros and cons of each.
  2. Part 2: The final solution, along with its implementation.

The spec

Any solution had to meet the following requirements:

  1. Low latency is of prime importance, because we need very fast reads.
  2. The system should support a minimum of 500k ops.
  3. A minimum of 10k ads can be active at any given time.
  4. Close to 100% accuracy is required, but complete 100% accuracy is not essential.
  5. Storage costs should be as low as possible.

Solution 1: Store the counts in a database
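
The first idea is the straightforward one: keep a per-user, per-ad counter in a database and read it on every ad request. A minimal sketch using Go's database/sql follows; the ad_views table, its schema, and the Postgres-style upsert are our assumptions, as the post does not name a specific database.

```go
package dbcounter

import (
	"context"
	"database/sql"
)

// RecordView upserts the (user, ad) counter. The ad_views table and the
// ON CONFLICT upsert are illustrative assumptions, not from the post.
func RecordView(ctx context.Context, db *sql.DB, userID, adID string) error {
	_, err := db.ExecContext(ctx, `
		INSERT INTO ad_views (user_id, ad_id, views) VALUES ($1, $2, 1)
		ON CONFLICT (user_id, ad_id) DO UPDATE SET views = ad_views.views + 1`,
		userID, adID)
	return err
}

// SeenCount reads the counter; a missing row simply means zero views.
func SeenCount(ctx context.Context, db *sql.DB, userID, adID string) (int64, error) {
	var views int64
	err := db.QueryRowContext(ctx,
		`SELECT views FROM ad_views WHERE user_id = $1 AND ad_id = $2`,
		userID, adID).Scan(&views)
	if err == sql.ErrNoRows {
		return 0, nil
	}
	return views, err
}
```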

Pros:

  1. Good enough for a small use case when you are just starting up and already have DB drivers in place.

Cons:

  1. Cannot scale, due to the higher latencies involved in DB operations.
  2. Storing such a huge amount of data also means huge storage costs.

Solution 2: Store the counts in a cache (Redis key-value pairs)
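
Here the counters move into a cache, one key per (user, ad) pair; the pros and cons below indicate plain Redis key-values. A minimal sketch with the go-redis client, where the key format and the choice of client library are our assumptions:

```go
package kvcounter

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

// One key per (user, ad) pair; the key format is our assumption.
func viewKey(userID, adID string) string {
	return fmt.Sprintf("fcap:%s:%s", userID, adID)
}

func RecordView(ctx context.Context, rdb *redis.Client, userID, adID string) error {
	return rdb.Incr(ctx, viewKey(userID, adID)).Err()
}

func SeenCount(ctx context.Context, rdb *redis.Client, userID, adID string) (int64, error) {
	n, err := rdb.Get(ctx, viewKey(userID, adID)).Int64()
	if err == redis.Nil { // key absent: the user has never seen this ad
		return 0, nil
	}
	return n, err
}
```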

Pros:

  1. Improves on the latency of Solution 1, as the data now lives in cache storage.
  2. Still good enough only for small use cases.

Cons:

  1. Storing such a huge amount of data still means huge storage costs.
  2. A single key takes around 70 bytes. If 10 million users see a single ad, that alone means roughly 700 MB of storage for one ad, and since we have to support more than 10k active ads, the worst case extrapolates to several terabytes.

Solution 3: Store the counts in Redis hashmaps
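
This variant groups the counters into Redis hashmaps to benefit from their compact encoding (see the pros below). One common way to stay under the encoding threshold is to shard users into fixed-size buckets; the bucketing scheme, bucket size, and key format here are our assumptions about how this could be wired up:

```go
package hashcounter

import (
	"context"
	"fmt"
	"strconv"

	"github.com/redis/go-redis/v9"
)

// Redis keeps a hash in the compact ziplist/listpack encoding only while it
// stays under hash-max-ziplist-entries, so we shard users into fixed-size
// buckets. The bucket size of 512 is an illustrative assumption.
const bucketSize = 512

func hashKeyField(adID string, userID int64) (key, field string) {
	key = fmt.Sprintf("fcap:%s:%d", adID, userID/bucketSize)
	field = strconv.FormatInt(userID%bucketSize, 10)
	return key, field
}

func RecordView(ctx context.Context, rdb *redis.Client, adID string, userID int64) error {
	key, field := hashKeyField(adID, userID)
	return rdb.HIncrBy(ctx, key, field, 1).Err()
}

func SeenCount(ctx context.Context, rdb *redis.Client, adID string, userID int64) (int64, error) {
	key, field := hashKeyField(adID, userID)
	n, err := rdb.HGet(ctx, key, field).Int64()
	if err == redis.Nil {
		return 0, nil
	}
	return n, err
}
```

Whether the savings hold depends on keeping every hash under that encoding threshold, which is exactly the constraint called out in the pros below.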

Pros:

  1. Improves upon the storage usage of Solution 2, since Redis hashmaps use less storage than plain key-value pairs: https://docs.redislabs.com/latest/ri/memory-optimizations/#convert-hashtable-to-ziplist-for-hashes
  2. There is a caveat, though: Redis hashmaps only keep this compact encoding up to a specific number of items, after which they are internally stored as simple key-values and lose the compression. Since the number of active ads is dynamic, constantly increasing and decreasing, the storage improvement only holds at a smaller scale.

Cons:

  1. There is still the possibility of huge storage costs in this solution.
  2. It doesn’t actually improve much on Solution 2.

Solution 4: Keep the counts on the client

Pros:

  1. All the requirements of the spec are satisfied, because the client passes the hashmap of seen counts to the backend in each request. This gives low latency along with low storage costs and 100% accuracy.

Cons:

  1. Heavy client-side work, along with the burden of maintaining this data in the app.
  2. The entire payload has to be passed to the backend on every ads request, and it can grow large depending on the number of ads a user sees in a day. It might also need to be forwarded across multiple backend microservices.
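
For concreteness, here is a rough sketch of what this client-carried scheme could look like on the wire and in the backend filter. Every field and function name is illustrative, not ShareChat's actual API:

```go
package clientcap

// AdRequest is roughly what the client would send on every ads call under
// this scheme. All names here are illustrative assumptions.
type AdRequest struct {
	UserID string `json:"userId"`
	// SeenCounts maps adID -> times this ad has been shown on the device.
	SeenCounts map[string]int `json:"seenCounts"`
}

// FilterByCap drops candidate ads whose frequency cap the user has hit.
// The backend stores nothing; all state arrives with the request.
func FilterByCap(req AdRequest, caps map[string]int, candidates []string) []string {
	var eligible []string
	for _, adID := range candidates {
		// Ads without a configured cap are skipped here for brevity.
		if req.SeenCounts[adID] < caps[adID] {
			eligible = append(eligible, adID)
		}
	}
	return eligible
}
```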

Solution 5

Pros:

  1. All the requirements of the spec are satisfied by this solution, with very minimal storage costs.

Cons:

  1. Operational overhead of removing deactivated campaigns to reclaim space.
  2. It works well for the specification and the problem statement we have in Part 1, where we need frequency capping over the lifetime of an ad for a user. Once we also need frequency capping at the daily and hourly levels, this solution too starts using up a huge amount of storage.
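
The post does not spell out Solution 5's mechanism in the text here, but the pros and cons above (minimal storage, with space reclaimed by deleting deactivated campaigns) are consistent with a compact per-campaign structure such as a Redis bitfield, keyed by ad, holding a few bits of counter per user at an offset derived from a numeric user ID. Treat the sketch below purely as our guess at that shape, not as the author's design:

```go
package bitfieldcounter

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

// One BITFIELD key per ad, 8 bits of counter per user: roughly 10 MB covers
// 10 million users for one ad. Purely our assumption for Solution 5.
func bitfieldKey(adID string) string { return "fcap:bf:" + adID }

// RecordView bumps the user's 8-bit counter, saturating at 255 instead of
// wrapping. The "#offset" form indexes in units of the field width (u8).
func RecordView(ctx context.Context, rdb *redis.Client, adID string, userID int64) error {
	return rdb.BitField(ctx, bitfieldKey(adID),
		"OVERFLOW", "SAT",
		"INCRBY", "u8", fmt.Sprintf("#%d", userID), 1).Err()
}

func SeenCount(ctx context.Context, rdb *redis.Client, adID string, userID int64) (int64, error) {
	vals, err := rdb.BitField(ctx, bitfieldKey(adID),
		"GET", "u8", fmt.Sprintf("#%d", userID)).Result()
	if err != nil {
		return 0, err
	}
	return vals[0], nil
}

// Deactivating a campaign is a single DEL of its key, which is also the
// operational chore the cons mention.
func DeactivateCampaign(ctx context.Context, rdb *redis.Client, adID string) error {
	return rdb.Del(ctx, bitfieldKey(adID)).Err()
}
```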

Finishing up

We just went through the different solutions we could use for the problem at hand. All of them are valid solutions at different levels of scale; for example, Solution 1 is the best when the scale is small and you are just starting up.

We’ll continue in Part 2, where we explore a modified problem statement in which we want this frequency capping at an overall, daily, and hourly level for each ad. We will also cover the final implementation we went ahead with.

Any guesses as to what implementation we could have used? I’ll give you a hint: we used a probabilistic data structure.


Thanks to Venkatesh Ramaswamy
