How We Created Dynamic Custom Audiences on ShareChat to Be Used Across Campaigns for Advertising

Published in ShareChat TechByte · 8 min read · Jan 13, 2022

Written by Mohit Dubey

ShareChat is on a rapid growth path with 180 Mn monthly active users. Advertisers want to run their advertisement campaigns on ShareChat to reach this audience. Our aim is to show each advertisement to the most relevant audience so that the advertiser's goals are met.

Problem Statement:

Create segments over a dynamic set of users based on a set of qualifications, preferably expressed as a query. These segments need to be refreshed according to recency requirements and can then be used across campaigns as the target audience, yielding a higher conversion rate for the advertiser.

For example: a set of users who clicked on a music campaign needs to be targeted in another music campaign with a more direct, down-the-funnel conversion message. In this case, the segment consists of users who clicked on music campaigns, and it needs to be updated daily.

Another example would be a segment of fitness enthusiasts. Any campaign with a fitness-related target audience can then use this segment as its target.

Some constraints on the system:

  1. The processing should run at least once per day, since the size of an audience can change within a day.
  2. The size of an audience can range from 30 million up to 200 million users.
  3. The processing should not take more than an hour.

Solution:

To solve the above problem, we needed to build a parallel processing system that takes input from a source (in this case, tables in BigQuery) and creates or updates custom dynamic segments. These segments can then be used as an audience for a campaign.

We have divided the solution into four parts:

  1. Creating a custom audience data source.
  2. Creating a mapping between the custom audience data source and audience name.
  3. Processing the data from the data source in parallel.
  4. Using the custom audience for targeting.

Creating a custom audience data source:

  • All of our user-interaction data is stored in BigQuery, so to build audiences based on a query we created scheduled queries.
  • Each scheduled query runs at its configured time and stores the query result in its configured destination table.
  • This destination table then serves as the data source for that custom audience.
  • For simplicity, we keep a 1:1 relation between a custom audience and its BigQuery table, so one table only holds the data for one custom audience.
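
For illustration, the scheduled query behind the music-campaign segment from the earlier example might look like the sketch below; the dataset, table and column names are hypothetical:

```sql
-- Hypothetical scheduled query: users who clicked on music campaigns in the last day.
-- The scheduled query writes its result into the destination table that backs
-- this custom audience.
SELECT DISTINCT userId
FROM `ads_analytics.campaign_click_events`
WHERE campaignCategory = 'MUSIC'
  AND clickTimestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
```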

Creating a mapping between the custom audience data source and audience name:

  • In the previous step we created tables that hold the data for each custom audience, refreshed on a recurring basis by the scheduled queries.
  • We now store these table names against the names of the segments.
  • These names are shown on the UI, along with the size of each audience, so they can be selected for targeting.
  • We store these details in MongoDB, where one record holds the data for one custom audience (see the example document after this list).
  • The processingState field is used to identify whether the audience is under processing, halted or completed.
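
A rough sketch of such a record; apart from processingState, the field names here are assumptions:

```typescript
// Rough shape of one custom-audience record in MongoDB. Apart from
// processingState, the field names are assumptions.
interface CustomAudienceRecord {
  name: string;            // segment name shown on the UI
  tableName: string;       // BigQuery destination table backing this audience
  processingState: "PENDING" | "PROCESSING" | "HALTED" | "COMPLETED";
  audienceSize: number;    // size of the audience, shown alongside the name
  updatedAt: Date;         // time of the last successful refresh
}
```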

Processing the data from the data source in parallel:

  • As there can be multiple custom audiences at any given time, we need a system that can process all of them at once, in parallel.
  • We already use Kubernetes at ShareChat to manage containerized workloads and services, so we devised a solution using the Kubernetes toolset.
  • Since the processing needs to run once per day, the CronJob feature provided by Kubernetes is a great fit.
  • A CronJob object is like one line of a crontab (cron table) file. It runs a Job periodically on a given schedule, written in Cron format.
  • A CronJob also supports running multiple pods at once, which is what we use to process custom audiences in parallel.
  • We can define the point in time when the Job should start. A pod spins up at the given time, does its work and is then cleaned up, so cost is only incurred for the time the pod was running.
  • A CronJob template looks like the one sketched after this list.
  • We use the parallelism property in the yaml file to define how many pods the Job controller creates when the CronJob runs.
  • Here, as we have defined parallelism as 3 and completions also as 3, the Job controller spins up 3 pods and waits for 3 completions. If any pod gets evicted, the controller spins up a replacement until the completion count reaches 3.
  • To keep the system simple, we keep a 1:1 relation between a pod and a custom audience, meaning one pod only processes data for one custom audience. This lets us handle edge cases gracefully and makes the system easier to debug.
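
A minimal sketch of such a CronJob manifest; the schedule, names and image are placeholders, while parallelism and completions are the fields discussed above:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: custom-audience-processor
spec:
  schedule: "0 2 * * *"            # run once a day at 02:00 (placeholder schedule)
  jobTemplate:
    spec:
      parallelism: 3               # spin up 3 pods at once
      completions: 3               # wait for 3 successful completions
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: audience-worker
              image: audience-worker:latest   # placeholder image
```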

Flow of custom audience processing:

Before going into how we process all the data in parallel, we need an understanding of how we read from MongoDB and BigQuery, as these play a big part in the processing engine.

Reading From MongoDB

  • The mapping of custom audiences to tableName is stored in MongoDB.
  • In MongoDB, a write operation is atomic on the level of a single document, even if the operation modifies multiple embedded documents within a single document.
  • For us, one document stores the data for one custom audience, so when we write to a document we know only one writer is writing to it at any time.
  • We use findOneAndUpdate to get and update the document atomically in MongoDB.
  • As findOneAndUpdate is atomic, each pod receives a different custom audience to work on.
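
A minimal sketch of this claim step, assuming the document shape described earlier; the database and collection names are placeholders:

```typescript
import { MongoClient } from "mongodb";

// Claim step run by each pod; database and collection names are placeholders.
async function claimNextAudience(mongoUrl: string) {
  const client = await MongoClient.connect(mongoUrl);
  const audiences = client.db("ads").collection("customAudiences");

  // Atomically pick one audience that is not being processed yet and mark it
  // as PROCESSING, so no other pod can claim the same document.
  const result: any = await audiences.findOneAndUpdate(
    { processingState: "PENDING" },
    { $set: { processingState: "PROCESSING", startedAt: new Date() } },
    { returnDocument: "after" }
  );

  // Depending on the driver version, the document is returned directly or
  // wrapped in a { value } field.
  return result && result.value !== undefined ? result.value : result;
}
```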

Reading From BigQuery (BQ)

  • We have to read the data stored for a custom audience from BigQuery (BQ), and the data size can be huge. So instead of running interactive queries (which are costly) we fetch the data from BQ using batch queries.
  • The config for our batch queries is shown in the sketch after this list.
  • The maxResults property limits the number of results per page.
  • As we want to read all the data from the table, the query is simply a full-table select (also part of the sketch below).
  • Running this query as a batch job returns a job object containing a jobId and a pageToken. The jobId uniquely identifies the job and the pageToken is used for pagination.
  • Once a job is created, we store the jobId and the initial pageToken in memcache against a key formed from the custom audience name + tableName.
  • This tells us whether a job was created but not completely processed. Since a pod can go down at any time, without this information we would have to create a new job from scratch whenever a replacement pod comes up.
  • The TTL for this key is set to 12 hours, so the checkpoint expires well before the next daily run.
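
A minimal sketch, using the Node.js BigQuery client, of how the batch job could be created and its jobId and first pageToken checkpointed in memcache; the cache helper and key format are assumptions:

```typescript
import { BigQuery } from "@google-cloud/bigquery";

// Hypothetical wrapper over the memcache client.
declare const cache: { set(key: string, value: string, ttlSeconds: number): Promise<void> };

const bigquery = new BigQuery();
const PAGE_SIZE = 10000; // maxResults: number of rows returned per page

async function createBatchJob(audienceName: string, tableName: string) {
  // Batch priority queues the query instead of running it interactively.
  const [job] = await bigquery.createQueryJob({
    query: `SELECT * FROM \`${tableName}\``,
    priority: "BATCH",
  });

  // Manual pagination so we get a pageToken back for the next page.
  const [rows, nextQuery] = await job.getQueryResults({
    maxResults: PAGE_SIZE,
    autoPaginate: false,
  });

  // Checkpoint the job and where we stopped, so a replacement pod can resume
  // instead of creating a brand-new job.
  await cache.set(
    `${audienceName}:${tableName}`,
    JSON.stringify({ jobId: job.id, pageToken: nextQuery?.pageToken }),
    12 * 60 * 60 // 12-hour TTL
  );

  return { job, rows, nextPageToken: nextQuery?.pageToken };
}
```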

Processing Flow:

  • Each pod tries to claim a custom audience using findOneAndUpdate; as the operation is atomic, each pod receives a different custom audience to work on.
  • Once we have the custom audience id, name and tableName, we need to fetch the data from BQ.
  • We first check memcache, using custom audience name + tableName as the key, for an existing job.
  • If a job is found, we take its jobId and pageToken and hit BQ for the job information, setting the pageToken to the one stored in memcache. This way we only read values that were not processed yet.
  • If no job is found, we create one by hitting BQ.
  • Once we have the job information and the pageToken, we start fetching data from BQ.
  • With the custom audience id and the data in hand, we write into BigTable in batches of 10,000 rows, which saves the network round-trip time of one write per row (see the sketch after this list).
  • After processing each chunk we store the new pageToken in memcache, so that we always resume reading from where we left off.
  • This way all the custom audiences are processed in parallel and can be used for targeting.
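
A minimal sketch of how one page of results could be written to BigTable and the checkpoint advanced; the helpers, instance, table and column-family names are assumptions:

```typescript
import { Bigtable } from "@google-cloud/bigtable";

// Hypothetical helpers: a thin wrapper over the memcache client and the
// userId hashing used for the BigTable row key.
declare const cache: { set(key: string, value: string, ttlSeconds: number): Promise<void> };
declare function hashUserId(userId: string): string;

// Instance, table and column-family names are assumptions.
const membershipTable = new Bigtable().instance("ads").table("custom_audience_memberships");

// Writes one page of BigQuery rows (up to 10,000) as a single BigTable batch,
// then checkpoints the next pageToken in memcache so a replacement pod can
// resume from the same place.
async function processPage(
  audienceId: string,
  cacheKey: string,
  jobId: string,
  rows: Array<{ userId: string }>,
  nextPageToken?: string
): Promise<void> {
  const entries = rows.map((row) => ({
    key: hashUserId(row.userId),                  // row key = hash of the userId
    data: { audiences: { [audienceId]: "1" } },   // one column per audience id
  }));

  // One batched mutation per page instead of one network round trip per row.
  await membershipTable.insert(entries);

  // Advance the checkpoint only after the chunk has been written.
  await cache.set(
    cacheKey,
    JSON.stringify({ jobId, pageToken: nextPageToken }),
    12 * 60 * 60                                  // 12-hour TTL, as described above
  );
}
```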

Using the custom audience for targeting:

  • Now the data for each custom audience is stored in BigTable with the row key as the hash of the userId.
  • On each targeting call we fetch the data from BigTable for the userId in the request and check whether the custom audience id matches the id given in the campaign targeting (see the sketch below).
  • If it does, the campaign is considered for selection; if not, it is dropped from selection.
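
A minimal sketch of that check against BigTable; the instance, table and column-family names, and the hash helper, are assumptions:

```typescript
import { Bigtable } from "@google-cloud/bigtable";

// Hypothetical hash helper; the article only says the row key is a hash of the userId.
declare function hashUserId(userId: string): string;

// Instance and table names are assumptions.
const membershipTable = new Bigtable().instance("ads").table("custom_audience_memberships");

async function isUserInAudience(userId: string, targetAudienceId: string): Promise<boolean> {
  const row = membershipTable.row(hashUserId(userId));

  const [exists] = await row.exists();
  if (!exists) return false;             // user is in no custom audience

  await row.get();                       // populates row.data with column families
  // Assumed layout: one column per audience id under an "audiences" family.
  const audienceColumns = (row.data && (row.data as any).audiences) || {};
  return targetAudienceId in audienceColumns;
}
```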

Handling Error Cases

Pod Terminating

We cannot rely on a pod staying up in Kubernetes, so we have to implement a mechanism to cope with termination.

  • If the pod is terminating for any reason, it receives a SIGTERM signal. In our code we listen for this signal and, upon receiving it, update the processingState of the audience in MongoDB accordingly (see the sketch below).
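
A minimal sketch of such a listener; the variable names are assumptions:

```typescript
import { Collection } from "mongodb";

// Assumed to be set up by the worker before processing starts (hypothetical names).
declare const audiences: Collection;
declare const currentAudienceId: unknown;

process.on("SIGTERM", async () => {
  // Mark the audience this pod was working on as halted so it can be
  // picked up again by a replacement pod or the next daily run.
  await audiences.updateOne(
    { _id: currentAudienceId as any },
    { $set: { processingState: "HALTED" } }
  );
  process.exit(0);
});
```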

Issue in BQ Job

While creating a job in BQ, there can be cases where the call times out and the job never gets created. For this we have a retry mechanism that tries at least 3 times before generating an alert for the job failure (sketched below).
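
A minimal sketch of such a retry wrapper; the alerting hook is an assumption:

```typescript
// Hypothetical alerting hook.
declare function sendAlert(message: string): Promise<void>;

async function createJobWithRetry<T>(createJob: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await createJob();           // e.g. the createBatchJob sketch above
    } catch (err) {
      lastError = err;                    // timeout or transient BQ error; try again
    }
  }
  // All attempts failed: raise an alert for the job failure, then surface the error.
  await sendAlert(`BigQuery job creation failed after ${attempts} attempts: ${String(lastError)}`);
  throw lastError;
}
```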

Future Enhancements

  • Right now, when running parallel distributed Jobs, we have to set up a separate system to partition the work among the workers.
  • The Kubernetes 1.21 release introduces a new field to control Job completion mode, a configuration option that allows you to control how Pod completions affect the overall progress of a Job, with two possible options:
    => NonIndexed (default): the Job is considered complete when there has been a number of successfully completed Pods equal to the specified number in .spec.completions. In other words, each Pod completion is homologous to each other. Any Job you might have created before the introduction of completion modes is implicitly NonIndexed.
    => Indexed: the Job is considered complete when there is one successfully completed Pod associated with each index from 0 to .spec.completions-1. The index is exposed to each Pod in the batch.kubernetes.io/job-completion-index annotation and the JOB_COMPLETION_INDEX environment variable.
  • We can start using Jobs with Indexed completion mode, or Indexed Jobs for short, to easily start parallel Jobs.
  • Then each worker Pod can have a statically assigned partition of the data based on its index.
  • This will save us from having to set up a queuing system or even having to modify our binary! A sketch of such a Job spec follows.
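
This sketch assumes Kubernetes 1.21 or later; the names and image are placeholders:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: custom-audience-indexed
spec:
  completions: 3
  parallelism: 3
  completionMode: Indexed     # each pod gets an index from 0 to completions-1,
                              # exposed via the JOB_COMPLETION_INDEX env variable
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: audience-worker
          image: audience-worker:latest   # placeholder image
```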

We’ll be writing more blogs in the near future focusing on other aspects of building a large-scale advertising platform. Stay tuned!

Cover illustration by Ritesh Waingankar
