Scaling programmatic campaign reports efficiently in MiQ’s Intelligence Hub

Chandan Prakash
MiQ Tech and Analytics
5 min read · May 16, 2023

Chandan Prakash, Senior software engineer, MiQ

Background

Intelligence Hub, MiQ’s proprietary platform for pre- and post-campaign insights, allows advertisers to receive campaign reports based on pre-configured metrics and dimensions on a schedule of their choosing.

These reports give marketers a near real-time view of how the programmatic campaigns MiQ runs for them are performing. As MiQ expands into new markets, additional metrics and dimensions are added, which has increased the popularity of the Reporting dashboards amongst our global client base.

Figure 1: HUB Reporting scheduler older design flow diagram

Challenge

As a result of this increased adoption, many reports were being scheduled for the same time slot. This raised the load on our systems during certain time windows, just as users were simultaneously accessing the Reporting dashboard in Intelligence Hub to view campaign data.

On the backend, scheduled reports are generated by running SQL queries to fetch aggregated campaign data and writing the results to CSV files.
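At a high level, that backend step looks something like the sketch below. This is an illustrative Java example, not MiQ's actual code; the table, column names, and JDBC URL are hypothetical stand-ins.

```java
// Illustrative sketch only: fetch aggregated campaign data over JDBC and write a CSV.
// The SQL, table, and column names are placeholders, not MiQ's actual schema.
import java.io.FileWriter;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ReportWriter {

    public void writeReport(String jdbcUrl, String outputPath) throws Exception {
        String sql = "SELECT campaign_id, SUM(impressions) AS impressions, SUM(clicks) AS clicks "
                   + "FROM campaign_metrics GROUP BY campaign_id";

        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql);
             PrintWriter out = new PrintWriter(new FileWriter(outputPath))) {

            out.println("campaign_id,impressions,clicks"); // CSV header
            while (rs.next()) {
                out.printf("%s,%d,%d%n",
                        rs.getString("campaign_id"),
                        rs.getLong("impressions"),
                        rs.getLong("clicks"));
            }
        }
    }
}
```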

We wanted to ensure users received their reports on schedule while still enjoying a smooth experience on the Reporting dashboard. The challenge was to achieve this without increasing our cluster size or thread resources, keeping our SLAs intact at an optimal cost.

The scheduler service isn't bound by a strict SLA: reports can be delivered asynchronously within a 30-minute window. So, we set about optimizing our solution to handle the load within the available infrastructure, without any additional costs.

Read on to learn how we did it.

Solution

1. To start, we created two separate deployments: one for handling scheduled requests and one for generating UI reports. Although we could have made the scheduling service a separate microservice, doing so would have resulted in more than 90% code duplication, because the same service classes are used for the query builder, mail, and download features.

Instead, we used Spring Boot's profiles feature to instantiate only the services required for scheduled report generation in the scheduler deployment. We then created a new API gateway mapping for the scheduler deployment and used it to receive HTTP API calls from the event scheduler. This separated the load of scheduled reports from the regular Reporting dashboard load on Intelligence Hub.
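As a rough illustration of the profile-based setup, a scheduler-only bean might be gated like this. This is a minimal sketch with a hypothetical service class, assuming standard Spring profile annotations, not MiQ's actual code.

```java
import org.springframework.context.annotation.Profile;
import org.springframework.stereotype.Service;

// Only registered when the deployment runs with the "scheduler" profile active,
// so the dashboard deployment never instantiates scheduler-specific beans.
@Service
@Profile("scheduler")
public class ScheduledReportService {

    public void generateScheduledReport(String reportId) {
        // Placeholder: build the SQL query, run it, write the CSV, and email the result.
    }
}
```

The scheduler deployment would then start with spring.profiles.active=scheduler, while the dashboard deployment omits that profile and never loads these beans.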

2. Even though we had a separate deployment for the scheduler, its queries still ran on the same Databricks SQL cluster as the dashboard queries. This could overload the SQL cluster, triggering automatic scaling, potentially violating the SLA for the HUB reporting dashboard, and increasing our costs.

We solved this issue by adding a distributed rate limiter using the Redisson client. It limits the number of queries hitting the Databricks SQL cluster to 10 per minute across all scheduler requests.
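A minimal sketch of such a limiter, assuming the standard Redisson RRateLimiter API, is shown below. The limiter name, Redis address, and wrapper class are illustrative.

```java
import org.redisson.Redisson;
import org.redisson.api.RRateLimiter;
import org.redisson.api.RateIntervalUnit;
import org.redisson.api.RateType;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class SqlQueryRateLimiter {

    private final RRateLimiter rateLimiter;

    public SqlQueryRateLimiter(String redisAddress) {
        Config config = new Config();
        config.useSingleServer().setAddress(redisAddress); // e.g. "redis://localhost:6379" (illustrative)
        RedissonClient redisson = Redisson.create(config);

        // One shared limit across all scheduler pods: 10 permits per minute overall.
        rateLimiter = redisson.getRateLimiter("scheduler-sql-queries");
        rateLimiter.trySetRate(RateType.OVERALL, 10, 1, RateIntervalUnit.MINUTES);
    }

    public void runQuery(Runnable query) {
        // Blocks until a permit is available, throttling the Databricks SQL cluster load.
        rateLimiter.acquire();
        query.run();
    }
}
```

Because the permits live in Redis, the 10-per-minute cap holds across every scheduler pod rather than per instance.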

3. As adoption continued to grow, we experienced a high volume of schedules occurring simultaneously, leading to a shortage of threads in the async thread pool. This caused the scheduling service to fail and prevented several reports from being delivered.

To address the surge in schedule requests, we switched from HTTP configuration to Messaging configuration in the event scheduler for better control over async calls and a retry option.

Messaging is an in-house MiQ service that provides basic pub-sub functionality over a set of user-defined topics. It is built on gRPC (as the transport protocol) and Kafka (as the event streaming backbone).

By using the maxUncommittedEvents configuration, we can limit bursts of traffic at any given time. For example, if maxUncommittedEvents is set to 10, the scheduler pods handle a maximum of 10 events/schedules per minute (configurable). Any additional schedules remain in the queue until one of the running schedules completes.

If a schedule is not processed due to resource or service unavailability, it is retried based on the maxRedeliveryAttempts configuration. For instance, if maxRedeliveryAttempts is set to 3, the event is redelivered up to three more times until it is committed.
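Since Messaging is an in-house service, the snippet below is only an approximation of those semantics using a plain Kafka consumer: max.poll.records stands in for maxUncommittedEvents, and a simple retry loop stands in for maxRedeliveryAttempts. The topic name and helper methods are hypothetical.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ScheduleEventConsumer {

    private static final int MAX_UNCOMMITTED_EVENTS = 10; // analogue of maxUncommittedEvents
    private static final int MAX_REDELIVERY_ATTEMPTS = 3; // analogue of maxRedeliveryAttempts

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "report-scheduler");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        // Fetch at most 10 events per poll, so no more than 10 schedules run uncommitted at once.
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, MAX_UNCOMMITTED_EVENTS);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("report-schedules")); // hypothetical topic name
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(5))) {
                    processWithRetries(record.value());
                }
                consumer.commitSync(); // commit only once the batch has been handled
            }
        }
    }

    private static void processWithRetries(String schedulePayload) {
        for (int attempt = 0; attempt <= MAX_REDELIVERY_ATTEMPTS; attempt++) {
            try {
                generateReport(schedulePayload); // placeholder for report generation
                return;                          // success: the event can be committed
            } catch (Exception e) {
                // fall through and retry, mimicking redelivery of a failed event
            }
        }
    }

    private static void generateReport(String schedulePayload) {
        // Placeholder for the scheduled-report generation logic.
    }
}
```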

This approach simplifies the code by removing retries at different service levels, as messaging configuration takes care of retrying.

In summary, we achieved scaling, rate limiting, and better resource utilization without incurring any additional cost to the service.

Figure 2: HUB Reporting scheduler new design flow diagram

Our Migration Strategy

To reconfigure our 4,000+ schedules to the messaging configuration with minimal downtime, we kept the existing API for backward compatibility, allowing existing schedules to continue functioning via HTTP calls. However, to keep all current schedules on a single configuration, we migrated them to the messaging configuration.

We deployed the messaging config changes to create a new scheduler that uses messaging, then updated the existing schedules via the API. We migrated schedules in batches and monitored for any errors post-migration.

To acquire the configuration for a given task ID, we first pulled all event scheduler task IDs from our database and planned the migration in batches using the scheduler API (xxxxxxxxxxxx/schedule-task/scheduleTaskId).

For each task ID, we switched its configuration from HTTP to messaging and updated it on the event scheduler via the same API. This enabled us to migrate all schedules without downtime.
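The batch migration loop amounted to something like the sketch below. This is a hypothetical illustration: the base URL, batch size, payload handling, and config transformation are placeholders, not MiQ's actual scheduler API.

```java
// Hypothetical sketch of the batch migration from HTTP to messaging config.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

public class ScheduleMigration {

    private static final String BASE_URL = "https://scheduler.example.com"; // placeholder host
    private static final int BATCH_SIZE = 100;                              // illustrative batch size

    private final HttpClient client = HttpClient.newHttpClient();

    public void migrate(List<String> scheduleTaskIds) throws Exception {
        for (int i = 0; i < scheduleTaskIds.size(); i += BATCH_SIZE) {
            List<String> batch =
                    scheduleTaskIds.subList(i, Math.min(i + BATCH_SIZE, scheduleTaskIds.size()));
            for (String taskId : batch) {
                // 1. Fetch the existing HTTP-based task configuration.
                HttpRequest get = HttpRequest.newBuilder()
                        .uri(URI.create(BASE_URL + "/schedule-task/" + taskId))
                        .GET()
                        .build();
                String currentConfig = client.send(get, HttpResponse.BodyHandlers.ofString()).body();

                // 2. Rewrite it to use the messaging configuration (placeholder transformation).
                String messagingConfig = toMessagingConfig(currentConfig);

                // 3. Push the updated configuration back to the event scheduler.
                HttpRequest put = HttpRequest.newBuilder()
                        .uri(URI.create(BASE_URL + "/schedule-task/" + taskId))
                        .header("Content-Type", "application/json")
                        .PUT(HttpRequest.BodyPublishers.ofString(messagingConfig))
                        .build();
                client.send(put, HttpResponse.BodyHandlers.ofString());
            }
            // Pause here to monitor for errors before moving on to the next batch.
        }
    }

    private String toMessagingConfig(String httpConfig) {
        // Placeholder: swap the HTTP callback for a messaging topic in the task config.
        return httpConfig;
    }
}
```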

Figure 3: Migration of schedules to messaging config with backward compatibility

What did we achieve?

By implementing a stable and scalable solution for our scheduling service, we reduced resource usage and code complexity without any additional costs. We migrated to this solution without any downtime or schedule failures, ultimately improving the user experience for our clients and ensuring they have access to campaign reports when they need them.

Looking to join a dynamic and innovative team that’s revolutionizing the world of digital advertising? Consider joining MiQ! We’re always on the lookout for talented individuals who are passionate about using data-driven insights to drive business results. If you’re interested in learning more about career opportunities at MiQ, visit our website.

Chandan is a senior software developer for MiQ, working from the Bangalore office. Super adventurous, he loves challenging himself through trekking, sports, and bike rides — he’s recently ridden from Kashmir to Manali!
