Load Testing @ Applift - part 1

Mangat Rai Modi
Engineering @ Applift
5 min read · Jan 24, 2019

Why Applift uses Locust to load test.

As a mobile ad company, Applift deals with events at high scale. These events can be clicks on an advertisement, installs, or any other activity in the advertised app. It is very common for Applift to see hundreds of millions of events per day, and very often we have to respond to these events within milliseconds. We need the shortest possible response time for two main reasons:

  • A user who clicks on an ad expects to see the landing page quickly. Any delay on our side and the user may close the browser tab out of frustration, which means lost revenue for Applift.
  • Advertiser campaigns are paused when they exhaust their budget. The budget calculation has to be fast to reduce the amount of over-delivery.

There are also several other use cases which are more complex and involve multiple moving parts.

Architecture @ Applift

Following is a simplified architecture for event processing at Applift. Data flows into Applift's system, via users interacting with ads or via third-party systems, into any of our web services. This data is put into a corresponding data pipeline. Various internal services react to these events and consume or produce more data. Data also flows out of Applift to various third parties.

Figure1: Architecture at Applift

Most of the services at Applift are real-time, as the business demands, and can be scaled to any number of instances. These services interact through data pipelines or HTTP APIs. This data ingress and egress is known as traffic, which has the following characteristics:

  • Traffic can increase severalfold depending on a number of factors, so the system should scale accordingly.
  • Traffic across some data routes is much more concentrated than across others, so every service has different throughput and latency requirements.

Business requirements and these traffic patterns made it crucial to have a load testing framework that keeps the system constantly under load and lets us change that load on demand. At this point, we are ready to draft the requirements for the load testing system:

  1. We want to load test not only the web services but the whole cluster.
  2. We want to increase and decrease the load on the service on demand.
  3. We want detailed, powerful, and descriptive reports to understand the performance results.

Locust

We evaluated JMeter, Gatling, and Locust for load testing. All of them have strong pros and cons. A comparison of the popular load testing tools can be found here. I will write another article on my own findings, but for now, we found Locust to be the best fit for the minimum requirements above.

How does Locust work?

Locust is essentially a load testing framework written in Python with one master and multiple slave instances. On the master instance, you start a test by specifying the number of users and the hatch rate (the number of new users spawned per second). The master distributes these users across all the connected slaves.

Figure2: locust master-slave

Note: 100 users with a hatch rate of 10 per second means Locust will execute a maximum of 100 locust tasks in parallel, increasing the parallelism by 10 tasks per second.
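
The ramp-up described in the note can be sketched as a simple simulation (this is just arithmetic for illustration, not Locust's actual scheduler):

```python
def ramp_up(total_users, hatch_rate):
    """Return the number of running users at the end of each second
    until the target total is reached."""
    counts, running = [], 0
    while running < total_users:
        # Each second, hatch_rate new users start, capped at the target
        running = min(running + hatch_rate, total_users)
        counts.append(running)
    return counts

# 100 users at a hatch rate of 10/s reach full parallelism after 10 seconds
print(ramp_up(100, 10))
```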

A locust task is the minimum unit of work. A task is stateless. Some examples of tasks are:

  • A client request to an HTTP API
  • Querying a database
  • Any other operation that can run in parallel with itself and other tasks.
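
The HTTP task examples above might look like this in a locustfile — a sketch assuming Locust's current `HttpUser` API (older versions used `HttpLocust`/`TaskSet`); the endpoint paths and payload are hypothetical:

```python
from locust import HttpUser, task, between

class EventUser(HttpUser):
    # Simulated think time between consecutive tasks of one user
    wait_time = between(1, 2)

    @task(3)  # weight 3: scheduled roughly 3x as often as install()
    def click(self):
        # Hypothetical endpoint; replace with a real service path
        self.client.get("/click")

    @task(1)
    def install(self):
        self.client.post("/install", json={"app_id": "demo"})
```

Such a file is not run directly; it is passed to the locust CLI (e.g. `locust -f locustfile.py`), which spawns the users and schedules the tasks.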

In Figure 2, slaves are capable of running different types of tasks. Each task has a name and a weight. The locust master manages execution on the slaves in such a manner that higher-weighted tasks run more frequently than lower-weighted ones.

Note: The run ratio of the tasks is not perfectly in sync with the weight ratio, as there are multiple factors at play here.
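
Why the ratio is only approximate can be seen with a stdlib-only simulation of weighted random selection (the task names and weights are hypothetical; Locust's actual scheduler has more factors at play):

```python
import random
from collections import Counter

# Hypothetical weights: "click" is three times as likely as "install"
weights = {"click": 3, "install": 1}

# Picking tasks at random in proportion to their weight only
# approximates the 3:1 ratio, even over many picks.
names = list(weights)
picks = Counter(
    random.choices(names, weights=[weights[n] for n in names], k=10_000)
)
ratio = picks["click"] / picks["install"]
```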

Slaves report their metrics back to the locust master, which keeps aggregating this data. The data on the locust master can be queried in the following ways:

  • By reading the CSV file the master writes to disk
  • By querying an HTTP API
  • By reading the graphs and tables shown on the master's web portal (Figure 3)
  • By sending it from each slave to a metrics system such as InfluxDB or Prometheus

Figure3: A sample table on Locust Master
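
The slave-to-master reporting can be illustrated with a small merge function. This is not Locust's actual wire format — the report shape and field names below are invented for illustration:

```python
# Sketch of master-side aggregation: each slave periodically sends a
# snapshot of per-task stats; the master merges them into cluster totals.
def aggregate(slave_reports):
    totals = {}
    for report in slave_reports:
        for task_name, stats in report.items():
            agg = totals.setdefault(task_name, {"requests": 0, "total_ms": 0.0})
            agg["requests"] += stats["requests"]
            agg["total_ms"] += stats["total_ms"]
    # Derive the average latency per task from the merged counters
    return {
        name: {"requests": s["requests"], "avg_ms": s["total_ms"] / s["requests"]}
        for name, s in totals.items()
    }
```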

Locust for Applift

Now that we have a rough idea of how locust works, we will evaluate locust for the minimum requirements we defined earlier.

Spec 1: We want to load test not only the web services but the whole cluster: Locust is capable of testing any arbitrary protocol and can trigger multiple types of tasks at any scale. This enabled us to target each service in the cluster and mimic actual production behavior.
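
The "any arbitrary protocol" point boils down to this: a task is any operation whose duration and outcome can be recorded. Locust's real events API differs; the function and names below are a hypothetical stand-alone sketch:

```python
import time

def timed_task(name, operation, results):
    """Run an arbitrary operation (an HTTP call, a DB query, ...) and
    record its name, outcome, and elapsed time in milliseconds."""
    start = time.perf_counter()
    try:
        operation()
        outcome = "success"
    except Exception:
        outcome = "failure"
    elapsed_ms = (time.perf_counter() - start) * 1000
    results.append((name, outcome, elapsed_ms))
```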

Spec 2: We want to increase and decrease the load on a service on demand: Multiple slaves can be started with different weights assigned to their tasks to control the load, and it is easy to scale the number of slaves up and down.

Spec 3: We want detailed, powerful, and descriptive means to understand the performance results: The default reports from Locust are quite brief, but locust slaves can be extended to compute and write metrics to any system. We can write them to Prometheus and build a powerful dashboard using Grafana.

So Locust does fit the requirements well, but several additional benefits also affected the decision:

  1. It is essentially unopinionated: it doesn't force the author to use a given technology. We just need to implement the protocol.
  2. Locust slaves can be written in any language using the APIs provided, and there are some really good third-party ports for Go and Java.
  3. Locust slaves, being distributed, can easily be scaled on our existing Kubernetes infrastructure.

In this post, we briefly discussed the load testing requirements at Applift. In the next post, I will go into detail on how to write Locust slaves in Go. I will also share some code showing how we deploy our load testing service on a Kubernetes cluster and how it can be extended to accommodate more tasks.
