Load Testing

A Little Background


Acquiring Target

  • Information collected during FanDuel’s marketing campaign in the lead-up to the NFL season

Tools — Test Runner

Many of the internal systems under test operate some form of request / response cycle. Some are externally-facing web systems (several different systems drive various parts of FanDuel’s main website); others are internal services for things like user authorisation, payments, entering contests and so on. We ran some experiments with Locust, Tsung and JMeter, and Locust seemed the best fit for a number of reasons:

  1. Since Locust is implemented in Python, we are able to plug our pre-existing internal service interface packages (which also happen to be written in Python) straight into it, making it very easy to extend Locust to load test a number of non-HTTP systems (see Locust’s docs for an example of how to do this, and the first sketch after this list).
  2. Locust provides a decent set of events that makes hooking our own monitoring into it quite straightforward (see the second sketch after this list).
  3. Locust supports a master / slave setup out of the box, which means it’s easy (with enough slaves) for it to produce the levels of load we need, well above 100k requests/sec in some cases.
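
By way of illustration, here’s a minimal sketch of point 1, in the style of the example in Locust’s docs and written against the Locust 0.x API of the time. InternalServiceClient is a hypothetical stand-in for one of our internal service interface packages, not real code:

    import time

    from locust import Locust, TaskSet, task, events


    class InternalServiceClient(object):
        """Hypothetical stand-in for an internal (non-HTTP) service package."""

        def authorise(self, user_id):
            time.sleep(0.01)  # pretend to make a remote call
            return True


    class ServiceClientWrapper(object):
        """Wraps a service client and reports each call's timing to Locust
        through its request_success / request_failure event hooks."""

        def __init__(self, client):
            self._client = client

        def __getattr__(self, name):
            method = getattr(self._client, name)

            def timed(*args, **kwargs):
                start = time.time()
                try:
                    result = method(*args, **kwargs)
                except Exception as e:
                    elapsed_ms = int((time.time() - start) * 1000)
                    events.request_failure.fire(
                        request_type="service", name=name,
                        response_time=elapsed_ms, exception=e)
                    raise
                elapsed_ms = int((time.time() - start) * 1000)
                events.request_success.fire(
                    request_type="service", name=name,
                    response_time=elapsed_ms, response_length=0)
                return result

            return timed


    class ServiceUser(Locust):
        min_wait = 100
        max_wait = 1000

        def __init__(self, *args, **kwargs):
            super(ServiceUser, self).__init__(*args, **kwargs)
            self.client = ServiceClientWrapper(InternalServiceClient())

        class task_set(TaskSet):
            @task
            def authorise(self):
                self.client.authorise(user_id=42)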
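
And a minimal sketch of point 2: subscribing our own listeners to those events (again the Locust 0.x API). The statsd client and endpoint here are illustrative; any metrics pipeline would do:

    import statsd

    from locust import events

    stats = statsd.StatsClient("localhost", 8125)  # illustrative endpoint


    def on_request_success(request_type, name, response_time, response_length):
        # Forward each request's timing to the metrics pipeline.
        stats.timing("locust.%s.%s" % (request_type, name), response_time)


    def on_request_failure(request_type, name, response_time, exception):
        stats.incr("locust.%s.%s.failures" % (request_type, name))


    # Locust 0.x event hooks are subscribed to with +=
    events.request_success += on_request_success
    events.request_failure += on_request_failure

Running distributed (point 3) is then just a matter of starting one Locust process with --master and the rest with --slave --master-host=<address>.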

Tools — Monitoring

  • The response times of any other system with which it communicates

Example

Here’s an example of some of the data captured during a system test of one of our core services. We configured Locust to ramp up to 40,000 users, although the test was manually stopped at around 20,000. Each Locust user is configured to log in and then make 10 authorisation requests per second. After each request there is a small chance of logging out and then logging back in as a different user.
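
A locustfile along these lines, sketched against the Locust 0.x HTTP API, might look as follows; the endpoints and the pool of test credentials are made up for illustration:

    import random

    from locust import HttpLocust, TaskSet, task

    # Hypothetical pool of pre-created test accounts.
    TEST_USERS = ["loadtest-user-%d" % i for i in range(10000)]


    class AuthBehaviour(TaskSet):
        def on_start(self):
            self.login()

        def login(self):
            # Placeholder endpoint and credentials, for illustration only.
            self.client.post("/login", {"username": random.choice(TEST_USERS),
                                        "password": "not-a-real-password"})

        @task
        def authorise(self):
            self.client.get("/authorise")
            # Small chance of logging out and back in as a different user.
            if random.random() < 0.01:
                self.client.post("/logout")
                self.login()


    class AuthUser(HttpLocust):
        task_set = AuthBehaviour
        # A fixed 100ms wait between tasks gives each simulated user
        # roughly 10 authorisation requests per second.
        min_wait = 100
        max_wait = 100

At 10 requests per second per user, the roughly 20,000 users reached in this test work out to around 200k authorisation requests per second across the slaves.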

Next Steps

The main goal of the project is to identify the bottlenecks that could prevent us from reaching our target capacity. In some cases these can be mitigated by adding servers to a pool, or by increasing the upper bound of auto-scaling groups. In other cases the software and systems need some re-work to remove the pinch point, which is often a single node. The (horizontal) scaling of systems is a significant topic in its own right, so I’ll leave that as the subject of a future post. The project is also providing a significant secondary benefit: improved monitoring of the platform. Each time a bottleneck is found, we make sure monitoring is in place to detect it. The ideal scenario is for the metric to show some form of early warning sign, which can be fed into the existing automated alerting tools.
