Get a Load of This!

Chandrika
The Casper Tech Blog: Z++
6 min read · Jun 11, 2019

Load testing a live e-commerce web application

Intro

At some point in your career as a web developer on a production application, you may get a request to perform some load testing on your app. But what is load testing, and why is it useful? Load testing allows you to collect data about how much traffic an application can sustain and how scalable it is, and to identify potential areas for performance improvement. At Casper, we wanted to know if we could handle 10 times our highest traffic day in a few critical areas of user interaction (such as what we might expect from a flash sale). We ran some stress tests to find the upper bound of volume we could handle with our current infrastructure.

Finding the right tool

Once you’ve determined that you need to load test your application, you have to decide how to do it. You can always write something from scratch, but there are also many tools available to use out of the box. Some of these tools may require external dependencies; for example, artillery, the tool we used, requires Node.js. We picked artillery because there was already precedent for using it at Casper, and it didn’t have any dependencies we hadn’t already installed.

The Artillery site

Configuring your test

An artillery test has a couple of different parts to it, but the most important configurations are the ramp up (adding x number of users over time) and the sustained maximum load. Since artillery runs as a Node.js process, setting too high an arrival rate (i.e. the number of new users per second) can overload the CPU of the machine generating the load. This can make it hard to gauge the actual capacity of simple GET requests, but it is usually less of an issue for POST request interactions, which are more likely to be the true bottlenecks for users. When choosing the values for the max load and ramp-up arrival rate, it can be useful to look at a monitoring service and see what traffic a given endpoint currently receives: you want to test a bit higher than current traffic, but not wildly higher right away. You can choose to either test a single endpoint, or set up a flow that mimics a series of interactions a typical user will follow. Check out more detailed information in the artillery docs.
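To make that concrete, here’s a minimal sketch of an artillery config with a ramp-up phase and a sustained max phase. The target URL, rates, durations, and endpoints below are placeholders for illustration, not our actual values:

```yaml
config:
  # Placeholder target; point this at staging, not production
  target: "https://staging.example.com"
  phases:
    # Ramp up: go from 2 to 20 new virtual users per second over two minutes
    - duration: 120
      arrivalRate: 2
      rampTo: 20
      name: "Ramp up"
    # Sustained maximum load: 20 new virtual users per second for five minutes
    - duration: 300
      arrivalRate: 20
      name: "Sustained max"

scenarios:
  - name: "Typical user flow"
    flow:
      # A flow mimics the series of interactions a single user would perform
      - get:
          url: "/products"
      - post:
          url: "/cart"
          json:
            sku: "EXAMPLE-SKU"   # made-up payload
```

You run the script with `artillery run load-test.yml`, then tune the arrival rates against what your monitoring shows for real traffic.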

As with a lot of free software, some features can be broken; we were trying to parse a CSV for some data to pass into requests, but the CSV parsing feature was broken. Luckily, a quick search online led to an easy fix: using an older version of artillery. Capturing information from the response JSON also didn’t work as expected, so we needed to use some JavaScript helper functions instead. In any case, the software was free, mostly worked, and had a large enough user base that issues were easy to debug.
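For reference, those helpers hook into artillery as a “processor” file referenced from the config. Here’s a rough sketch under made-up names (the function name and JSON field are placeholders, not our actual flow):

```javascript
// processor.js — referenced from the artillery config via `processor: "./processor.js"`
// Hypothetical helper that pulls a value out of a JSON response and stashes it
// on the virtual user's context, as a workaround for the built-in capture option.
function extractCartId(requestParams, response, context, ee, next) {
  const body = JSON.parse(response.body);
  context.vars.cartId = body.cart && body.cart.id; // made-up response field
  return next();
}

module.exports = { extractCartId };
```

The request that returns the JSON gets an `afterResponse: "extractCartId"` hook in the scenario, and later requests can reference `{{ cartId }}` in their URLs or bodies.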

We ran the test itself on a local dev machine, as that was sufficient for the amount of load we were generating. Artillery Pro has some built-in AWS infrastructure, if that fits your needs better. We could have potentially built an AWS Lambda to run the test with more CPU resources, but the only time we got a “High CPU” warning was when we tried to find the upper limit of volume for a simple GET request; as this was not the bottleneck step in our flow, it seemed safe to accept less accurate stats on that endpoint.

Interpreting results

Once we had our artillery script written out, we were able to run it and get results! But what did they mean??

In the status code graph, we can see how many of each HTTP status code we got over time, and correlate that with the mean requests per second (RPS) graph to see how many requests we were able to handle before running into 400s and 500s. In our situation, 502s and 503s meant the server was overloaded, while 401s meant a previous step had failed, so subsequent requests in our artillery flow were not authorized. But why exactly was the server getting overloaded at only 18 RPS?

Digging Deeper

Our initial goal was to confirm that we could handle the traffic expected on a big flash sale day; seeing that we could only handle 18 RPS was a bit disheartening, so we checked our logs and monitoring to see what we could improve. Curiously, we didn’t see any errors that could provide us with stacktraces; we were getting 502s and 503s from nginx, but nothing from our actual unicorn Rails app. What was happening??

We decided to see if anything in AWS could shed light on what was happening. It turns out that, while we had the same auto scaling policy for both staging and production, the maximum instance limit for our autoscaling group on staging was set to one instance. With that single instance unable to handle the requests from both load testing and health checks, AWS was simply terminating it and spinning up a new one, rather than scaling out as we would in production. When our health checks failed, our edge server (Fastly) would return a 503.
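If you hit the same thing, the group’s limits are easy to inspect and raise with the AWS CLI (the group name and max size below are placeholders, not our real values):

```bash
# Inspect the staging autoscaling group's current min/max/desired capacity
aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names staging-web-asg \
  --query "AutoScalingGroups[0].{Min:MinSize,Max:MaxSize,Desired:DesiredCapacity}"

# Raise the maximum so staging can actually scale out like production does
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name staging-web-asg \
  --max-size 6
```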

Once we found that hole, we updated the autoscaling group size on staging to match production, and reran our load test. However, we still got 502s, with no logging from unicorn. We started digging into our unicorn configuration, which was complicated by the fact that it was split across several sources: Ansible, environment variables, and the codebase itself. Eventually, we found that, contrary to the suggested default backlog param of 1024, our server’s listen backlog was set to queue only 64 connections at a time. This meant we were dropping requests between nginx and unicorn; with nginx not receiving any response from unicorn, it returned 502s for the lost connections.
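In unicorn, the backlog is set on the listen call in the config file. Here’s a minimal sketch of the corrected setting (the socket path and worker count are placeholders, not our actual config):

```ruby
# config/unicorn.rb (illustrative; paths and worker count are placeholders)
worker_processes 4

# The backlog is the size of the socket's pending-connection queue. Ours had
# been set to 64, which silently dropped requests under load; 1024 is the
# commonly suggested default.
listen "/var/run/unicorn.sock", backlog: 1024
```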

We updated our unicorn backlog param to the suggested default of 1024 and reran the load test again. This time, we were able to reach much higher loads before getting 500s, along with meaningful stacktraces in Datadog! We were able to identify the bottleneck as an API call to one of our 3rd party services. At this point, we also started running our artillery tests directly on AWS via a serverless setup, so we weren’t limited by local CPU resources.

Side Effects of Load Testing

When we chose to do load testing, we wanted to make sure we didn’t negatively impact performance and uptime on production, so we ran our tests against our staging environment. Staging seemed like a good choice because it mimics the database instance size and server capacity of production without putting actual business transactions at risk. However, our front-end teams rely on interacting with API endpoints on staging for development, so when staging was unable to handle requests, it negatively impacted those teams.

Takeaways

In conclusion, load testing gives you concrete metrics on how much capacity your site has for the endpoints and flows you care about, and it helps you identify areas for performance improvement if that capacity doesn’t meet your expected needs. Improvements can involve anything from changing server configurations to communicating with 3rd party partners.

Come work with us at Casper Tech — we’re hiring!

Special thanks to Rich Cho for all his help and support in this project.
