Don’t get TechCrunched: performance testing for your HTTP APIs
If you’re a big company with a platform that regularly supports tons of users, you have to know how incremental traffic impacts the performance of your systems. Maybe you’re getting ready for a seasonal surge in traffic like Black Friday or Cyber Monday, and don’t want to lose out on serious revenue in case a critical process fails.
Even startups can become victims of their own success if they receive an unexpected mention on TechCrunch or Hacker News and their sites collapse under the influx of new visitors.
Whether you’re already supporting lots of traffic, gearing up for a seasonal spike, or on the verge of viral success, don’t get TechCrunched! Load test your APIs with real-world loads to validate their performance before it’s too late.
What is load testing?
On the Postman Engineering blog, we’ve already learned how to get started with test automation. Once you establish a process for functional and integration testing, the next step is to look at performance testing to gain a deeper understanding about the stability, reliability, and availability of your platform.
For example, how does your system behave when a large number of users are accessing your servers? How long does it take for a user to receive a response? As more users access your platform, does a response take significantly longer or does a request timeout? Is this effect magnified at certain times of the day or week?
Load testing falls under a broader umbrella of performance testing. During load testing, you will subject an API to a workload that approaches and possibly exceeds the limits of its specifications.
Load testing is sometimes combined with stress testing, to determine the system’s ability to handle a heavier-than-normal workload, and sometimes with endurance testing, to determine its ability to handle a specified workload for a prolonged period of time.
Why do load testing?
One unbelievably common approach is to throw your code into production, and be ready to roll it back if it doesn’t scale well. Other developers might rely on server logs to find out when their servers are crashing.
If you’re still reading, you probably want to be proactive and achieve a higher level of confidence about the stability, reliability, and availability of your platform.
Maybe you know exactly when you’ll be featured on Shark Tank, or maybe you’re totally unprepared for any kind of viral traffic. A single appearance on the front page of the internet can result in an unintentional distributed denial-of-service (DDoS) attack. Your servers might not be able to handle all of these simultaneous requests, and you’ll piss off your users and lose out on revenue.
There are many ways to optimize and balance the load that hits your servers, validate user inputs, and cache static content. You should definitely do all of that in addition to load testing.
Load testing with Continuous Integration
Load testing can be done as a regular part of your Continuous Integration process. At these early stages of the development lifecycle, you can identify bottlenecks and improve the stability of your application before it is released to production. It’s best to catch issues and errors in these early stages when the cost of debugging is the lowest.
Load testing for exploratory testing
Load testing can also be done on an exploratory basis to validate hypotheses and gain a deeper understanding of your systems. Load testing for exploration is done similarly to regular stage-gate load testing, except on an ad hoc basis with higher than average loads. By increasing your load incrementally, you can determine the maximum throughput of your API that functions within your desired tolerances.
For example, imagine that you increase the number of active users by X% and see no significant degradation in performance. Assuming you have stable usage, you can rest assured that your current infrastructure supports next quarter’s growth which is forecasted more modestly than the X% increase that you tested.
Take another example. Imagine that you run the same test, but instead you see significant degradation of a key performance metric. You can use this data to inform the acceptable bounds of a service-level agreement (SLA) for teams that rely on your services. If your system doesn’t adequately support current or forecasted activity, this data can also drive the discussion for future investments in infrastructure and compute resources.
- Pinpoint issues by incrementally building tests. This makes it easier to isolate the component that is causing degradation.
- Debug issues. Address bottlenecks, limitations, or instability.
- Establish or validate service-level agreements (SLAs) for internal APIs or public services.
- Forecast a budget for the size and type of compute resources to support current or anticipated growth.
- Combine load testing with mock servers to further isolate your system under test from dependencies on shared resources.
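To make the SLA point concrete, here is a minimal sketch of checking a latency percentile from a load-test run against a target. The sample latencies and the 250 ms threshold are hypothetical, not from any real system:

```javascript
// Compute a latency percentile from load-test samples and compare it
// to a hypothetical SLA target. All numbers here are illustrative.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const index = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, index)];
}

// Example: response times in milliseconds from a test run.
const latenciesMs = [120, 95, 180, 210, 150, 140, 300, 130, 160, 175];
const p95 = percentile(latenciesMs, 95);
const slaTargetMs = 250; // hypothetical SLA: p95 under 250 ms

// Here p95 is 300 ms, so the hypothetical SLA would be violated.
console.log(`p95 = ${p95} ms, SLA ${p95 <= slaTargetMs ? 'met' : 'violated'}`);
```

Tracking a high percentile rather than the average matters because tail latency is usually what an SLA constrains.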
With frequent load testing, either as part of a regular process or on an ad hoc basis, we gain a deeper understanding of our architecture’s performance characteristics. This allows us to make performance improvements to both existing features and new implementations.
What are some tools for load testing?
Simulating load goes hand-in-hand with accurately measuring the performance of your systems. There’s a variety of widely used, free tools for load testing web applications.
Loadtest is a free and open-source Node.js package that makes it easy to run load tests as part of your regular systems tests, before deploying a new version of your software.
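For instance, a quick run with the loadtest CLI might look like the following. The URL, request count, concurrency, and rate are placeholders to adapt to your own endpoint:

```shell
# Install the loadtest CLI globally, then hit a target endpoint.
# -n: total number of requests, -c: concurrent clients,
# --rps: target requests per second.
npm install -g loadtest
loadtest -n 1000 -c 10 --rps 100 https://api.example.com/health
```

When the run finishes, loadtest prints summary statistics such as mean latency and errors, which you can compare across runs as you increase the load.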
Apache JMeter is a free and open-source Java application for load and performance testing. It’s easy to configure and ideal for straightforward workflows. It has a graphical user interface (GUI) for creating and displaying simulations and supports distributed testing.
Gatling is another free and open-source tool for load testing. It has a slightly steeper learning curve: simulations are written in Scala code and run from a single host.
Who is responsible for load testing?
Different organizations task different members of their team with load testing. In many cases, it’s the developer or tester who is responsible for measuring key performance indicators. In other cases, it’s the DevOps engineer, SRE, or someone in charge of application performance monitoring.
Although CIOs and engineering managers may not be involved with the implementation of load testing, they are ultimately responsible for infrastructure and downtime costs.
- QA engineers
- DevOps / SREs
- CIO / Engineering managers
Across organizations, there’s a variety of functions that bear the responsibility for performance testing. Across the development community, there’s a variety of opinions about testing infrastructure and best practices.
While a number of teams “do load testing,” what that translates to in practice often varies. Some teams are firebombing requests at non-production APIs to see what happens. That’s one way of getting started, but there’s still more that can be done.
Load testing can be expensive, and typically requires approval for the additional spend. No matter who is responsible for actually running a load test, demonstrating that you’re doing something valuable with the results of that test is important for the whole team.
What are API performance metrics?
These are some common measures of API performance. Depending on your service level objectives and requirements, you may want to measure other indicators too.
- Response time — total processing time, latency
- Throughput — requests per second (RPS), request payloads, maximum operating capacity
- Traffic composition — average and peak concurrent users
- Database — number of concurrent connections, CPU utilization, read and write IOPS, memory consumption, disk storage requirements
- Errors — handling, failure rates
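As a rough illustration of how a few of these metrics are derived, here is a sketch that computes throughput, error rate, and average latency from a batch of recorded requests. The record shape and the numbers are assumptions for the example, not a real logging format:

```javascript
// Derive throughput, error rate, and average latency from request records
// collected during a measurement window. The record shape is hypothetical.
const records = [
  { status: 200, durationMs: 120 },
  { status: 200, durationMs: 90 },
  { status: 500, durationMs: 400 },
  { status: 200, durationMs: 150 },
];
const windowSeconds = 2; // length of the measurement window

const throughputRps = records.length / windowSeconds;
const errorRate =
  records.filter((r) => r.status >= 500).length / records.length;
const avgLatencyMs =
  records.reduce((sum, r) => sum + r.durationMs, 0) / records.length;

// 4 requests over 2 s => 2 RPS; 1 failure out of 4 => 0.25 error rate.
console.log({ throughputRps, errorRate, avgLatencyMs });
```

In practice you would compute these over sliding windows on both the client and server side, so degradation can be attributed to the right component.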
Let’s take a look at one way to do load testing with Postman.
Recipe: Load Testing with Postman
We’ve seen how to use the Postman collection runner to run API tests and view your response times. What if you wanted to simulate concurrent users executing simultaneous requests? Since the collection runner is single-threaded, you would need to open multiple windows to run parallel collections.
We’ve also seen how to use the Postman CLI Newman to automate your testing.
To simulate concurrent users executing the same requests, you can use Newman to run parallel collections, as this example code demonstrates. You can also tweak your operating system limits to allow a higher number of concurrent TCP sockets. However, your network hardware may become a bottleneck if you want to simulate more users.
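One way to sketch the parallel-run idea: wrap a single collection run in a function and launch several at once. The `runOnce` function below is injected so the sketch stays self-contained; the commented version shows how it would wrap `newman.run` (assuming newman is installed and a `collection.json` exists):

```javascript
// Launch `count` collection runs in parallel and wait for all of them.
// runOnce is injected so this sketch is runnable without Newman installed.
function runInParallel(runOnce, count) {
  return Promise.all(Array.from({ length: count }, (_, i) => runOnce(i)));
}

// Real version (assumptions: newman is installed, ./collection.json exists):
// const newman = require('newman');
// const runOnce = () =>
//   new Promise((resolve, reject) =>
//     newman.run({ collection: require('./collection.json') },
//       (err, summary) => (err ? reject(err) : resolve(summary))));

// Stubbed demo: pretend each "run" resolves with its index.
runInParallel((i) => Promise.resolve(i), 5).then((results) =>
  console.log(`completed ${results.length} parallel runs`));
```

Because Node.js is single-threaded, these runs share one event loop; true parallelism comes from the requests being I/O-bound, which is why very high loads eventually need multiple processes or machines.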
Postman wasn’t originally designed for load testing, but our community has asked how they can use Postman to do it. If you have a solution using Newman for load testing, tell us about it in the comments below.
There’s a number of ways to load test your APIs. You can run tests on your local machine or scale up the volume with AWS EC2 instances. In this example, let’s scale up by using AWS Lambda to run concurrent functions with Newman.
Using AWS Lambda to run concurrent functions with Newman
#1 Set up monitoring and logging
Make sure your monitoring and logging is set up so that you can accurately measure and gauge your system’s performance. Ensure that you can monitor response time, recovery time, and other API performance metrics mentioned in the previous sections. Be sure to log results from both the client and server side, so you can better isolate issues.
Gather data so that you know the average and maximum traffic hitting your servers over a certain period of time. Identify specific times of the day and scenarios when you suspect your system’s performance might have issues due to load.
This data will inform what you decide to test and how you plan to increment the load.
#2 Decide what to test
Decide what questions you want to answer as a result of your testing. Use the information you gathered in the previous step to determine which simulations to run. For example, knowing the typical user workflow and which endpoints receive the majority of traffic will enable you to replicate a realistic workload with your tests.
If you’re just getting started, you might consider recording your traffic, dividing it up into smaller segments, and playing it back by gradually adding each segment.
In this example, let’s establish a test plan as a Postman Collection to ensure our APIs are behaving as expected.
- Add the requests that we would like to run, using folders to organize your endpoints and workflows
- Write assertions to validate that your APIs are returning the responses that you expect, within the expected timeframe
- Chain together requests to replicate test cases that include both happy and sad paths
Writing test cases that replicate common user flows will allow you to better understand the impact of incremental load on the user experience. If you need help getting started, check out these additional resources for writing tests in Postman or API testing tips from a Postman professional.
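For example, a Postman test script along these lines asserts on both the response and its timing. Inside Postman, only the two `pm.test` calls are needed; the minimal `pm` stand-in below exists solely so the snippet runs outside the Postman sandbox, and the response values in it are placeholders:

```javascript
// Minimal stand-in for Postman's sandbox `pm` object (placeholder values),
// included only so this snippet is runnable outside Postman.
const pm = {
  response: { code: 200, responseTime: 150 },
  test: (name, fn) => { fn(); console.log('PASS:', name); },
  expect: (actual) => ({
    to: {
      eql: (v) => {
        if (actual !== v) throw new Error(`expected ${v}, got ${actual}`);
      },
      be: {
        below: (v) => {
          if (!(actual < v)) throw new Error(`${actual} not below ${v}`);
        },
      },
    },
  }),
};

// The test script as it would appear in a Postman request's Tests tab:
pm.test('Status code is 200', () =>
  pm.expect(pm.response.code).to.eql(200));
pm.test('Response time is under 200 ms', () =>
  pm.expect(pm.response.responseTime).to.be.below(200));
```

Assertions like the response-time check are what turn a functional collection into a useful load-test workload: they tell you not just that the API answered, but that it answered within tolerance.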
#3 Establish a baseline
Once your test cases are fully documented in your Postman collection, you can run the collection using Postman’s open source project Newman. We’ve seen how to use Newman to run collections from the command line.
Now, let’s use AWS Lambda, a serverless computing platform that allows you to create and run a function without provisioning your own servers. It’s a Function-as-a-Service (FaaS) offering hosted on Amazon Web Services where you pay server fees only for the time that your function runs.
For this example, let’s start with this sample repo using Newman as a library based on this sample script for executing parallel collection runs.
AWS Lambda invokes your Lambda function via a handler, which serves as the entry point that AWS Lambda uses to execute your function code. Now that your logic is working as you’d like, restructure the code so that the function is called from within the Lambda handler.
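A sketch of that restructuring is below. The summary shape mirrors Newman’s run summary (`summary.run.stats.assertions`); `runCollection` is stubbed so the sketch is self-contained, with the real `newman.run` wiring shown in the comment:

```javascript
// Lambda handler that wraps a collection run and reports assertion results.
// runCollection is a stub standing in for a real newman.run call.
async function runCollection() {
  // Real version (assumes newman is bundled and ./collection.json exists):
  // const newman = require('newman');
  // return new Promise((resolve, reject) =>
  //   newman.run({ collection: require('./collection.json') },
  //     (err, summary) => (err ? reject(err) : resolve(summary))));
  return { run: { stats: { assertions: { total: 3, failed: 0 } } } };
}

const handler = async (event) => {
  const summary = await runCollection();
  const { total, failed } = summary.run.stats.assertions;
  return {
    statusCode: failed === 0 ? 200 : 500,
    body: JSON.stringify({ total, failed }),
  };
};

exports.handler = handler;
```

Returning the assertion counts from the handler makes each invocation’s result visible in the Lambda console and in CloudWatch logs, which helps when aggregating many concurrent runs.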
Next, we’ll create a Lambda function deployment package: a .zip file consisting of your code and any dependencies. Add a script to your package.json to zip the required project files, then run npm run zip from the command line. Upload the zipped file to your Lambda function.
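The zip script might look something like this in package.json. The file list is an assumption for illustration; include whatever your function actually requires:

```json
{
  "scripts": {
    "zip": "zip -r function.zip index.js node_modules package.json"
  }
}
```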
Invoke your lambda function by clicking Test. A single invocation of the function represents our baseline increment for testing. Observe the impact on your system and note the baseline standard for performance.
In this example, we are invoking our lambda function manually. However, you could trigger this invocation via webhook or more tightly couple this sort of performance testing with your CI/CD pipeline if you have one set up to test before releasing to production. You can also further automate this process of dividing your traffic into smaller segments and incrementing the load based on your custom preferences.
#4 Incrementally increase the load
Since we are running our script using single-threaded Node.js, you should determine an optimal requests-per-second (RPS) rate that you can send while still maintaining accurate measurements. AWS Lambda’s configuration options also allow you to reserve a specific limit of concurrent executions to throttle your function.
By default, AWS Lambda limits the total concurrent executions across all functions within a given region to 1000. However, you can request a limit increase if needed.
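If you want to cap how many copies of the load-generating function can run at once, reserved concurrency can be set per function with the AWS CLI, for example (the function name is a placeholder):

```shell
# Reserve at most 100 concurrent executions for this function,
# so a runaway test can't exhaust the account-wide concurrency pool.
aws lambda put-function-concurrency \
  --function-name my-load-generator \
  --reserved-concurrent-executions 100
```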
As you methodically introduce new load by invoking concurrent lambda functions according to your test plan, continue to observe the impact on your system. Incrementally increasing the load allows you to isolate the conditions which trigger a breakdown, and also to address any problems as they arise. Simulating dependencies using mock servers further allows you to isolate the components of the system under test.
At every stage of your load testing, you should inspect your aggregate results and drill down on logs when performance is not as expected. There are several custom reporters to output your Newman results according to your format preference.
Notice whether the results degrade in a stable, linear fashion or an exponential one. A stable, linear trend allows you to anticipate performance degradation more predictably. An exponential trend warns you that your systems are approaching a limit: degradation will occur at a higher order of magnitude, and trouble is ahead.
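One crude way to eyeball this from successive measurements is to compare how the metric grows between evenly spaced load increments. This is a heuristic sketch, not a statistical test, and the tolerance is arbitrary:

```javascript
// Classify how a metric grows across evenly spaced load increments.
// Heuristic: near-constant deltas => linear; near-constant ratios => exponential.
function growthTrend(values) {
  const deltas = values.slice(1).map((v, i) => v - values[i]);
  const ratios = values.slice(1).map((v, i) => v / values[i]);
  const nearlyConstant = (xs) =>
    Math.max(...xs) - Math.min(...xs) < 0.1 * Math.abs(xs[0]);
  if (nearlyConstant(deltas)) return 'linear';
  if (nearlyConstant(ratios)) return 'exponential';
  return 'irregular';
}

console.log(growthTrend([100, 200, 300, 400])); // constant +100 deltas
console.log(growthTrend([100, 200, 400, 800])); // constant 2x ratios
```

If latency doubles every time you add the same increment of load, you are on the exponential curve and should stop ramping before the system falls over.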
These results will alert you to issues that should be debugged, and may guide the subsequent stages of load testing. In some cases, new questions will arise that require tweaking the exploration and informing the next round of testing.
A final thought on load testing
The development community has widely varying opinions about the performance testing of HTTP APIs. The tooling for load testing is also a fragmented landscape: there’s a ton of out-of-the-box solutions, roll-your-own approaches, and combinations of the two. To top it off, load testing is frequently handled by different functions within an organization.
With all of these different approaches, it’s hard to identify any single strategy as the gold standard for load testing.
If you’re just getting started, I recommend starting with your monitoring. Observing your production traffic and the corresponding performance metrics may be enough to spark some questions about your system’s behavior. Then, run a base case and start incrementally adding load. Especially with complex systems that depend on other internal and external services, you’ll be able to spot bottlenecks and issues more clearly.
If you have a solution using Newman for load testing, tell us about it in the comments below!