Running Load Testing at Scale on a Shoestring Budget

Mahfudh Junaryanto · Published in cloudstory · Aug 2, 2020

This is the first part of a Performance Management article series. The pandemic has changed everything, and it is now more important than ever to be cost-conscious so that we can endure the difficult times while realigning our business strategy. Coincidentally, Performance Management is about two sides of the same coin: on one side, how we deliver a delightful experience to our users; on the other, how we do so at minimum cost.

This article walks you through the journey: how we arrived at the architecture decision, followed by the different deployment models (architectures) and a hands-on implementation. If you don't have much time to spare, or aren't interested in the discovery process, jump straight to The Architecture or The Implementation section.

Decision Criteria

What does it mean to run load testing at scale on a shoestring budget? We extract two keywords, 'at scale' and 'shoestring', to establish our decision criteria:

  • Load generator efficiency: a highly efficient load generator ensures we can generate load at any scale, with lower infrastructure cost per thousand RPS (requests per second)
  • Developer experience: a better developer experience allows developers to complete the test script faster, or allows less skilled (i.e. cheaper) developers to script any complex scenario
  • Reporting: better reporting provides better visibility into the target system's behaviour and allows us to overlay it with system metrics for performance optimisation

Which tools to use

While searching for a tool to use for our purpose, I stumbled upon an article on the very topic we are discussing: https://k6.io/blog/comparing-best-open-source-load-testing-tools. It's a super comprehensive analysis that pits all the major load testing tools against each other.

Let's extract the key points that are relevant to us here:

  • The three most active (and probably most popular) tools are JMeter, Gatling and k6
  • JMeter has surprisingly maintained a strong foothold despite its well-known inherent inefficiency (one thread per VU), which makes it resource hungry and costly to run
  • Gatling, which was designed squarely to address JMeter's performance and developer experience issues, has predictably gained more trust
  • k6 is the rising star, and the fact that GitLab has migrated to k6 for its automated load testing pipeline speaks for itself about why people love this product
  • Some of the tools listed (wrk, hey, etc.) achieve much higher RPS than the rest of the pack, but we have to disqualify them because they are simple fire-and-forget HTTP generators, which makes simulating real-world scenarios too challenging

Thanks to that article, our job is much easier now. Let us decide between JMeter, Gatling and k6.

JMeter is a solid tool, but it is not the most efficient at generating load, which means it costs more to generate the same load because extra hardware is needed. It is also the least developer friendly: you must use the user interface to configure and create scenarios. Even if you have legacy scripts and a JMeter talent pool, you should probably migrate away from JMeter; it means higher development cost and higher infrastructure cost. We eliminate JMeter in this round.

Developer Experience: Scala vs JavaScript

Between Gatling and k6: both are very developer friendly, but with different language support, Scala for Gatling and JavaScript for k6. Given the popularity and availability of JavaScript developers, it seems like an easy decision. But note that both provide a recording tool that removes the need for manual scripting. It can be used to bootstrap our script, even though totally eliminating scripting is more a dream than a reality. Comparing the two recorders, Gatling's preferred approach is a proxy. From experience, setting up a proxy is not straightforward; most of the time you don't get it right the first time. I resorted to the HAR (HTTP Archive) recording approach instead, which is a very convenient and natural way to record web interactions and is available in every browser. With a HAR file, you then use the converter provided by the tool to turn it into a test script.

Some takeaways from using the HAR recording approach:

  • Prefer Firefox over Chrome. In Firefox, we can easily filter which traffic to record. Most of the time we only need page requests and XHR for load testing, which keeps our test script clean and manageable
  • The Gatling converter does not convert POST requests, which puzzled me
  • The k6 converter does a great job: it translates every request along with all the headers and cookies. Unfortunately, this sometimes makes requests fail with a weird “protocol error”, so the workaround is to remove all headers except Content-Type

With this, I would pick k6 over Gatling.

Load Generation Efficiency: Scala vs Go

Both Gatling and k6 are very performant. Gatling is built on Scala, while k6 is based on Go; both languages are known for good concurrency and scalability.

Referring to the article above, Gatling performs slightly less efficiently, but that is probably due to the short running time, since Scala (on the JVM) needs more time to warm up. I would not pick a side in this area.

Reporting

Reporting probably does not contribute directly to cost savings or efficiency, but it is still very significant, because it:

  • allows us to overlay test metrics with target system metrics for better visibility and troubleshooting
  • allows real-time monitoring and team collaboration during performance tuning
  • impresses our boss with better visualization and presentation

Inline or console report output is not sufficient; it would not meet any of the criteria above. Luckily, performance testing has very well-defined metrics that every major tool supports, and our industry also has standard protocols for metrics collection such as collectd and StatsD. As long as the tool can emit its output over one of those protocols, we are safe: we can simply channel the test metrics in real time to the monitoring tool of our choice (Datadog, Grafana, etc.).

As you may have guessed, both Gatling and k6 satisfy this requirement. In Gatling you need to use the Graphite plugin in StatsD mode, while in k6 you just specify statsd as the output.

Again, I pick no winner in this area.

The Architecture

We have picked k6 as our load testing tool. Now let's establish an architecture that allows us to deploy the tool and the reporting solution in our infrastructure in a cost-effective manner. We present three deployment models to accommodate different needs, all deployed on AWS. In the absence of third-party monitoring tools (other than CloudWatch), here are the three possible setups for our load testing:

Cost-effective Solution

Option 1. CloudWatch Dashboard

This is the most cost-effective solution you could possibly have. Especially if your workload is already in AWS, you just need to spin up a compute instance for k6 and you're good to go. It is also the most scalable and the easiest to set up, since it relies on AWS managed services to store the metrics and visualize the results. CloudWatch is not a time-series database, but it does let you store metrics. Just be mindful of its limitations: it rolls up metrics over time, which makes them practically useless after 63 days (sub-minute data is available for 3 hours, 1-minute data for 15 days, 5-minute data for 63 days, and 1-hour data for 15 months).

Pros:

  • super cheap
  • easy to setup
  • scalable and highly available (HA) by default

Cons:

  • Metrics granularity degrades over time
  • Not possible to overlay with metrics from 3rd party tools

Pro tips: Use this solution if you want to run one-off Load Testing without worrying about trends.

Advanced Dashboard Solution

Option 2. Grafana Dashboard

This solution allows integration with third-party metrics. During performance tuning, it is not unusual to overlay several metrics in a single dashboard to see the impact of load on CPU, memory or network behaviour in real time, and how efficiently your app handles the load. It also comes in handy if you already have Grafana set up in your system.

Pros:

  • Allow integration with 3rd party metrics

Cons:

  • Complexity of setting up and running Grafana
  • Metrics granularity degrades over time

Pro tips: Go with this solution if you want to integrate with 3rd party metrics

Metrics Retention Solution

Option 3: InfluxDB — Grafana Dashboard

This solution fixes the metrics retention issue by using a dedicated database to store metrics, but it is also the most complex to set up and maintain. InfluxDB as a managed service isn't cheap, and setting up your own InfluxDB in high availability (HA) requires some complex replication. As a trade-off, you can set up a single-instance InfluxDB and keep a backup after each test run. Shutting down the database and Grafana while they are not in use also keeps the cost low.

Pros:

  • Metrics retention as long as you like

Cons:

  • Complexity of setting up and running InfluxDB and Grafana

Pro tips: Go with this solution if you are running continuous performance management, or if you have already invested in setting up InfluxDB and Grafana. Alternatively, you can send the metrics to Datadog, which lets you both store and visualize metrics at a very affordable rate.

The Implementation

Let's roll up our sleeves and implement the solution. In keeping with the cost-conscious spirit, we will build the first option in the simple tutorial below.

We will use an EC2 instance to run k6, since this allows us to use the CloudWatch agent to ship metrics to the CloudWatch backend. One thing I particularly like about this solution is its simplicity: you do not need anything other than k6, the EC2 instance and AWS CloudWatch. We are leveraging the CloudWatch agent's support for StatsD.

k6 claims to be capable of generating 30K-40K virtual users (~300K RPS) from a single instance, which is enough for most of our projects. If you are interested in preparing k6 for massive scale, visit the k6 blog.

Pre-requisites:

  • An AWS account
  • An EC2 instance running Ubuntu/Debian

If you run another flavour of Linux, head over to the k6 installation page for instructions, then jump to Step 2 below.

Step 1: Install k6

Log in to your EC2 instance and run the following:

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 379CE192D401AB61
echo "deb https://dl.bintray.com/loadimpact/deb stable main" | sudo tee -a /etc/apt/sources.list
sudo apt-get update
sudo apt-get install k6

Test the installation with:

k6 version

Step 2: Install and Configure CloudWatch Agent

Install CloudWatch agent if you have not done so.

Download the package:

wget https://s3.amazonaws.com/amazoncloudwatch-agent/ubuntu/arm64/latest/amazon-cloudwatch-agent.deb

Install the package:

sudo dpkg -i -E ./amazon-cloudwatch-agent.deb

Create a configuration file that runs the agent as a StatsD server listening on port 8125 (the default port):

{
  "metrics": {
    "namespace": "perftest",
    "metrics_collected": {
      "statsd": {
        "service_address": ":8125",
        "metrics_collection_interval": 5,
        "metrics_aggregation_interval": 0
      }
    }
  }
}

Save the file as /opt/aws/amazon-cloudwatch-agent/etc/statsd.json. A little explanation of the configuration:

namespace: <anything you like: a project id, etc.>
service_address: <the port to listen on>
metrics_collection_interval: <how often, in seconds, metrics are collected and sent to the server>
metrics_aggregation_interval: <how long, in seconds, the agent aggregates metrics before publishing them>

We set metrics_aggregation_interval to 0 because we want the agent to send raw data points rather than aggregated data.

Run the agent as a service with the StatsD configuration:

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/etc/statsd.json

Step 3: Prepare test script

We are going to load test the PerfTest Cloud website with the following scenario:

  1. Open the page at https://perftest.cloud/
  2. Click on the Early Access link to open the Early Access Program page
  3. Key in an email address and press the submit button

Now we are going to record the session using Firefox and export the HAR file. Before you browse to the site and execute the steps above, make sure you have enabled Persist Logs and filtered the requests to HTML and XHR.

Once the recording is done, export the HAR file by selecting Save All As HAR.

Run the converter to get the test script:

k6 convert --no-batch --output perftest.js perftest.cloud.har

Open the script and make small adjustments.

For every POST request, remove all headers except Content-Type:

"headers": {
'Content-Type': 'application/json'
}
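
To illustrate, a cleaned-up POST request in the converted script could end up looking roughly like the sketch below; the endpoint URL and payload are hypothetical placeholders, not values taken from the actual converted script.

import http from 'k6/http';

// Hypothetical POST request after the cleanup: every header has been
// removed except Content-Type.
let payload = JSON.stringify({ email: 'user@example.com' }); // placeholder body
let res = http.post('https://perftest.cloud/early-access', payload, {
  headers: { 'Content-Type': 'application/json' },
});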

Put think time in between requests. Since the user opens two pages before submitting the POST request, let's put in 10 seconds (5 seconds per page). Note that if your app is not a Single Page Application (SPA), you would simply put 5 seconds between each page request; but since our app is an SPA, no new request is issued when opening the second page.

sleep(10) 

k6 does not provide an error rate among its standard metrics, but we can easily add one via a custom metric.

In the import area, add the Rate import and declare the custom metric:

import { Rate } from 'k6/metrics';

// custom metric tracking the failure rate
let failed_rate = new Rate('failed_rate');

After each request line, add this line:

failed_rate.add(res.status !== 200);

It reads: record a data point for every response, counting it as failed whenever the status is not HTTP 200, so the metric reports the failure rate.
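
Putting it all together, a minimal version of the adjusted script could look like the following sketch. The URLs and payload are hypothetical placeholders for whatever the converter produced; only the overall structure (the imports, the think time, the trimmed POST headers and the custom failure rate) reflects the adjustments described above.

import http from 'k6/http';
import { sleep } from 'k6';
import { Rate } from 'k6/metrics';

// Custom metric tracking the fraction of failed (non-200) responses.
let failed_rate = new Rate('failed_rate');

export default function () {
  // Open the landing page; opening the Early Access page triggers no new
  // request because the site is an SPA.
  let res = http.get('https://perftest.cloud/');
  failed_rate.add(res.status !== 200);

  // Think time: roughly 5 seconds on each of the two pages.
  sleep(10);

  // Submit the form. Hypothetical endpoint and payload; all headers except
  // Content-Type have been removed, as per the workaround above.
  res = http.post(
    'https://perftest.cloud/early-access', // placeholder URL
    JSON.stringify({ email: 'user@example.com' }),
    { headers: { 'Content-Type': 'application/json' } }
  );
  failed_rate.add(res.status !== 200);
}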

Test the script by executing:

k6 run perftest.js

It will run a test with 1 virtual user (VU).

Step 4: Run test script

The script is now ready to run with the desired load from the EC2 instance. To run with 100 VUs for 60 seconds, execute this (by default the statsd output sends metrics to localhost:8125, which is exactly where the CloudWatch agent is listening):

k6 run --out statsd --vus 100 --duration 60s perftest.js

In real life, we give our servers ramp-up time. We can include the load pattern in the code itself. For example, to add a 30-second ramp-up and a 30-second cool-down around the main load:

export let options = {
  stages: [
    { duration: '30s', target: 20 },  // ramp up to 20 VUs over 30 seconds
    { duration: '4m', target: 50 },   // ramp further up to 50 VUs over 4 minutes
    { duration: '30s', target: 20 },  // cool down back to 20 VUs
  ],
  maxRedirects: 0
};

You can create more complex load patterns using the non-default executors; please refer to the k6 documentation for details.
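
As a rough sketch (assuming a recent k6 version with the scenarios API; the scenario name and stage values below are made up for illustration), a spike-style pattern could be declared like this:

export let options = {
  scenarios: {
    spike: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '1m', target: 100 },  // gentle ramp up
        { duration: '30s', target: 500 }, // sudden spike
        { duration: '2m', target: 0 },    // ramp down to zero
      ],
      gracefulRampDown: '30s',
    },
  },
};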

Step 5: Set up the dashboard

Finally, we have reached the last step, and also the fun part! Log in to your AWS console and go over to CloudWatch. Click on Metrics in the left menu and look for your custom namespace (perftest in our configuration).

Filter the metrics to show only the data coming from k6.

Now you are ready to create a dashboard and add metrics panels to it. I was surprised by how easy it is to prepare a dashboard in CloudWatch, especially when customizing the metrics panels. To me it is even easier than Grafana.

For example, you can change the legend and its ordering, or change the legend colours.

But the best part is rearranging and resizing the panels in the dashboard. Give it a try.

The only drawback is that CloudWatch does not offer data export for metrics. If you are required to report raw data, CloudWatch is not an option.

Finally, let's have a peek at our dashboard.

Setting up the dashboard is a little more involved. If you just want to quickly see the results shown above, leave a message below and I will be happy to share the dashboard source code that you can import.

That's the end of the article. If you have a cheaper way to run load testing, please share it; I would be happy to hear from you.

Meanwhile, for more comprehensive cloud optimization, do check out our website: https://cloudops.services
