Smoke testing in the cloud

Łukasz Czerpak
Published in Akamai Krakow Blog · Jan 9, 2019

TL;DR Automated load testing should be part of the Continuous Integration pipeline, with the goal of identifying bottlenecks ahead of a new release deployment. This article demonstrates a simple yet powerful open-source load-testing framework that can become the foundation of your distributed performance and smoke testing platform.

As we move more and more applications to cloud-based infrastructure, effective and proper testing becomes more challenging than ever.

One aspect of such testing is how a system behaves in a distributed scenario under different loads. The testing environment should be a functional clone of the production one, usually with lower capacity, but with the same components, network, and CDN. This is important as dissimilar environments can too easily lead to the wrong conclusions being drawn. Testing against production-like infrastructure ensures you find real bottlenecks in the system early, identify the maximum capacity, and perform optimization work without impacting the public network.

Real vs. virtual users

Let’s assume that our staging environment is ready, but one part is missing: users. Ideally, hundreds or thousands of users generating traffic. As mentioned earlier, for various reasons we don’t want real users to operate in the staging environment, so we need to generate the traffic ourselves. Each of our virtual users should behave similarly to a real user. This means that performing only one action, such as accessing the homepage, is not sufficient; our virtual traffic should be more complex:

- go to the homepage
- wait a few seconds (10s-20s)
- click Products
- wait
- go to a random page of product list
- wait
- click Product X
- …
- end of session

The above flow shouldn’t be just a fixed list of steps. When a human being browses a website, the intervals between visits to sub-pages differ and the number of sub-pages opened varies. These factors also need to be reflected in our virtual user’s journey.

Additionally, the number of virtual users should be configurable or, even better, adjustable during the testing session. We can start with 10 users, take some measurements, then ramp up to 1,000 users and check the metrics again.

We should not forget that real users live in different locations, so our virtual users should also be distributed.

Which load-testing tool to choose?

There are many good load-testing tools available on the market, open-source as well as enterprise solutions. At Akamai, we offer CloudTest to enterprises as a comprehensive end-to-end solution for performance and load-testing. However, for my little project I wanted to try out something smaller and open-source. I found a few great projects like JMeter, Gatling and Locust, and after doing some research I decided to go with Locust (https://locust.io/).

Locust is an open-source performance testing framework that enables you to define user behavior using pure Python. In addition to its “test as code” feature, Locust is highly scalable due to its fully event-based implementation. Additionally, you can either run it on a single node, or in distributed multi-node/multi-region environments.

Let’s have a closer look at how Locust works and how to use it.

Locust — single instance

This is the easiest and the quickest way to get Locust up and running. The only requirement is Python, which is available out-of-the-box on Linux/Unix platforms and can be installed on other platforms such as Windows.

I start with setting up a virtual environment:

❯❯❯ virtualenv locust-single
New python executable in /Users/lczerpak/temp/blog/locust-single/bin/python2.7
Also creating executable in /Users/lczerpak/temp/blog/locust-single/bin/python
Installing setuptools, pip, wheel…done.
❯❯❯ cd locust-single
❯❯❯ source bin/activate

and once it’s ready, locustio can be installed:

(locust-single) ❯❯❯ pip install locustio

Installing collected packages: greenlet, gevent, itsdangerous, Werkzeug, MarkupSafe, Jinja2, click, flask, idna, urllib3, certifi, chardet, requests, msgpack-python, six, pyzmq, locustio
Successfully installed Jinja2-2.10 MarkupSafe-1.0 Werkzeug-0.12.2 certifi-2017.11.5 chardet-3.0.4 click-6.7 flask-0.12.2 gevent-1.2.2 greenlet-0.4.12 idna-2.6 itsdangerous-0.24 locustio-0.8.1 msgpack-python-0.4.8 pyzmq-16.0.3 requests-2.18.4 six-1.11.0 urllib3-1.22

My demo site is based on Magento (https://magento.com/), a popular open-source e-commerce platform. Magento has an underlying database and can be deployed to a cluster, which provides flexibility to emulate most commonly used web architectures and thus makes the platform perfect for testing activities.

The following script, written in pure Python, implements a sample virtual user behavior.
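It’s a minimal sketch against the Locust 0.8 API installed above; apart from the homepage and /about-us, the paths and task weights here are illustrative:

import random

from locust import HttpLocust, TaskSet, task

class UserBehavior(TaskSet):
    @task(10)
    def homepage(self):
        self.client.get("/")

    @task(5)
    def product_list(self):
        # go to a random page of the product list
        page = random.randint(1, 10)
        self.client.get("/products?p=%d" % page, name="/products?p=[page]")

    @task(3)
    def product(self):
        self.client.get("/product-x.html")

    @task(1)
    def about_us(self):
        self.client.get("/about-us")
        # clearing cookies marks the end of the virtual user's web session
        self.client.cookies.clear()

class WebsiteUser(HttpLocust):
    task_set = UserBehavior
    min_wait = 4000  # wait between tasks: 4-8 seconds (in milliseconds)
    max_wait = 8000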

Based on the above code, Locust visits the defined pages in random order, taking into account the assigned weights (in the @task decorator). Pages with a higher weight are visited more frequently than those with a lower weight. Requests are sent with a random delay ranging from 4 to 8 seconds. Additionally, when “/about-us” is visited, the cookies are cleared, which marks the end of the web session. All of this gives each virtual user some degree of randomness and uniqueness.

Given that our website’s URL is http://my.demo.com/, the following command will bootstrap the Locust framework:

❯❯❯ locust -f locust-script.py --host http://my.demo.com

At this moment, no tests are running yet. Locust has started its web monitor, which is available by default at http://localhost:8089

Now it’s time to start the show and check how the website behaves under different loads.

In order to check whether everything works fine, we can run a simulation with one virtual user. You should see real-time stats immediately after starting the test:

Since there is only one user, it takes a while before more pages are visited. After 1–2 minutes, however, more pages and their stats are presented:

Apart from that, Locust shows you Total Requests, Average Response Time and Number of Users on separate graphs where you can see historical values and compare them easily:

The following graphs illustrate my testing procedure. I started with 1 user, then increased gradually to 300, and then rapidly switched to 600 users:

As you can see, the system was working without any performance degradation for up to ~500 users, which translates to ~80 req/s. Once the number of users reached 600, the system started to respond significantly more slowly. Such feedback can be compared against expectations to see whether our prediction was correct, or whether Locust helped us find a potential bottleneck in the system.

Headless and API-driven testing

Although the Locust UI is great, Locust can also be managed and controlled via its API. This opens the door to integration with a Continuous Integration pipeline, enabling fully automated performance and load testing.

Here is a sample API call to start testing with 100 users:

❯❯❯ http --form POST http://localhost:8089/swarm locust_count=100 hatch_rate=50
HTTP/1.1 200 OK
Content-Length: 48
Content-type: application/json
Date: Mon, 20 Nov 2017 16:26:45 GMT
{
    "message": "Swarming started",
    "success": true
}

and to stop:

❯❯❯ http http://localhost:8089/stop
HTTP/1.1 200 OK
Content-Length: 44
Content-type: application/json
Date: Mon, 20 Nov 2017 16:27:44 GMT
{
    "message": "Test stopped",
    "success": true
}
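
These two calls are enough to drive Locust from a CI job. Here is a minimal sketch, assuming the /stats/requests JSON endpoint that backs the web UI in Locust 0.8 and its avg_response_time field; a pipeline step could start the swarm, let it run, and fail the build on a regression:

import time
import requests

LOCUST = "http://localhost:8089"  # Locust web monitor

# start swarming 100 virtual users, hatching 50 per second
requests.post(LOCUST + "/swarm", data={"locust_count": 100, "hatch_rate": 50})

time.sleep(300)  # let the load run for 5 minutes

# fetch the aggregated stats that feed the UI tables
stats = requests.get(LOCUST + "/stats/requests").json()
total = next(row for row in stats["stats"] if row["name"] == "Total")

requests.get(LOCUST + "/stop")

# fail the pipeline step if the average response time regressed
THRESHOLD_MS = 500  # example threshold
assert total["avg_response_time"] < THRESHOLD_MS, "performance regression detected"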

Locust — distributed testing

It may turn out that running tests from a single machine is impossible for a really large number of users, or that you need to see performance from geographic locations other than those under heavy load.

In such cases, Locust can be set up in distributed mode. It requires at least two machines: one working as the Master and the other as a Slave. In practice, it makes sense to have two or more Slave machines, depending on target traffic levels:

To start Locust in Master mode, add the --master flag:

❯❯❯ locust -f locust-script.py --master --host http://my.demo.com

On each Slave machine, run a similar command with the --slave flag and the Master IP address (there is no need to provide the URL):

❯❯❯ locust -f locust-script.py --slave --master-host=127.0.0.1

Once all three Slave nodes are connected, we can see them on the Master node in the UI:

In the distributed mode, all interactions with the UI and the API are done against the Master node and work in the same way as in the single instance mode.

Summary

This article doesn’t exhaust all of Locust’s features or the scenarios in which it can be used. It’s a very simple yet powerful tool you can use to build your own performance and smoke testing platform. Features like API access and distributed mode, combined with an orchestration tool like Terraform, can take Locust to the next level. But let’s discuss that next time ;)
