Creating a load testing framework

Dominic Humphries
Adzuna Engineering
Nov 23, 2021

We use the Solr search platform to provide some of our services at Adzuna. Earlier this year, we went for an upgrade. Everything was dutifully tested in our staging environment and looking good. So we rolled out the upgrade to live.

And that’s when we hit an issue. Everything worked… slowly. The one thing our manual testing hadn’t (and couldn’t have) checked was the performance we got under load, which in this case wasn’t as good as we needed.

Getting everything sorted out took days and cost quite a bit in larger instances to counter the degraded performance. Enough was enough: Before we tried to upgrade again, we needed to be able to test the proposed upgrade under realistic loads to make sure it all worked.

After some looking around, we settled on Vegeta as our test framework of choice: capable of making HTTP requests at any desired rate, it was a perfect match for our needs. Now we just needed to give it the input that would enable it to make valid tests.
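
Vegeta takes its targets as plain text, one HTTP request per line, so a targets file ends up looking something like this (the hostname and queries below are purely illustrative, not real production traffic):

GET http://solr-staging.example.com:8983/solr/jobs_AU/select?q=aviation&wt=json
GET http://solr-staging.example.com:8983/solr/jobs_GB/select?q=nursing&wt=json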

No need to over-complicate this, surely? Just take production requests and re-use them for tests, right?

Yes… where do we keep all the production requests though?

Aha! Logstash! No problem then, we can just get the data we need from elasticsearch. Simple…

First, because our indexes are timestamp-based and therefore dynamic, we’d need a list of available indexes:

curl -s "$logserver:9200/_cat/indices" | grep "$index_match" | awk '{ print $3 }' | sort
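
The index name is the third column of the _cat/indices output, which is why the awk picks out $3. The listing looks roughly like this (the index names and sizes here are only illustrative):

green open logstash-solr-2021.11.15 aBcDeFgHiJkLmNoP 5 1 1204321 0 2.1gb 1.1gb
green open logstash-solr-2021.11.16 qRsTuVwXyZaBcDeF 5 1 1351876 0 2.3gb 1.2gb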

Then we could retrieve the requests for a given Solr core and time window with a curl command as simple as:

curl -s -XPOST -H 'Content-Type: application/json' \
  "$logserver:9200/$index/_search?size=$max_window&from=$i" -d @- <<EOF
{
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "bool": {
            "filter": [
              {
                "bool": {
                  "should": [
                    { "match": { "solr.collection": "$core" } }
                  ],
                  "minimum_should_match": 1
                }
              },
              {
                "bool": {
                  "should": [
                    { "match": { "solr.level": "INFO" } }
                  ],
                  "minimum_should_match": 1
                }
              }
            ]
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "${date}T${from_time}:00Z",
              "lte": "${date}T${to_time}:59Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ]
    }
  }
}
EOF

Nothing to it!

That did return rather more data than we needed though, so we piped the results through faithful old JQ to make it more manageable, appending

| jq -r '.hits.hits[]._source.message'

to the curl command. We now have the logged requests from live, ready to be reprocessed into HTTP requests for Vegeta.

Unfortunately, each line looks something like this:

2021-11-16 12:01:55.191 INFO (qtp751021317-316117) [ x:jobs_AU] o.a.s.c.S.Request [jobs_AU] webapp=/solr path=/select params={_qtags=30705|Domain::Term|221|aBU7u9RG7BGog11HcKxilA&qs=3&facet.field={!key%3Dlocation:id}location_struct&facet.field={!key%3Dcategory:id}category_id&facet.field={!key%3Dsites}site_id&ps=2&qt=edismax&f.category_id.facet.limit=3&start=0&f.location_struct.facet.limit=-1&rows=1&q="aviation"&f.location_struct.facet.mincount=1&f.category_id.facet.mincount=1&pf=title&f.site_id.facet.mincount=1&wt=json&facet=true&f.site_id.facet.limit=6} hits=838 status=0 QTime=3

That’s more data than we need, and what we *do* want is in the wrong format. We need to mangle this data into a more useful shape! This is a job for Perl!

No, seriously. This isn’t big enough to need to write a formal parser, but it’s big enough for a regex to be horrific to write and maintain. This is exactly what Perl excels at!

Firstly, we need to break the line up into sub-parts. Perl allows regex components to be put into variables, and also allows each component’s capture to be placed into a named capture group. So instead of one huge regex definition, we can define many sub-parts and combine them:

my $space = qr/
    \s*                    # Match any spaces
/x;
my $date = qr/
    [-0-9]+                # Match a string of numbers and hyphens
/x;
my $time = qr/
    [:.0-9]+               # Match a string of numbers, colons and periods
/x;
my $state = qr/
    \S+                    # Match a single word, this'll generally be 'INFO'
/x;
my $junk1 = qr/
    \([^)]+\)              # Match unwanted string, in parens
/x;
my $junk2 = qr/
    \[$space[^]]+\]        # Another unwanted string, this in square brackets
/x;
my $oasc = qr/
    o\.a\.s\.c\.S\.Request # Another unwanted string, 'o.a.s.c.S.Request'
/x;
my $core = qr/
    \[(?<core>[^]]+)\]     # Named capture of the core, which is in square brackets
/x;
my $param = qr/
    (\s*(\S+=\S+)+)        # A param is a 'key=value' string
/x;
my $params = qr/
    (?<params>$param+)     # Named capture of all the $param matches
/x;

# Utility regexes made up of previously-defined terms
my $timestamp = qr/$date $space $time $space/x;
my $blurb = qr/$space $junk1 $space $junk2 $space $oasc $space/x;

With all of these components declared, we can now create our line-matching regex that’s considerably more readable than the average regex:

my $re = qr/$timestamp $space $state $space $blurb $space $core $space $params/x;

The only parts we really want are $core and $params, whose captures are available in the special variables $+{core} and $+{params}.

A little more data munging to get it into the right forms, and we can output a Vegeta-comprehensible line with:

say "GET http://" . $host . $results->{webapp} . '/' . $results->{core} . $results->{path} . '?' . $results->{params};
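
The munging itself isn’t shown here, but as a rough sketch (the %results layout below is an assumption for illustration, not our exact production code), it amounts to splitting the captured key=value pairs and trimming the braces off the params value:

# Hypothetical sketch of the munging step; field names are assumptions
my %results;
if ( $line =~ $re ) {
    # Split the captured tail into key=value pairs (webapp, path, params, hits, ...)
    my %kv = map { split /=/, $_, 2 } split ' ', $+{params};
    $results{core}   = $+{core};
    $results{webapp} = $kv{webapp};    # e.g. /solr
    $results{path}   = $kv{path};      # e.g. /select
    ( $results{params} = $kv{params} ) =~ s/^\{|\}$//g;    # drop the surrounding { }
}
my $results = \%results;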

For every request in the Elasticsearch output, this generates a corresponding Vegeta HTTP request. Written out to a file, those requests give us all the data we need for a load test of the staging server.

vegeta attack -rate=${rate:-50} -duration=${duration:-1m} > $RESULT_FILE

With a small shell script acting as a wrapper to give us sensible defaults (50 requests a second for a one-minute test), we can pipe the Perl-adapted Elasticsearch output into Vegeta and get a nice comprehensive report back telling us how many valid responses we got, how long they took, and so on.
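
A minimal version of that wrapper might look something like the following (the variable names and defaults are illustrative rather than our exact script):

#!/bin/bash
# Illustrative wrapper: Vegeta targets arrive on stdin
RESULT_FILE=${RESULT_FILE:-results.bin}

# 50 requests/second for 1 minute unless overridden
vegeta attack -rate=${rate:-50} -duration=${duration:-1m} > "$RESULT_FILE"

# Turn the binary results into a human-readable summary
vegeta report < "$RESULT_FILE"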

All of which will give great peace of mind the next time we come to prepare for an upgrade…
