Performance testing for beginners, part 2: common mistakes
Prelude
This is the second part of the journey: the mistakes. When we start to enter a new domain, it's easy to make mistakes; I would even say it's unavoidable (at least in my case, where I had to start from zero and learn things the hard way). Furthermore, perf test tools are like powerful weapons: if you lack strength and experience, which is obvious for beginners, you may hurt yourself (and maybe others too) while wielding them. So, in this blog post, I will list the common mistakes that I learned the hard way, the mistakes I was aware of from the start, and the ones pointed out by my colleagues, with the hope that you can avoid them and make your journey safer, more successful, and more fun.
Common mistakes
Blind faith in the testing tool: As a beginner, I used to trust my test tools blindly. Imagine firing at a target with a machine gun: the gun can heat up, which lowers its precision and firing rate. Or you may force it to fire more bullets than it can handle and make it blow up in your face; in the testing world, this is called overloading the load generator. This kind of mistake is hard to avoid, especially for beginners, but you can catch it by observing the test process with monitoring tools: monitor not just the application, but also the generator (the test tool and the hardware it runs on).
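Some tools even report on themselves. For example, if you happen to use k6 (just one possible tool; the scenario name, rate, and URL below are made-up placeholders), an arrival-rate executor emits a dropped_iterations metric whenever the generator cannot start iterations at the requested rate, which is a cheap built-in hint, on top of OS-level monitoring, that the generator itself is saturated:

import http from 'k6/http';

export const options = {
  scenarios: {
    steady_load: {
      executor: 'constant-arrival-rate',
      rate: 100,             // 100 iterations per second, regardless of VU speed
      timeUnit: '1s',
      duration: '10m',
      preAllocatedVUs: 50,
      maxVUs: 200,
    },
  },
  thresholds: {
    // Iterations the generator could not start in time are counted as dropped;
    // any drop means the generator, not the application, is the bottleneck.
    dropped_iterations: ['count==0'],
  },
};

export default function () {
  http.get('https://test.example.com/'); // hypothetical endpoint
}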
NOTE: In many cases, problems in the application are easier to identify from the system and OS level.
Ignoring errors: Just because the tool produces a result doesn't mean the result reflects a successful test. This statement may seem obvious at first: how could anyone ignore the errors in a test result? Then think about how a perf test may actually be used. A lot of perf tests are wired into CI/CD pipelines to make sure the application has not degraded after a change, or at least still meets the standard the business needs; for example, 99% of requests must have a response time below 200ms. The test result may satisfy that condition while every response status is 404 instead of 200, because the test script only parses the result for response time and ignores the response status. The faulty app passes the test and gets deployed to production. It is a disaster.
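Here is a minimal sketch of guarding against this, again assuming k6 (the URL and threshold numbers are placeholders): assert on the status code explicitly, and fail the run if either latency or error rate crosses the line.

import http from 'k6/http';
import { check } from 'k6';

export const options = {
  thresholds: {
    http_req_duration: ['p(99)<200'], // the business requirement from above
    http_req_failed: ['rate<0.01'],   // built-in error-rate metric
    checks: ['rate>0.99'],            // our explicit status check below
  },
};

export default function () {
  const res = http.get('https://test.example.com/api/items'); // hypothetical endpoint
  // Without this check, a wall of 404s could still "pass" on latency alone.
  check(res, { 'status is 200': (r) => r.status === 200 });
}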
Load test duration too short: To get trustworthy results, the test duration must be long enough. But how long is enough? We argued a lot about this (and not just system admins, but developers too), and I even saw some test scenarios with a duration of only 20s. There is no fixed number, as applications differ by language, design, backing services, and the hardware they run on. In my experience, 10 minutes is enough for our services; it's a good number in our case, not so short that it hides faults in the application and system design, and not so long that it slows down the deployment flow. You may find another number, another sweet spot, for your applications and system. But don't set it too short: try 30 minutes first, then reduce the number afterwards.
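In k6 terms (again, just one possible tool; the concurrency is a placeholder), the starting point could be as simple as this, trimmed down once you know where your sweet spot is:

export const options = {
  vus: 50,          // placeholder concurrency; tune for your service
  duration: '30m',  // start long, then shorten once results are stable
};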
Hard-coded data in every request: This may lead to an invalid performance test. Applications today have caches everywhere: the nginx cache at the frontend, caches created by the application itself, caches in backing services like redis, and even some databases have their own query caches. So sending the same request repeatedly doesn't reflect the real performance of a service, because the responses may be cached somewhere along the way. We should instead try a worst-case scenario where each request is unique and never duplicated. To do that, you may want to ask the developers for a list of request contents and generate requests based on it. The list should be big enough, at least equal to the total number of requests that will be sent out during the test.
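A sketch of this in k6 (the data file and its fields are hypothetical): load the list once, then have every iteration pick a different entry so requests don't repeat.

import http from 'k6/http';
import { SharedArray } from 'k6/data';
import exec from 'k6/execution';

// Loaded once and shared across all VUs instead of copied per VU.
const items = new SharedArray('request data', function () {
  return JSON.parse(open('./request-data.json')); // e.g. [{"query": "..."}, ...]
});

export default function () {
  // Walk the list by global iteration number so each request is unique
  // (the list must be at least as long as the total number of requests).
  const item = items[exec.scenario.iterationInTest % items.length];
  http.get(`https://test.example.com/search?q=${item.query}`); // hypothetical endpoint
}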
Testing without monitoring system resources: The test result itself won't reveal where the bottleneck is in the system or which components cause the problems. You should monitor system resources while performing a perf test in order to locate the bottleneck. Even when the test result is good, you may still discover problems with the application; for example, the response time is fine but the resources (RAM, CPU, network bandwidth ...) the service consumes have doubled. You may also discover potential bottlenecks that you will encounter in the future.
Testing without knowledge of the tested application and system: Knowing what the applications do, which backing services they use, where they store data, where the logs are located ... helps a lot while performing a perf test. An experienced performance engineer can often point out where the bottleneck is based on that information alone, even without running the test.
Tips and Tricks
Besides the mistakes, there are some tips that can save the day, or at least lessen your work. As I said in part 1, we mainly do load testing at our company, so many of the tips and tricks here focus on load tests.
Test the request manually first: Firing a single request manually and getting its response status and response time can help you roughly estimate the result of the test. If the response is 4xx or 5xx, you obviously don't need to run the perf test at all. If the response time is greater than 1s, don't run the test either: such slow responses usually indicate slow database queries or big files that need to be loaded, and you may break some backing services if you perform the test anyway.
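With k6 this manual probe can be a one-iteration smoke test (the URL is a placeholder), which has the nice side effect that you reuse the same script you will load-test with later:

import http from 'k6/http';

export const options = { vus: 1, iterations: 1 }; // a single manual-style probe

export default function () {
  const res = http.get('https://test.example.com/api/items'); // hypothetical endpoint
  // Eyeball these before committing to a full load test.
  console.log(`status=${res.status} duration=${res.timings.duration}ms`);
}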
Stop the test when it fails: When the error rate or the number of slow responses is already high enough to fail the test, terminate it if all you want to know is whether the service meets the requirement. It saves you time and doesn't put more pressure on the backing services.
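Many tools can do this automatically. In k6, for instance, a threshold can abort the run as soon as it is crossed (the numbers below are placeholders):

export const options = {
  thresholds: {
    http_req_failed: [
      // Abort the whole run once the error rate passes 5%,
      // instead of hammering the backing services for the full duration.
      { threshold: 'rate<0.05', abortOnFail: true, delayAbortEval: '30s' },
    ],
    http_req_duration: [
      { threshold: 'p(99)<1000', abortOnFail: true, delayAbortEval: '30s' },
    ],
  },
};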
Slowly raise the workload: It's better to increase the workload over time than to hammer the service with one big, constant number from the start. Some applications, like those written in Java, need time to warm up; in addition, ramping up lets you find the saturation point (the point at which the performance of the service starts to degrade), which may turn out to be much lower than the business requirement.
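In k6 this is what the stages option is for (the targets and durations here are arbitrary examples):

export const options = {
  stages: [
    { duration: '2m', target: 20 },   // gentle warm-up (JIT, caches, connection pools)
    { duration: '5m', target: 100 },  // ramp toward the target load
    { duration: '10m', target: 100 }, // hold steady and watch for degradation
    { duration: '2m', target: 0 },    // ramp down
  ],
};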