Performance Testing — lessons learned
In a previous article, we explained why performance testing is needed and how it is done in DCube. In today’s article, we’d like to share some of the common problems we encountered and the lessons we learned from working with several internal product teams. Hopefully this article helps you succeed earlier and harvest the benefits of real-time insights and actions! 🎉 🎉 🎉
To recap, these are the four phases of performance testing:
Without further ado, let’s dive right in~
Lesson #1: Having a test plan walkthrough is important for team alignment
As the saying goes: “Plans are nothing; planning is everything.”
During the test planning phase, having a one-page test plan doesn’t by itself guarantee that it will be properly absorbed by all parties involved. It is important to walk through the plan as a team and make sure everyone shares the same understanding. Once everyone is aligned, agree on the test scope, the timeline, and the support resources.
Lesson #2: Using datasets to determine a realistic usage profile
This is one of the common pitfalls: while drafting the test plan, we are often unsure of the realistic usage profile under load. To resolve this uncertainty, we should use our own past datasets (if any) or obtain datasets from similar analytics applications (e.g. Google Analytics or the Government’s own Internet-facing analytics application) to determine the highest usage pattern, then add a 20% to 30% buffer on top. If no such statistics are available, we have to factor some assumptions into the usage profile calculation.
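To make the buffer calculation concrete, here is a minimal sketch. The function name, the hourly granularity, and the sample numbers are our own assumptions for illustration; use whatever granularity your analytics export actually provides.

```python
def target_requests_per_second(hourly_requests, buffer_percent=25):
    """Peak hourly request count plus a buffer, as whole requests per second.

    hourly_requests: integer request counts, one per observed hour (assumed shape).
    buffer_percent: integer buffer on top of the peak, e.g. 20 to 30.
    """
    if not hourly_requests:
        raise ValueError("need at least one hourly observation")
    if buffer_percent < 0:
        raise ValueError("buffer_percent must not be negative")

    peak_per_hour = max(hourly_requests)
    # Integer arithmetic with a ceiling division, so the target never
    # falls below the buffered peak because of rounding.
    numerator = peak_per_hour * (100 + buffer_percent)
    denominator = 100 * 3600
    return (numerator + denominator - 1) // denominator


if __name__ == "__main__":
    observed = [12_000, 36_000, 18_000]  # made-up hourly request counts
    print(target_requests_per_second(observed, buffer_percent=25))
```

With a peak of 36,000 requests in an hour and a 25% buffer, the target works out to 12.5 requests per second, which the sketch rounds up to 13.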
Lesson #3: Having stable features before preparing test scripts
Change is the only constant in the Agile software development life cycle, but it is important to ensure that the features are ready once the test scope is confirmed. A lot of preparation work is needed to avoid test script and test data issues during test execution. For instance, many applications allow only a single login session per account, so extra steps are needed to give each virtual user a unique login account and avoid conflicts while running the tests. Having stable features beforehand allows us to run at least a single-user performance test early, as we shared previously. It is even better if an API specification is available, as it helps to speed up test script preparation.
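One way to guarantee that no two virtual users share a login is to hand out accounts from a pool. The sketch below is only an illustration: the `AccountPool` class and the `perf_user_0001` naming scheme are hypothetical, and most load testing tools offer their own mechanism for per-user data.

```python
import threading


class AccountPool:
    """Hands each virtual user its own pre-seeded test account (hypothetical helper)."""

    def __init__(self, size, prefix="perf_user"):
        self._lock = threading.Lock()
        # Assumes accounts named perf_user_0001, perf_user_0002, ... were seeded beforehand.
        self._free = [f"{prefix}_{i:04d}" for i in range(1, size + 1)]

    def acquire(self):
        """Return an account that no other virtual user currently holds."""
        with self._lock:
            if not self._free:
                raise RuntimeError("more virtual users than test accounts")
            return self._free.pop(0)

    def release(self, username):
        """Give the account back once the virtual user has logged out."""
        with self._lock:
            self._free.append(username)


if __name__ == "__main__":
    pool = AccountPool(size=3)
    print([pool.acquire() for _ in range(3)])
```

Failing loudly when the pool runs dry is deliberate: it surfaces a shortage of seeded accounts before the test starts producing confusing login conflicts.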
Lesson #4: Building scalable and reusable test scripts
When feature requirements change, our test scripts have to be updated accordingly, and we often ended up rewriting some of the test scenarios. Hence, we should craft our test scripts so that they can be extended and rerun easily. This allows us to test continuously with confidence, instead of testing just to meet the test plan requirements.
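As a rough illustration of what "extendable and rerunnable" can look like, the sketch below keeps scenarios as plain data and passes in the environment and the HTTP sender from outside. All names here (`Step`, `Scenario`, `run`, the example paths and host) are hypothetical and not tied to any particular load testing tool.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Step:
    name: str
    method: str
    path: str


@dataclass(frozen=True)
class Scenario:
    name: str
    steps: Tuple[Step, ...]

    def extended(self, name: str, *extra: Step) -> "Scenario":
        """Build a longer scenario on top of this one without modifying it."""
        return Scenario(name, self.steps + tuple(extra))


def run(scenario: Scenario, base_url: str,
        send: Callable[[str, str], object]) -> List[object]:
    """Replay every step against base_url using the caller-supplied sender."""
    root = base_url.rstrip("/")
    return [send(step.method, root + step.path) for step in scenario.steps]


if __name__ == "__main__":
    login = Scenario("login", (Step("home", "GET", "/"),
                               Step("sign in", "POST", "/login")))
    checkout = login.extended("checkout", Step("pay", "POST", "/pay"))
    # A stand-in sender that only records the call; swap in a real HTTP client.
    print(run(checkout, "https://staging.example.test", lambda m, u: (m, u)))
```

When a requirement changes, only the affected `Step` needs editing, and the same scenario can be rerun against another environment by changing `base_url`.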
Lesson #5: Having better estimates for preloading test accounts/data
Product teams, especially those relatively new to performance testing, may underestimate the effort needed to prepare and preload the test accounts and test data before the test. The larger the test datasets, the longer it takes to seed and preload them into the database. This is why we recommend including time buffers for seeding and preloading test accounts/data.
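A simple way to arrive at an estimate is to time a small sample load and extrapolate. The sketch below assumes seeding time grows linearly with the number of records, which may not hold for your database, and the 30% default buffer is our own placeholder rather than a recommendation.

```python
def estimate_seed_minutes(total_records, sample_records, sample_seconds,
                          buffer_percent=30):
    """Extrapolate total seeding time in minutes from a timed sample load.

    Assumes linear scaling; indexes, constraints, and batching can break that.
    """
    if total_records < 0 or sample_records <= 0 or sample_seconds <= 0:
        raise ValueError("record counts and sample time must be positive")

    projected_seconds = total_records * sample_seconds / sample_records
    buffered_seconds = projected_seconds * (100 + buffer_percent) / 100
    return buffered_seconds / 60


if __name__ == "__main__":
    # Made-up numbers: 10,000 accounts took 30 seconds, 1,000,000 are needed.
    print(estimate_seed_minutes(1_000_000, 10_000, 30, buffer_percent=20))
```

In this made-up case the projection is 50 minutes, and the 20% buffer brings the planned slot to 60 minutes.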
Lesson #6: Preparing a test environment mirroring the actual environment
It is ideal to prepare a test environment that is stable and free from any third-party integration (e.g. API calls from an external system) to obtain the most reliable and comparable test results. Well, isn’t the purpose of conducting performance tests to see how good our own application is without any other external factors? 😊
Besides, even if we know that a third-party integration is slow, we won’t have the rights to change it to meet the desired outcome. If the tests must include a third-party integration, seek approval from the third-party application team first. Otherwise, they may treat the load as a DDoS attack and reject all your test requests.
Lesson #7: Having proper logging and monitoring mechanisms
Application logging helps to detect errors and slow requests, while monitoring provides the performance-related data we need: CPU and RAM usage, transaction pass rate, API processing time, network traffic, and even whether there is a memory leak! Of course, we’ll need to define which metrics are important and relevant to collect during the tests. And you know what’s even better? If these metrics are also captured when the application goes live, we can compare our test results against the actual usage! This allows us to optimise further and to plan future infrastructure capacity!
Also, with the monitoring mechanism in place, we can set up alarm thresholds that notify the product team of any sudden spike in incoming load beyond the levels covered by our past test executions. The team will then have more confidence in setting proper alarm thresholds in the actual environment.
However, without these mechanisms in place, none of this is possible, and we may not even be able to determine whether a problem lies in our application or in the hosting environment.
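To illustrate how raw request samples turn into the kind of metrics mentioned above, here is a small sketch. The sample format (a duration in milliseconds plus a success flag) and the choice of p50/p95 are assumptions for the example; your monitoring tool will have its own export format and percentile method.

```python
def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not values:
        raise ValueError("no values to summarise")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct/100 * n
    return ordered[rank - 1]


def summarise(samples):
    """samples: list of (duration_ms, succeeded) tuples (assumed format)."""
    if not samples:
        raise ValueError("no samples to summarise")
    durations = [duration for duration, _ in samples]
    passed = sum(1 for _, ok in samples if ok)
    return {
        "requests": len(samples),
        "pass_rate": passed / len(samples),
        "p50_ms": percentile(durations, 50),
        "p95_ms": percentile(durations, 95),
        "max_ms": max(durations),
    }


if __name__ == "__main__":
    # Made-up samples: ten requests from 100 ms to 1000 ms, the slowest one failed.
    demo = [(ms, ms != 1000) for ms in range(100, 1100, 100)]
    print(summarise(demo))
```

Producing the same summary from both test runs and live traffic makes the two directly comparable.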
Lesson #8: Setting up an autoscaling policy
We recommend setting up an efficient autoscaling policy and testing how the application behaves when scaling in (decreasing resources) and/or out (increasing resources). In our past experience, the Azure cloud platform could take more than 30 minutes to spin up the instance needed to handle additional load on the application.
Lesson #9: Concluding the test execution
Now that we’ve finished running the performance tests, how do we decide when to conclude the test execution? Well, improvements can only be made when there are measurements. Thus, we typically need to revisit the acceptance criteria defined in the test plan and keep fine-tuning until the test results are satisfactory and consistent. Unless, of course, all your acceptance criteria can be met without any fine-tuning. But come on~ we all know this is super hard to achieve! 😉
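Checking a run against the acceptance criteria can be automated so the decision to stop is based on numbers rather than gut feel. The sketch below is one possible shape; the metric names and the limits (800 ms, 99%) are invented for the example and should come from your own test plan.

```python
def unmet_criteria(results, criteria):
    """Return a list of failure messages; an empty list means every criterion is met.

    criteria maps a metric name to ("max", limit) or ("min", limit).
    """
    failures = []
    for metric, (kind, limit) in criteria.items():
        if kind not in ("max", "min"):
            raise ValueError(f"unknown criterion kind: {kind}")
        value = results[metric]
        if kind == "max" and value > limit:
            failures.append(f"{metric}={value} exceeds max {limit}")
        elif kind == "min" and value < limit:
            failures.append(f"{metric}={value} is below min {limit}")
    return failures


if __name__ == "__main__":
    # Hypothetical criteria and results, for illustration only.
    criteria = {"p95_ms": ("max", 800), "pass_rate": ("min", 0.99)}
    print(unmet_criteria({"p95_ms": 900, "pass_rate": 0.995}, criteria))
```

Running the same check after every tuning round also shows whether the results are consistent from one run to the next.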
Lesson #10: Making test execution easy to trigger for any engineer
Like functional testing, performance testing should be the responsibility of everyone in the development team. To make it easy for anyone to trigger, we recommend running the tests via a Continuous Integration tool rather than from a local machine. Something to note: if you plan to run large-scale tests with more than 500 virtual users, the test agent will need OS-level fine-tuning.
Hopefully the information above helps you avoid some of these problems and raises your awareness of building applications with performance in mind. This is not the end of our learning, and we’ll keep improving as we continue our performance engineering journey! 💪💪💪
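We don’t list the exact OS settings here, but one common example on Unix-like test agents is the limit on open file descriptors, since every open socket consumes one. The sketch below checks whether that limit looks sufficient; the figures of 2 descriptors per virtual user and 256 reserved for the process itself are rough assumptions, not measured values.

```python
def descriptors_needed(virtual_users, per_user=2, reserve=256):
    """Rough descriptor budget; per_user and reserve are assumed figures."""
    return virtual_users * per_user + reserve


def limit_is_sufficient(soft_limit, virtual_users, per_user=2, reserve=256):
    """soft_limit is the open-file limit, or None when it is unlimited."""
    if soft_limit is None:
        return True
    return soft_limit >= descriptors_needed(virtual_users, per_user, reserve)


def current_soft_limit():
    """Read the soft open-file limit of this process (Unix-like systems only)."""
    import resource  # standard library, not available on Windows

    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return None if soft == resource.RLIM_INFINITY else soft


if __name__ == "__main__":
    try:
        limit = current_soft_limit()
    except ImportError:
        print("open-file limit check is not available on this platform")
    else:
        print("soft limit:", limit)
        print("enough for 500 virtual users:", limit_is_sufficient(limit, 500))
```

If the check fails, the limit is usually raised through the shell’s `ulimit -n` or the agent’s service configuration before the test run.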
- Merlin 💛