How We Measure Pre-Deployment Response Time

Oğuzhan Erdem · Published in Trendyol Tech · 6 min read · May 20, 2024
Photo by Chips & Champagner on Unsplash

As a search team, response time has been one of the primary challenges we face. For every change, we strive to anticipate whether and how response time will be affected. Unfortunately, there are times when we mispredict this impact and only later realize the unexpected negative effects. In fact, we encountered two different incidents for exactly this reason.

To give an example, the average RT of the application’s homepage service, which was typically 90 ms, spiked to an average of 1.5 seconds. As a result, users encountered timeout issues when trying to access the application.

When reflecting on why we couldn’t foresee this incident beforehand, we realized that it was not possible to notice it in the development environments (stage, etc.). Nor was it possible to detect it with any acceptance test or integration test project.

While considering how to detect such regressions without impacting users, we came up with this project: response-time-checker. Its main purpose is to check whether each endpoint surpasses a predetermined response time threshold in the pre-production environment.

As we brainstormed about the project, we outlined three phases to keep the initial effort minimal. We have completed the first two. In this article, I will discuss how the project operates, delve into these phases, and highlight the benefits.

Phase One

In our project, we needed to generate a small amount of traffic against the endpoints. However, we could not create as much load as we wanted: the project would run on the pipeline, so we had to keep the pipeline duration in mind. Therefore, we used static data in the project, and endpoints that required data obtained it from other endpoints that were pulling live data at that moment.

In the first phase, we only aimed to check against a fixed threshold. Therefore, it was sufficient to take the average response time of the requests and check whether it exceeded the threshold. At the same time, we added small status code checks: we should not see 500, 400, or 404 status codes.

Implementation

When creating the project, we utilized the following techniques and technologies:

  • Python 3.10
  • Behave 1.2.6
  • BDD

We chose Python for development due to its ease of use and the possibility of using the locust library in future phases. Our decision to proceed with Behave and BDD was driven by the ease of implementing new endpoints and the ability to make each endpoint behave like a user when necessary.

How It Works

Example Scenario

As seen in the sample scenario, the load and the checks consist of different steps. First, we send warm-up requests in the background; then we start sending requests to the relevant endpoint. Finally, we calculate the average of the response times and check it against the given threshold value.
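To make this flow concrete, below is a minimal sketch of what the Behave step implementations could look like. The step phrasings, the forbidden status codes, and the warm-up count are assumptions for illustration, not the project’s actual code.

```python
# steps/response_time_steps.py -- a hypothetical sketch, not the actual project code
import statistics

import requests
from behave import given, when, then

FORBIDDEN_STATUS_CODES = {400, 404, 500}  # codes the check must never see

@given("warm-up requests are sent to {endpoint}")
def send_warmup(context, endpoint):
    # A few requests to warm caches and connections before measuring
    for _ in range(5):
        requests.get(endpoint)

@when("I send {count:d} requests to {endpoint}")
def send_requests(context, count, endpoint):
    context.timings = []
    for _ in range(count):
        response = requests.get(endpoint)
        assert response.status_code not in FORBIDDEN_STATUS_CODES
        # requests exposes the elapsed time of each call as a timedelta
        context.timings.append(response.elapsed.total_seconds() * 1000)

@then("the average response time should be below {threshold:d} ms")
def check_threshold(context, threshold):
    average = statistics.mean(context.timings)
    assert average <= threshold, f"avg RT {average:.0f} ms exceeds {threshold} ms"
```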

New Relic Throughput Graph 1.1

As seen in Graph 1.1, we were able to successfully create a small-scale throughput (TP) for the service. Increasing the TP value has both an advantage and a disadvantage: you can obtain more precise results, but it also leads to longer pipeline durations.

Results

After the implementation of the first phase, we observed the following:

Benefits

  • It autonomously streamlined manual pre-production checks
  • Since it only accepted successful status codes, it allowed us to deploy more confidently
  • It saved us from similar incidents twice

Improvements

  • Execution time was approximately 5 minutes, which slightly prolonged the pipeline duration
  • It worked with static data, consistently searching for burgers or sending requests for the same location. This affected its accuracy, i.e., its proximity to production behavior.
  • Checking against a fixed threshold was not enough; we also needed to catch small response time increases

Phase Two

In this phase, we considered how we could address the improvement points identified in the first phase, and we took action on them.

We updated the project’s operational logic from Graph 1.2 to Graph 1.3 as follows:

Graph 1.2 (Represents Phase 1 Operational Logic)
Graph 1.3 (Represents Phase 2 Operational Logic)

First, we capture the incoming requests to our service using a Kafka consumer and index them into Elasticsearch.
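As a rough sketch of this capture step, something like the following could work; the topic name, index name, connection addresses, and message format are assumptions for illustration:

```python
# A hypothetical capture-side sketch; the actual implementation may differ.
import json

from kafka import KafkaConsumer          # kafka-python
from elasticsearch import Elasticsearch

consumer = KafkaConsumer(
    "search-service-requests",           # hypothetical topic name
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)
es = Elasticsearch("http://localhost:9200")

for message in consumer:
    # Each event holds the request details needed to replay it later
    es.index(index="service-requests", document=message.value)
```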

We decided to use the requests stored in this index to generate load in the project. An example of the stored data is shown below:

Index Data

Since these requests come directly from production, using them ensured that the response time values were more accurate. Therefore, we wrote Elasticsearch queries within the project to retrieve the request data needed for each service from the index, and we removed the static data.
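A minimal sketch of such a retrieval query, reusing the hypothetical index name from the capture sketch above; the `service` and `@timestamp` fields are likewise assumptions:

```python
# A hypothetical retrieval sketch, not the project's actual queries.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def latest_requests(service: str, size: int = 100) -> list[dict]:
    """Return the most recent captured requests for a given service."""
    response = es.search(
        index="service-requests",
        query={"term": {"service": service}},          # hypothetical field
        sort=[{"@timestamp": {"order": "desc"}}],      # newest first
        size=size,
    )
    return [hit["_source"] for hit in response["hits"]["hits"]]
```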

As a result, the project is now able to use the most recent requests made to each service every time it runs. We also achieved diversity in the requests.

In addition, the project previously ran after the deployment to the pre-production environment (Graph 1.2); we changed it to run in parallel with the acceptance test project (Graph 1.3). In this way, we managed to avoid increasing the pipeline duration.

Finally, in the first phase only the average response time of the requests was checked against the threshold value. In Phase 2, we performed a one-box deployment that runs in parallel with the stage deployment. The one-box deployment has the same configuration as pre-production; the only difference is that it contains the new development changes.

Think of it this way: the developments that will go to production live in the one-box deployment, while pre-production mirrors the production environment. This allows us to compare the performance of the endpoints by generating traffic at the same level in both environments.
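A minimal sketch of that comparison, assuming a percentage-based threshold; the request format, helper names, and the replayed-request shape are illustrative:

```python
# A hypothetical comparison sketch between pre-production and one-box.
import statistics

import requests

def average_rt_ms(base_url: str, reqs: list[dict]) -> float:
    """Replay captured requests against one environment and average the RT."""
    timings = []
    for req in reqs:
        resp = requests.get(base_url + req["path"], params=req.get("params"))
        assert resp.status_code not in (400, 404, 500)
        timings.append(resp.elapsed.total_seconds() * 1000)
    return statistics.mean(timings)

def check_increase(pre_prod_rt: float, one_box_rt: float, max_increase_pct: float):
    """Fail if the one-box average RT exceeds pre-production by more than the limit."""
    increase = (one_box_rt - pre_prod_rt) / pre_prod_rt * 100
    assert increase <= max_increase_pct, (
        f"Response time increased by {increase:.1f}% (limit {max_increase_pct}%)"
    )
```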

Example Scenario

As seen in the example scenario, we are now able to check endpoint-based percentage increases against the threshold values we provide. Thus, even in cases where we do not anticipate an increase in response time, we have the opportunity to detect one.

After implementing the improvements mentioned above, we brainstormed further on how we could enhance the system. At this point, we identified several items to address in Phase 3.

Taking all of this into account, the results are as follows:

Benefits

  • We are now able to detect any impact on a feature’s response time in terms of percentage increase
  • The pipeline duration has been reduced compared to before
  • We started using requests closer to production data by using events generated from real users’ requests. Consequently, we obtained comparison results we are more confident in, making our tests more valid

Improvements

  • Running the project on the feature pipeline instead of the development pipeline might be more beneficial
  • Creating load using user rate and hatch rate values instead of request count may be more efficient (see the sketch after this list)
  • Besides status code checks, additional general checks specific to each endpoint can be added
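For illustration, here is a minimal Locust sketch of what rate-based load generation could look like; the host, endpoint, and wait times are assumptions, and in current Locust releases the hatch rate is called the spawn rate:

```python
# loadtest.py -- a hypothetical sketch; run e.g.:
#   locust -f loadtest.py --users 10 --spawn-rate 2
from locust import HttpUser, task, between

class EndpointUser(HttpUser):
    host = "https://pre-prod.example.com"  # hypothetical pre-production host
    wait_time = between(0.5, 1.5)          # think time between requests

    @task
    def search(self):
        # Illustrative endpoint and query, not the project's actual values
        self.client.get("/search", params={"q": "burger"})
```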

Conclusion

Taking all this into consideration, we can now deploy while being aware of increases in response time. Moreover, we have eliminated the manual effort involved in pre-production checks. In addition to deploying more confidently and autonomously, we can also compare response time differences using production data. I believe the system will become even better with the improvements planned for Phase 3, and that these advancements will help prevent potential human errors in future iterations.

As Maya Angelou wisely said,

‘Do the best you can until you know better. Then when you know better, do better.’

Thanks for reading.
Oğuzhan Erdem.

About Us

Would you like to be a part of our growing company? Join us! Check out our open positions and other media pages from the links below.
