Arista: How Trendyol Gained Hours In Load Test Data Preparation

Kerem Can Kabadayı
Published in Trendyol Tech
5 min read · Jan 16, 2024

Introduction

In most load test scenarios, APIs need specific data to perform actions. Creating that data is time-consuming, and gathering fresh data means repeating the same work. All of this tires people out and takes up their time. Load test data creation requires a systematic approach: identifying data sources, verifying accuracy, and producing practical data that replicates real-world usage. High-quality data is crucial for precise results, and refining the data set can significantly improve accuracy.

The most time-consuming part before running load tests is the data preparation phase. We ask for support from other teams, write scripts, and reuse the same data over and over again to prepare data suitable for our tests. Each team puts in the same effort just to create load test data. Additionally, many people run tests with a single record or a very limited data set. As a result, they receive cached responses and cannot obtain accurate and realistic results from the tests. We realized that enabling everyone to create test data effortlessly could save us time. So, we analyzed this need with Burhan Günaydın and decided to develop Arista as a product.

Figure: Arista logo

What is Arista?

Arista is a tool that provides data for load testing. It allows technical and non-technical people to run tests on Ares using the provided data.

Ares, developed by the Trendyol Platform Test Team, is a custom load test platform built on the same team's core framework, with a user-friendly UI for conducting load, stress, or performance tests. We run load tests smoothly through the Ares screens, and users get used to it quickly.

While designing the product, we had two main concerns: firstly, eliminating the time lost in generating test data, and secondly, ensuring ease of use for everyone. That’s why we developed Arista. Arista serves as an API that generates data according to the user’s request. Through this API, the person conducting the test is abstracted from the data preparation step. They can create the load test data they want by giving simple inputs, without knowing any technical details. For instance, they can ask for products that are out of stock, items that are only available in Türkiye, or products that cost over 1000 TL. At Trendyol, we store queryable data in Elasticsearch. Therefore, we build an Elasticsearch query from these inputs and then create load test data from the query results.
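To make the idea of turning simple inputs into a query more concrete, here is a minimal sketch in Go. It is not Arista's actual code: the Filters struct, the BuildQuery function, and the field names "stock", "country", and "price" are all assumptions made for illustration. It only shows how a few user-facing options could be translated into an Elasticsearch bool query body.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Filters holds the simple inputs a user might provide.
// The struct and the Elasticsearch field names used below
// ("stock", "country", "price") are hypothetical.
type Filters struct {
	OutOfStock bool
	Country    string
	MinPrice   float64
}

// BuildQuery translates the filters into an Elasticsearch bool query body.
// Only the options that are set produce a filter clause.
func BuildQuery(f Filters) map[string]any {
	filters := []map[string]any{}
	if f.OutOfStock {
		filters = append(filters, map[string]any{"term": map[string]any{"stock": 0}})
	}
	if f.Country != "" {
		filters = append(filters, map[string]any{"term": map[string]any{"country": f.Country}})
	}
	if f.MinPrice > 0 {
		filters = append(filters, map[string]any{
			"range": map[string]any{"price": map[string]any{"gt": f.MinPrice}},
		})
	}
	return map[string]any{
		"query": map[string]any{"bool": map[string]any{"filter": filters}},
	}
}

func main() {
	q := BuildQuery(Filters{OutOfStock: true, Country: "TR", MinPrice: 1000})
	body, err := json.MarshalIndent(q, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Filter clauses are used here instead of scoring clauses because load test data only needs matching documents, not relevance ranking.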

Once we figured out how to create the data, our next challenge was to deliver a large amount of it. While 100 or 1000 records could be returned as JSON from a REST API, the question was how to provide millions of test records to the user. To address this issue, we joined forces with Ares. At Trendyol, CSV files are commonly used for load test data. Mapping comma-separated values to the fields of API requests is quite straightforward. For instance, the first column represents the ID field, the second column represents the sellerID, and so on. We decided to upload the CSV file created on the server side to the user’s storage using Ares APIs. This way, we successfully delivered millions of records to the user.
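As a rough illustration of the server-side CSV step, the sketch below writes rows through Go's standard encoding/csv package. The Row type, WriteCSV, and CSVString are hypothetical names, and the column order (ID first, sellerID second) simply follows the example in the text. Whether the real files carry a header row is not stated, so none is written here, and the upload to Ares is left out entirely.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// Row is a hypothetical record: the first CSV column is the ID,
// the second is the seller ID.
type Row struct {
	ID       int64
	SellerID int64
}

// WriteCSV streams rows to w one record at a time, so the full file
// never has to be built in memory by the writer itself.
func WriteCSV(w io.Writer, rows []Row) error {
	cw := csv.NewWriter(w)
	for _, r := range rows {
		record := []string{
			strconv.FormatInt(r.ID, 10),
			strconv.FormatInt(r.SellerID, 10),
		}
		if err := cw.Write(record); err != nil {
			return err
		}
	}
	cw.Flush()
	return cw.Error()
}

// CSVString is a small helper for demos and tests that renders rows to a string.
func CSVString(rows []Row) (string, error) {
	var sb strings.Builder
	if err := WriteCSV(&sb, rows); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := CSVString([]Row{{ID: 1, SellerID: 10}, {ID: 2, SellerID: 20}})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Writing to an io.Writer keeps the same function usable for a local file, an in-memory buffer, or an HTTP request body.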

Here is how Arista appears in Ares.

Figure: Arista UI developed by Ares Team

We build our Elasticsearch query with the request received by Arista. We implemented it using the Builder Pattern.

// CreateQuery applies the request options to the base query step by step.
func CreateQuery(d *request.Request, query Query, partitionKey string, gte, lt int64) Query {
	return newBuilder(query).
		Size(d.Size).
		Source(d.CsvColumns).
		Partition(partitionKey, gte, lt).
		Build()
}

Source

To reduce Elasticsearch response time, we fetch only the ‘fields’ we receive from the request. The sourceMap maps the column names provided by the user to the corresponding fields in the Elasticsearch mapping. This way, users can query the fields they need most in tests without knowing the Elasticsearch mapping.

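// Source limits the returned fields to the columns requested by the user.
func (q *Query) Source(inputColumns []string) *Query {
	var sources []string

	for _, c := range inputColumns {
		// Columns without an entry in sourceMap are skipped.
		if d, ok := sourceMap[c]; ok {
			sources = append(sources, d)
		}
	}

	q.Query[SOURCE] = sources
	return q
}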
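For readers who want to try the column mapping in isolation, here is a small self-contained sketch. The entries in sourceMap are invented examples, not Trendyol's real mapping, and the "_source" key is an assumption about what the SOURCE constant stands for. Unlike the method above, this hypothetical MapColumns also returns the columns it could not map, so a caller could report them.

```go
package main

import "fmt"

// sourceMap translates user-facing column names to Elasticsearch field paths.
// Every entry here is a made-up example.
var sourceMap = map[string]string{
	"id":       "id",
	"sellerId": "seller.id",
	"price":    "price.sellingPrice",
}

// MapColumns returns the Elasticsearch fields for the known columns and,
// separately, the columns that have no mapping.
func MapColumns(columns []string) (sources, unknown []string) {
	for _, c := range columns {
		if field, ok := sourceMap[c]; ok {
			sources = append(sources, field)
		} else {
			unknown = append(unknown, c)
		}
	}
	return sources, unknown
}

func main() {
	sources, unknown := MapColumns([]string{"id", "sellerId", "color"})

	// Assumed query shape: "_source" restricts which fields each hit returns.
	query := map[string]any{"_source": sources}
	fmt.Println(query)
	fmt.Println("unmapped columns:", unknown)
}
```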

Partition

We have come to the most fun part. At Trendyol, many teams implement and use the customPartition structure instead of solutions like the Scroll API, search_after, or point in time (PIT). So, what is this structure, and what does it bring to us?

Problem: Let’s explain the structure with an example. Suppose we have 700 million indexed documents. We create a query that hits 50 million documents. We start fetching the documents with the Scroll API. However, the scroll may expire before we finish. To solve this, we can increase the scroll duration. But then the number of long-lived scrolls starts to grow, and we can currently open a maximum of 500 scrolls. You can check the documentation for details.

Solution: We choose a partition count based on our document count, say 50k. Each document has a unique ID. We compute uniqueID % 50000 and index each document with the resulting value in a partition field. We then add a range filter on the partition field to our queries and split the results into ranges sized to hit about 10k documents each, for example gte:0-lt:50, gte:50-lt:100, and so on. This way, we can parallelize our requests and overcome some problems where Elasticsearch’s built-in features fall short. (If you are interested in this solution, you can check out another Trendyol article.) By the way, for parallelizing requests, you can check out the go-future library developed under Trendyol’s open-source projects.
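The partition idea can be sketched in a few lines of Go. This is an illustration under assumptions rather than the production code: the names PartitionOf, Ranges, RangeFilter, and FetchAll, the field name "partition", and the range width of 50 are hypothetical. IDs are assumed to be non-negative, and the search call is replaced by a stand-in function. It uses plain goroutines from the standard library rather than the go-future library.

```go
package main

import (
	"fmt"
	"sync"
)

const partitionCount = 50000

// Range is a half-open interval [Gte, Lt) of partition values.
type Range struct{ Gte, Lt int64 }

// PartitionOf is the value stored with each document at index time.
// It assumes non-negative IDs.
func PartitionOf(id int64) int64 { return id % partitionCount }

// Ranges splits [0, total) into consecutive ranges of the given width.
func Ranges(total, step int64) []Range {
	if step <= 0 {
		return nil
	}
	var out []Range
	for start := int64(0); start < total; start += step {
		end := start + step
		if end > total {
			end = total
		}
		out = append(out, Range{Gte: start, Lt: end})
	}
	return out
}

// RangeFilter builds the Elasticsearch range clause for one partition range.
func RangeFilter(field string, r Range) map[string]any {
	return map[string]any{
		"range": map[string]any{field: map[string]any{"gte": r.Gte, "lt": r.Lt}},
	}
}

// FetchAll calls fetch for every range using a fixed number of workers
// and returns the sum of the document counts reported by fetch.
func FetchAll(ranges []Range, workers int, fetch func(Range) int) int {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan Range)
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := range jobs {
				n := fetch(r)
				mu.Lock()
				total += n
				mu.Unlock()
			}
		}()
	}
	for _, r := range ranges {
		jobs <- r
	}
	close(jobs)
	wg.Wait()
	return total
}

func main() {
	ranges := Ranges(partitionCount, 50)
	fmt.Println(len(ranges), "ranges; first filter:", RangeFilter("partition", ranges[0]))

	// Stand-in for a real search request: pretend each partition value holds one hit.
	docs := FetchAll(ranges, 8, func(r Range) int { return int(r.Lt - r.Gte) })
	fmt.Println("documents fetched:", docs)
}
```

Because each range is an independent query, a failed or slow range can be retried on its own, and no long-lived scroll context has to stay open.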

Arista creates a CSV file as a response and uploads it to Ares. We fixed the output format this strictly because almost all of our teams use load test data this way. The fields in the created CSV file are matched and used in the test suite.
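On the consuming side, matching CSV columns to request fields might look like the sketch below. It assumes a two-column file (ID, then seller ID) without a header row, and the BuildPaths function and the "/products/…" path are invented for illustration. It does not represent how Ares itself reads the file.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// BuildPaths reads CSV rows of the form "id,sellerId" and returns one
// request path per row. The endpoint shape is a hypothetical example.
func BuildPaths(data string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.FieldsPerRecord = 2 // reject rows that do not have exactly two columns

	records, err := r.ReadAll()
	if err != nil {
		return nil, err
	}

	paths := make([]string, 0, len(records))
	for _, rec := range records {
		paths = append(paths, fmt.Sprintf("/products/%s?sellerId=%s", rec[0], rec[1]))
	}
	return paths, nil
}

func main() {
	paths, err := BuildPaths("1,10\n2,20\n")
	if err != nil {
		panic(err)
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```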

Conclusion

Arista was developed to eliminate the time lost while creating load test data. Now, we can create load test data with just a few clicks. This has allowed us to increase the frequency of our load tests. This year, Arista was used extensively to create load test data at Trendyol. Its most significant use was in the load tests prior to the November 2023 discount days. These tests are the most extensive end-to-end tests throughout Trendyol. Arista continues to be used in Monthly and Overall Load Tests. Furthermore, we continue improving Arista based on feedback and needs.

Figure: Arista Roadmap

Co-Author: Burhan Günaydın

