How We Prepare 2M Sellable Product Excel Report In Seconds With Scalable And Fault Tolerant System — Part 2

Yasin Kızılkaya · Trendyol Tech · 3 min read · Jun 27, 2022

This is the second part of a three-part article describing how we designed and developed a scalable and resilient reporting service. If you haven’t gone through the first part, you can find it here.

In this part, we will focus on how we create reports by uploading segments and a large object to OpenStack’s Object Storage component.

In the Reporting Consumer application, after consuming a report-create-request event from the Kafka topic, we make a request to the Product Read API to retrieve the products matching the given advanced filter query. As you might imagine, the collected data can be tens of GBs, which makes it almost impossible to retrieve all of this content and generate a file in a single pod at once.

That’s why we split our query by pagination and named each page a task. By producing our tasks to a multi-partition Kafka topic, we can now handle each task independently.
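The splitting step can be sketched as follows. The `Task` struct is a hypothetical shape; the article does not show the real event schema, and the page size of 50,000 is an assumption for illustration:

```go
package main

import "fmt"

// Task is a hypothetical shape for one page of the report query; the
// real task event schema is not shown in the article.
type Task struct {
	ReportID string
	Page     int
	PageSize int
}

// splitIntoTasks divides a result set of totalCount products into
// page-sized tasks that can then be produced to a multi-partition
// Kafka topic and processed independently by any consumer replica.
func splitIntoTasks(reportID string, totalCount, pageSize int) []Task {
	pages := (totalCount + pageSize - 1) / pageSize // ceiling division
	tasks := make([]Task, 0, pages)
	for p := 0; p < pages; p++ {
		tasks = append(tasks, Task{ReportID: reportID, Page: p, PageSize: pageSize})
	}
	return tasks
}

func main() {
	// 2M products at 50k rows per page yields 40 independent tasks.
	tasks := splitIntoTasks("3bed4be2-5f90-4e3a-baac-1ad3de8200bf", 2_000_000, 50_000)
	fmt.Println(len(tasks)) // 40
}
```

Because each task carries its report id and page number, any consumer instance can pick it up without coordinating with the others.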

By default, you cannot upload an object greater than 5 GB to Object Storage. However, it allows you to divide the object into a number of segments while uploading. You can then concatenate these pieces by defining either a Static (SLO) or a Dynamic (DLO) Large Object. You can find more details about Large Objects here. While looking into Large Objects, we found out that SLO also has a 1,000 limit on the total segment count. It can be increased, but it is still a limitation :). So we decided to move on with Dynamic Large Objects.

After examining several Go libraries for interfacing with Swift, we decided to move on with ncw/swift, since it seemed to make working with Large Objects easier. With this library, our first approach was to append the retrieved data for each task to our Dynamic Large Object. However, we had deployed our application with multiple replicas, so we got exceptions from Object Storage caused by concurrent modifications.

We addressed this issue by taking the segment-creation job over from the library. We uploaded each completed task’s content as a regular object into our segment container, under a folder named after the task’s report id.

Segments of the report whose id is 3bed4be2-5f90-4e3a-baac-1ad3de8200bf

Finally, after all tasks were completed, we uploaded our no-content (manifest) Dynamic Large Object with a configuration that specifies our segments’ directory, as follows:

loOpts := swift.LargeObjectOpts{
	Container:        container,        // container that will hold the manifest object
	ObjectName:       filename,
	SegmentContainer: segmentContainer,
	SegmentPrefix:    segmentPrefix,    // report_id
}

fileWriter, err := c.conn.DynamicLargeObjectCreateFile(ctx, &loOpts)

Now, when we download this manifest file, Object Storage automatically concatenates all of its segments’ contents and serves them as a single file.
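The concatenation behavior can be simulated in a few lines. This is not Swift code, just a model of the DLO read path under the assumption (documented for DLOs) that segments under the manifest’s prefix are served in lexical name order:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// readDLO simulates how Object Storage serves a Dynamic Large Object:
// it lists every object under the manifest's segment prefix, sorts the
// names lexically, and streams their contents back-to-back.
func readDLO(segments map[string]string, prefix string) string {
	var names []string
	for name := range segments {
		if strings.HasPrefix(name, prefix) {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	var sb strings.Builder
	for _, n := range names {
		sb.WriteString(segments[n])
	}
	return sb.String()
}

func main() {
	segments := map[string]string{
		"report-1/00000002": "C",
		"report-1/00000000": "A",
		"report-1/00000001": "B",
		"report-2/00000000": "X", // belongs to another report, ignored
	}
	fmt.Println(readDLO(segments, "report-1/")) // ABC
}
```

This is also why the zero-padded task numbers matter: lexical order must match page order for the downloaded file to be assembled correctly.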

Just a minute, though: how can we ensure that all tasks completed successfully in this system?

Keep reading; you will find the answer and more in Part 3.
