1.5M Throughput with Golang at Trendyol

Hüseyin Güner
Published in Trendyol Tech
9 min read · Oct 7, 2019

While Trendyol takes firm steps toward becoming Turkey’s biggest shopping platform, the technical burden of that growth keeps increasing as well. Last year we scaled our applications to 500K rpm (requests per minute); that load has since become an ordinary day’s traffic. Our new target for this year is 1.5–2M rpm. Maybe 5M for the next year, 10M rpm for the year after…

In previous years we handled 800K rpm with a Spring Boot application, but at these new figures Spring Boot had already started to become a clumsy structure for us. Autoscaling problems and high resource consumption showed us that continuing with Spring Boot would not be right. So a need arose for a technology that could run with low resource consumption under high throughput and start up fast.

Trendyol Tech — Zeus Team

Before proceeding with the article, I would like to tell you about the Zeus team. Zeus is a BFF (Backend for Frontend) application that serves the Trendyol mobile applications. That is to say, it is an API that collects all the data the Trendyol mobile apps need from other services within Trendyol and then serves it to the applications. Considering that 70% of Trendyol users come through the mobile apps, the criticality of the application is substantial. Such an API has many responsibilities: it should respond very fast, handle timeouts well, stay up uninterruptedly, keep upstream services from affecting each other, and log errors so that action can be taken quickly… In this situation, changing a technology and taking it live is critically important.

For years, Zeus offered its service as a single API. However, as time went on and the load increased, Zeus started to become a monolith within a microservice architecture, so we decided to split Zeus into pieces along the lines that carry the load. With that decision, it became critically important that each piece could run under high throughput and start up fast. To handle this, we entered a POC process. Following our research, we decided to try out Micronaut, Vert.x, and Golang. Golang came out of this three-way battle victorious: we preferred it both because it runs as an executable binary without a VM and because of its success in concurrent operations, its low resource consumption, and the size of its community.

First Acquaintance with Golang

As Trendyol, we have now taken our first steps toward becoming gophers. Although the transition from Java to Golang was a little painful for us, step by step Golang taught us to think simply, the way a gopher does. When you first look at the language, you will see that its syntax is a bit different; it resembles Swift, Apple’s programming language, more than Java.

Our first stop for learning Golang was A Tour of Go. This resource, prepared by the Go team, offers good examples and exercises for all of the language’s keywords and tricks, and it is available in Turkish as well 🤗

[GET] /hello/{planet}

After reaching a level that could be regarded as gopherhood, we decided to create our first app. For it, we chose the endpoints covering product reviews, favorites, and collections, which receive fewer requests than the other endpoints on Trendyol. We grouped them under an API we called ZeusSocial and started developing it with Go.

In our first Go app, we used GinGonic as the web framework, net/http (Go’s native HTTP client) as the HTTP client, Logrus for log management, and Viper for configuration management.

Ready for Test

We finished our first Go app, and then we were ready to test it! The test we started with a bang did not make us very happy: we were shot down by the first bullet! As the load increased, we observed memory growing in the app, and CPU growing correspondingly, during the load test. Although it was not a heartwarming situation for us, we did not give up. Adopting the saying “We live with data”, which is part of the Trendyol culture, we analyzed the data we had and discovered that incoming requests were waiting at the network layer, and only about 100 of them were being processed inside the app due to a buffering behavior. So we started editing the HTTP client configuration, and along the way we came across this article, which was very informative for us. We restarted the test with a proper configuration, and the result is as follows:

As you can see in the charts, we managed to reach 1.2M rpm. Under this load, memory allocation peaked at 87MB per pod. After staying stable for a while, the number of goroutines (you can think of them as Go’s lightweight threads) increased whenever we started to receive late responses from an upstream API. The reason was that goroutines were blocked waiting on those calls, and more goroutines were spawned to handle incoming requests, giving rise to a bottleneck.

The test results were fairly successful for us. Now, we had only one problem to solve: Timeout Management!

Hystrix Circuit Breaker

The solution to our new problem was the Circuit Breaker pattern, with which we were already familiar. We had deliberately left the circuit breaker out of our social app during testing in order to observe the app’s raw behavior.

Our new mission was to add a circuit breaker and take matters into our own hands! Following our research, we found afex/hystrix-go and implemented it. We could then trip circuits based on the number of errors, timeouts, and concurrent requests in our app. Our app was now free of long-hanging calls and ready to go live!

Done

We launched our social app step by step, and here is its live data shortly after launch:

Retrospective

The next thing to do is a retro of our Golang experience. So, let’s have a Mad/Sad/Glad:

MAD

  • The language does not support OOP as we know it
  • It lacks structures we are familiar with from current programming languages: enums, generic types, exceptions, reference types, etc.
  • The syntax is quite different (e.g., nil instead of null), so it took effort to adapt

SAD

  • We did not always use pointers cautiously; the results were bad, and the problems took a long time to identify
  • We could not get the high throughput and concurrent-user efficiency we expected from the net/http client
  • We could not define the routing we wanted in GinGonic, and it does not truly support RESTful paths
  • We were not very successful at error handling

GLAD

  • We served an average load of 80K rpm with an average of 10 pods (0.5 CPU cores, 0.5GB memory each) at roughly 55% CPU and 30% memory utilization
  • Apps started up very fast, and hotfixes could go out much faster
  • It scaled very fast during traffic spikes, and on downscale when traffic decreased, pods shut down cleanly and we did not receive any errors

Second Meeting

We had met Golang, become familiar with it, and were satisfied with it, so in light of our retro results we decided to take on an existing endpoint that was carrying much more load. Thus, we decided to meet Golang for the second time.

For our second app, we chose the product detail service, which provides the product page information on Trendyol. Then we rolled up our sleeves, applying our earlier experience so that our second Golang app, which we named ZeusProduct, would be more stable and handle more requests.

Web Framework and HTTP Client Change

In our previous Golang experience, GinGonic gave us some upsetting results. We could not fully express the paths we wanted with its routing structure, and therefore we had to stretch our paths a little. It also disappointed us in error handling and request binding.

net/http, Go’s native HTTP client, gave upsetting results under concurrent requests. Both the difficulty of limiting concurrent requests and the memory growth under concurrency led us to seek different solutions.

In that case, it was time to set sail for new horizons. The first port of call was the web framework!

labstack/echo

When we restarted our search for a Go web framework, we came across a new one. Echo! Since migrating from GinGonic was easy, Echo was more efficient, and it resolved our existing problems, we decided to run a POC with it, and the POC was successful. The difference also shows up in the benchmark tests they publish:

valyala/fasthttp

Just as we were getting to know echo in Golang, we came across an HTTP framework that made us wonder, “Could this be the one we were looking for?” Fasthttp! Both its name and its benchmark tests suggested it was worthy of it.

net/http client:

$ GOMAXPROCS=1 go test -bench='HTTPClient(Do|GetEndToEnd)' -benchmem -benchtime=10s
BenchmarkNetHTTPClientDoFastServer 1000000 12567 ns/op 2616 B/op 35 allocs/op
BenchmarkNetHTTPClientGetEndToEnd1TCP 200000 67030 ns/op 5028 B/op 56 allocs/op
BenchmarkNetHTTPClientGetEndToEnd10TCP 300000 51098 ns/op 5031 B/op 56 allocs/op
BenchmarkNetHTTPClientGetEndToEnd100TCP 300000 45096 ns/op 5026 B/op 55 allocs/op
BenchmarkNetHTTPClientGetEndToEnd1Inmemory 500000 24779 ns/op 5035 B/op 57 allocs/op
BenchmarkNetHTTPClientGetEndToEnd10Inmemory 1000000 26425 ns/op 5035 B/op 57 allocs/op
BenchmarkNetHTTPClientGetEndToEnd100Inmemory 500000 28515 ns/op 5045 B/op 57 allocs/op
BenchmarkNetHTTPClientGetEndToEnd1000Inmemory 500000 39511 ns/op 5096 B/op 56 allocs/op

fasthttp client:

$ GOMAXPROCS=1 go test -bench='kClient(Do|GetEndToEnd)' -benchmem -benchtime=10s
BenchmarkClientDoFastServer 20000000 865 ns/op 0 B/op 0 allocs/op
BenchmarkClientGetEndToEnd1TCP 1000000 18711 ns/op 0 B/op 0 allocs/op
BenchmarkClientGetEndToEnd10TCP 1000000 14664 ns/op 0 B/op 0 allocs/op
BenchmarkClientGetEndToEnd100TCP 1000000 14043 ns/op 1 B/op 0 allocs/op
BenchmarkClientGetEndToEnd1Inmemory 5000000 3965 ns/op 0 B/op 0 allocs/op
BenchmarkClientGetEndToEnd10Inmemory 3000000 4060 ns/op 0 B/op 0 allocs/op
BenchmarkClientGetEndToEnd100Inmemory 5000000 3396 ns/op 0 B/op 0 allocs/op
BenchmarkClientGetEndToEnd1000Inmemory 5000000 3306 ns/op 2 B/op 0 allocs/op

We decided to take a chance on fasthttp and ran a POC, and the results were noticeably good. Curious, we set out to investigate how they achieved this.

As you can also see in the benchmark tests, memory allocation in fasthttp is practically zero! What underlies this is the sync package that Golang offers us. With the Pool type in the sync package, Go provides a way to reuse objects that are needed repeatedly, reducing memory allocations and, with them, the cost of garbage collection. Thanks to this package, we take the objects we need from the pool and set them back when we are done with them. You can find a simple and good article about it here.
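A minimal, self-contained sketch of that sync.Pool pattern; the buffer-rendering use case is illustrative:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufferPool reuses bytes.Buffer objects instead of allocating a new one
// per request, which lowers GC pressure under high throughput.
var bufferPool = sync.Pool{
	New: func() interface{} {
		return new(bytes.Buffer)
	},
}

// render builds a response body using a pooled buffer.
func render(name string) string {
	buf := bufferPool.Get().(*bytes.Buffer)
	buf.Reset() // pooled objects keep their old contents; always reset
	defer bufferPool.Put(buf)

	buf.WriteString("hello ")
	buf.WriteString(name)
	return buf.String()
}

func main() {
	fmt.Println(render("gopher")) // prints "hello gopher"
}
```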

We developed ZeusProduct with fasthttp, benefiting from the sync package and a few more tunings, and with echo, which solved our routing problems. The test results made us happy. Here they are:

Go Trendyol Go

“Go Trendyol Go”, which is now our motto, has become much more meaningful for us. With Golang, we can now scale to 1.5–2M rpm with ease. Moreover, the fact that it does this with only 7.5% of the resources used by our current Spring Boot application (under the same load) has enshrined it in our hearts.
