Is your parallel HTTP client performant?

Swapnil Khandekar
4 min read · Mar 25, 2019


Recently I came across the need to write a parallel HTTP client. The client should be able to send out multiple requests in parallel and then collect the responses of all the requests into one aggregated result.

Writing such a client is not a big task when you have Akka-HTTP at your disposal. You can easily use the constructs Scala provides to process a collection in parallel and then send out each individual request using Akka-HTTP's singleRequest API. For reference, you can find a working example of such a client here.
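To make the pattern concrete, here is a minimal sketch of such a client, assuming Akka 2.6 / Akka HTTP 10.2 (the object name and method are hypothetical; the implicit ActorSystem also supplies the materializer Unmarshal needs):

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.HttpRequest
import akka.http.scaladsl.unmarshalling.Unmarshal

import scala.concurrent.{ExecutionContext, Future}

object ParallelClient {
  implicit val system: ActorSystem = ActorSystem("parallel-client")
  implicit val ec: ExecutionContext = system.dispatcher

  // Fires all requests at once; each singleRequest call goes through the
  // cached host connection pool. Future.sequence then collects the
  // individual response futures into one aggregated result, preserving
  // the order of the input URLs.
  def fetchAll(urls: Seq[String]): Future[Seq[String]] =
    Future.sequence(
      urls.map { url =>
        Http().singleRequest(HttpRequest(uri = url))
          .flatMap(resp => Unmarshal(resp.entity).to[String])
      }
    )
}
```

Note that the futures are created eagerly inside `map`, so all requests are handed to the pool before any response is awaited; `Future.sequence` only does the collecting.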

Now consider you are sending out four requests in parallel, where each takes around 250ms. What do you expect from the client? How long should it take to give you the aggregated response? Well, I was expecting a response time of around 280ms, i.e., the maximum time among all the requests plus the overhead of collecting the responses. Is that the case, though?

I was surprised to see that the actual time for 32 parallel requests, each finishing in 250ms on the server side, was around 2200ms. That was quite surprising. It was good that I knew my default client-side pool configuration, so I went digging into the HTTP 1.1 spec to clear up my suspicions.

Here comes the real learning. As you may already be aware, the HTTP 1.1 spec allows persistent connections. Instead of creating a new, heavyweight TCP connection for each request and closing it immediately afterwards, you send multiple requests over the same connection. This saves TCP connection setup and tear-down time. Akka-HTTP's default config says that the host connection pool will open 4 connections per host, and that you can have a maximum of 32 requests open at any given time. That means you put 8 requests on each TCP connection. Do you smell anything here? Well, if you have, let me tell you about one more interesting feature of HTTP 1.1.
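Those defaults live in Akka-HTTP's reference configuration; a sketch of the relevant keys, with the default values described above:

```hocon
akka.http.host-connection-pool {
  # Up to 4 concurrent TCP connections per host
  max-connections = 4
  # Up to 32 requests may be in flight (open or queued) at once
  max-open-requests = 32
}
```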

This feature is called pipelining. As the name hints, you can send multiple requests over the same connection without waiting for the response to an earlier request. More interestingly, the server sends the responses back in the same order as the requests. Unfortunately, the Akka-HTTP client does not support pipelining.

Do you see the problem now? While we expect all of our requests to be submitted to the server at the same time, that's not the case. A queue builds up on each connection: every request waits for the previous request on that connection to get its response before it can proceed. That explains the numbers: 32 requests over 4 connections means 8 requests queued per connection, and 8 × 250ms ≈ 2000ms, close to the observed ~2200ms.

What is the solution we are looking for then?

Well, in my case one option is to change the default config to open 32 connections, so that each connection gets a single request. Do you think that after this change you will get ~280ms? Well, no. This time we are adding TCP connection setup time (~10ms?). So for 32 connections we are adding 32 * 10 = ~320ms. One good thing is that the Akka-HTTP host connection pool defines an idle lifetime for each connection and for the pool itself. The default value is 1 minute, i.e., if your next batch of parallel calls comes less than 1 minute after the first, you do not pay the connection setup cost, and you get the expected time of around 280ms.
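The override for this option is a one-line change to the same pool settings (shown here as a sketch; tune the number to your own system):

```hocon
akka.http.host-connection-pool {
  # One connection per in-flight request, so no per-connection queueing
  max-connections = 32
  max-open-requests = 32
}
```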

The question here is: is it fair to the server to open 32 connections? In the case of browsers, opening more than 10 connections is considered rude; browsers' default values vary between 2 and 6. In my case, as the interaction is service-to-service, opening 32 connections could be a safe option. But one should make such a decision only after carefully studying the design of the system. Knowing how many instances of both systems exist, and how the middleware (NGINX, HAProxy, etc.) behaves, should help you do the math. You also need to keep in mind that there could be other consumers of the server as well.

Knowing a little about your underlying client, i.e., its default configuration and behavior, and about the network stack helps a lot with such tweaking. I should say it will help you set some realistic expectations from the performance perspective. Even if your client does support pipelining, opening more connections can still boost performance if your requests are heavy on the wire, i.e., involve more data transfer.

Learned something new? Great. You guys are awesome if you made it all the way down here. You look genuinely interested, so you should do some research on what HTTP 2.0 has to offer us in place of persistent connections and pipelining, and how it does it.
