Observations about Web Service Technologies: Using Custom Web Clients
Thundering Web Requests: Part 5
This is the fifth post in a series exploring web service-related technologies.
Having implemented custom web clients and evaluated the web service implementations using Apache Bench, I next evaluated the same web service implementations using the custom web clients. This post documents both server-side and client-side observations from this evaluation.
The setup from the previous AB experiment was used with the following changes.
Instead of Apache Bench, three custom web clients were used: HTTPoison-Elixir, Go, and Vertx-Kotlin.
Each of these clients reported the time taken by each request. They also reported whether a request succeeded or failed based on the error reported by the underlying web service technology. Unlike Apache Bench, they did not treat incomplete payloads as failures.
External to these clients, the total wall-clock time taken by all requests in an execution of a client was captured by executing the clients using the time command on Linux.
Derived Requests per Second
The requests per second metric was calculated across all clients in an Ansible script execution using the total number of issued requests and the total wall-clock time reported by the time command. So, this metric includes any warm-up time required by the clients.
Since the clients were based on different language platforms and web technologies, they likely exhibited different performance and behavioural characteristics. I will explore this aspect in the next blog post.
The execution of this experiment was similar to the previous AB experiment barring one difference: in each Ansible script execution, on each client node, one of the three web clients was chosen at random and executed. So, while 963 Ansible script executions involved at least two different clients, seven executions involved only the Go web client and five executions involved only the Vertx-Kotlin web client.
Observations about Performance
For each network traffic configuration and concurrent requests configuration pair, the Ansible script execution with the highest number of requests per second was considered in making the following observations.
- As the number of concurrent requests increased, the Actix-Rust and Go-Server implementations performed better than the other web service implementations at every network traffic configuration. This behaviour became more pronounced as network traffic increased.
- Compared to the results of the previous AB experiment, the absolute performance of the Actix-Rust and Go-Server implementations was lower, while the overall performance of the remaining implementations hardly changed.
- At lower concurrent requests, every service implementation exhibited fewer requests per second with the custom web clients than with Apache Bench in the previous AB experiment.
A likely reason for observations 2 and 3 could be the behavioural differences between the custom web clients.
None of the web services could service 10K requests per second on a Raspberry Pi 3B
The server-side observations in this experiment were similar to those from the previous AB experiment. The only difference was that, at every network traffic configuration, at lower concurrent requests configurations, most web service implementations did better than in the previous AB experiment.
Actix-Rust and Go implementations consistently performed better than other implementations
Observations about Failure/Reliability
As in the previous AB experiment, the client-side performance of each web service implementation improved as the number of concurrent requests increased. To understand this, I again examined failures during execution.
Based on Failed Clients
Unlike in the previous AB experiment, every execution of custom clients completed without crashes, i.e., there were no failing clients.
Based on Raw Least Number of Failed Requests
The table below lists the least number of failed requests across all five Ansible script executions against a service implementation in a configuration. Execution instances with no failed requests are not shown; hence the empty cells and absent columns. Significant instances, where the number of failures was more than 5% of the maximum number of requests, are highlighted in red.
From the above table, we observe:
- Six service implementations failed to serve some requests in all executions in certain configurations, as in the previous experiment.
- There were 35 failing instances in this experiment, comparable to the 37 in the previous AB experiment.
- The number of significant instances, in which the number of failures was more than 5% of the maximum number of requests, was almost twice that in the previous experiment, i.e., 24 vs 13. Most of these instances involved the Flask+uWSGI-Python3 implementation.
Types of Errors (Failures)
In the execution instances with failed requests, there were five types of errors.
- checkout_timeout (A): the client encountered a timeout while checking out a socket to initiate a connection. This error is not a server-side issue.
- connect_timeout (B): the client encountered a timeout when connecting to the service. This error could be a server-side issue.
- timeout (C): the client encountered a timeout in getting a response. This error could be a server-side issue.
- closed (D): the service closed the connection before providing a complete response. This error could be a server-side issue.
- unknown reason (E): the client did not log the error. These errors could be any of the above errors or something different, and could be a server-side issue.
Of the three web clients used, only the Vertx-Kotlin and Go clients failed to report the reason for failed requests (unknown reason error). The number of failed requests due to this error is highlighted in green in the table. The remaining types of errors were reported only by the HTTPoison-Elixir client.
Based on Corrected Least Number of Failed Requests
Based on the error types, instances whose failures stem only from client-side checkout_timeout errors need to be eliminated. With this elimination/correction, the above table changes as follows.
In the above corrected best-case table,
- The number of significant instances not involving the Flask+uWSGI-Python3 implementation reduced from 10 to 5, while the number of significant instances involving the Flask+uWSGI-Python3 implementation remained unchanged.
- The Micronaut-Kotlin, Ratpack-Kotlin, Tornado-Python3, and Cowboy-Erlang implementations failed in fewer configurations.
- The poor performance of the Flask+uWSGI-Python3 implementation remained unchanged.
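The correction applied to these tables can be sketched as a simple filter: drop any execution instance whose failures are all checkout_timeout errors, since those are purely client-side. The `instance` record below is a hypothetical data shape for the example, not the format actually used in the experiment.

```go
package main

import "fmt"

// instance is a hypothetical record of one execution instance's failed
// requests, keyed by error type (e.g. "checkout_timeout", "closed").
type instance struct {
	failures map[string]int
}

// onlyCheckoutTimeouts reports whether every failure in the instance is a
// client-side checkout_timeout error.
func onlyCheckoutTimeouts(in instance) bool {
	for errType, count := range in.failures {
		if errType != "checkout_timeout" && count > 0 {
			return false
		}
	}
	return true
}

// correct drops instances whose failures are all checkout_timeout errors,
// keeping only instances with at least one possibly server-side failure.
func correct(instances []instance) []instance {
	var kept []instance
	for _, in := range instances {
		if !onlyCheckoutTimeouts(in) {
			kept = append(kept, in)
		}
	}
	return kept
}

func main() {
	data := []instance{
		{failures: map[string]int{"checkout_timeout": 12}},            // dropped
		{failures: map[string]int{"closed": 3, "checkout_timeout": 2}}, // kept
	}
	fmt.Println(len(correct(data))) // 1
}
```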
Based on Corrected Most Number of Failed Requests (Worst-case)
I also considered the data for the most number of failed requests (worst-case) after correcting it for checkout_timeout errors.
In the above corrected worst-case table,
- Compared to the previous AB experiment, the number of failing instances decreased, i.e., 50 vs 88.
- 19 failing instances involved only the HTTPoison-Elixir client; these are all the red entries in the table. Most of the failures in these 19 instances were due to checkout_timeout errors in the HTTPoison-Elixir client. So, the number of failed requests due to the service implementations is much lower than the reported number.
- In the 16 failing instances involving more than one kind of web client and not involving the Flask+uWSGI-Python3 implementation, ~25,000 failures occurred due to checkout_timeout errors in the HTTPoison-Elixir client. So, the actual number of failed requests due to the service implementations is much lower than the reported number.
- In the 15 failing instances involving the Flask+uWSGI-Python3 implementation, ~12,000 failures occurred due to closed errors and ~18,000 due to unknown reason errors. All of these could be server-side errors. So, in comparison with the results from the previous experiment, the Flask+uWSGI-Python3 implementation seemed to fare worse when interacting with a non-homogeneous group of clients (possibly due to some combination of the clients' characteristics).
- If the failures encountered while using the HTTPoison-Elixir client were ignored, then the Vertx-Kotlin implementation was also failure-free in the worst case!
- Unlike Vertx-Kotlin, the Ktor-Kotlin implementation was involved in four failures with the Go client; hence, it was not as reliable as it was in the previous AB experiment.
Failure and Performance: Overall, compared to the previous AB experiment, fewer failed requests were observed in this experiment. This could be attributed to the overall reduced performance of the web services, which in turn could be attributed to the performance of the web clients in isolation or in combination; something to consider/explore.
Failures with Ktor-Kotlin Service Implementation: Of the four Go clients and one HTTPoison-Elixir client involved in the worst-case execution of the Ktor-Kotlin implementation, one of the Go clients timed out on four requests after 30 seconds, the default connection timeout used by Go's HTTP transport. While a 540-second timeout was used in the previous AB experiment, the default timeouts provided by the underlying web client technologies were used in this experiment. These default timeouts are a likely reason for the failures encountered by the Ktor-Kotlin implementation.
Failures with HTTPoison-Elixir Web Client: The HTTPoison library is built on the Hackney library. Hackney's default timeouts to check out a socket from the socket pool and to connect to a service are 8 seconds each, and its default timeout for receiving data over a connection is 5 seconds. Together, these total far less than the 540-second timeout used in the previous AB experiment. So, again, the default timeouts are a likely reason for the failures encountered with the HTTPoison-Elixir web client.
Unlike the previous AB experiment, this experiment was a bit flawed due to its reliance on default configuration settings, which resulted in data that needed some correction. So, one big takeaway is:
While using a feature of a library, understand how various options and their (default) values influence the behaviours of the feature
Despite the flawed experimentation, all but one of the observations about web service implementations from the previous AB experiment held true in this experiment.
- None of the web service technologies could serve 10K requests per second on a Raspberry Pi 3B.
- Actix-Rust and Go implementations were most performant.
In my next post, I will examine the three technologies used to implement the web clients used in this experiment.