A real-world comparison of web frameworks with a focus on NodeJS
This story gives an overview of and compares the performance of current state-of-the-art technologies for developing web services. Fourteen technologies and frameworks are evaluated against seven real-world test cases to highlight their strengths and weaknesses. The focus of these tests is explicitly on real-world behavior using the default configurations of the technologies. The analyzed technologies are: .NET Core, NodeJS, PHP, OpenResty, Java, Python and Go. For each of them at least one common framework is tested. After presenting the results in depth, along with explanations and remarks, the story closes with a summary.
One day at work I was required to come up with a web service which can be used to check authentication of HTTP requests. While this is a very simple task, it comes with a certain performance implication: since every request is checked by this service, it needs to be blazingly fast in order to allow for fast request processing.
After work I revisited this topic, as I often do, and thought about it a little more. The conclusion was twofold: on the one hand, choosing NodeJS was a really good idea, since it integrates perfectly with all the other already existing services. On the other hand, I noticed that I had no intuition at all about whether this would result in a fast or a slow service.
I then did some research on fast technologies for the given use-case and came across Nginx/OpenResty, which is, by the way, an interesting technology. After some experimentation I took quick performance measurements by hand and compared them. Since I discovered a big gap, I decided to dive deeper into this topic. The result of this “diving trip” is essentially this story and the performance analysis it describes.
The results allowed me to back my decision and also gave me an intuition about the differences in performance and usability of the presented technologies and frameworks.
Tested frameworks and technologies
Given the initial motivation under which the idea for this testing was born, many of the candidates are NodeJS-based, since NodeJS is currently my main “production” development language. But other frameworks and technologies, which I have used in the past, have been added to give a broader view of the topic of web application development. The technologies analyzed during these tests are:
- .NET Core “Nancy” and ASP.NET Core — 2 test cases
- Go — 1 test case using Go Mux
- NodeJS — 6 test cases
- Nginx / OpenResty — 1 test case
- Nginx / PHP — 1 test case
- Python — 1 test case using Flask web framework
- Java Spring Boot and Java Tomcat — 2 test cases
The main motivation of the tests is the comparison of the different technologies and the insight into how they behave relative to each other. Although most of them differ vastly in how they work and in the usability they provide during development, they all share the fact that they can be used to build web applications. And this is the important point for these tests and comparisons.
Since I am currently most into NodeJS, it was logical to take a deep look at the frameworks I use. The project which led me to these tests is a NodeJS project, as described above. Through these tests I got really important feedback which I did not expect at the beginning. For the NodeJS tests, the following frameworks have been selected:
- Express or ExpressJS — One of the most popular web frameworks for NodeJS. This framework is the basis for many applications and bigger frameworks in NodeJS
- Fastify — A web framework which focuses on performance while maintaining some coding comfort. This framework claims to be very fast, a claim these tests set out to evaluate
- NestJS and NestJS based on Fastify — For bigger projects I personally use the NestJS framework since it is the super-duper-happy-path. It supports many different use-cases and provides many good ways to structure and organize code. Those aspects are especially important for bigger projects
- NodeJS plain HTTP server — The http.createServer() function is as bare-bones as one can get in NodeJS when it comes to HTTP serving. Therefore, I expect the highest NodeJS performance from this built-in “framework”
- NodeJS Ts.ED — This is a direct competitor to NestJS and I use it in some private projects since it provides a very simple framework with basic tools without the need of a huge pile of dependencies or files while maintaining a typescript source with service-controller structure
Besides these extensive tests of the NodeJS technology, the other technologies are represented in these tests by a few very common frameworks: Java Spring Boot, PHP, ASP.NET Core, Go, Nginx/OpenResty and Python.
For the rest of the story the term “technology” is used as shorthand for technology and framework, to distinguish between the different test cases conducted.
Design of the test cases
The main goal of the performance tests is to get an understanding of how certain technologies and frameworks behave for different use-cases. This might later help to choose the appropriate one for a given use-case. For this, the real-world behavior of the frameworks and technologies is much more important than a carefully arranged “lab” experiment. The main goal is to observe and study the behavior of the tools in the “free wild”. So the different test setups are not tuned in any particular way; the default configurations are applied whenever possible. The test cases have been designed with this question in mind.
In order to get a decent picture of the behavior, seven test cases have been designed to cover certain aspects and requirements of web applications. They are described in the following:
As described for my initial use-case, I need to validate and generate JWTs with a high throughput. This test case provides an API endpoint which generates a dynamic JWT.
The generated JWT is dynamic, because the generation time is (ideally) different on every request due to the changing content. This makes caching at some layer unlikely.
One counter-argument against the tests could be that the JWT libraries differ from technology to technology and that the results are therefore not comparable. This would apply to a “lab” experiment, but in the “free wild” developers pick a library by, for example, preference. Based on this assumption, the most popular JWT libraries are used and the technologies are compared with those. Therefore, the tests might not be 100% comparable from a technical point of view. But they are very close to the real world and comparable from a use-case point of view, because each of them generates a JWT token.
This test is expected to show how well a technology handles computational load, which is common for web applications and here involves a grain of cryptography.
Index document fetch
The endpoint provided by this test case responds with a 72-byte HTML document which simply states the classic Apache “It works!”. This data is held in memory and not served from a file. This obviously does not apply to technologies like PHP which are not based on an application server process where the code stays alive for the lifetime of the process. In these cases the web server serves the file. But the results are still comparable under the real-world assumption, since this is the way to do it in such a technology.
The index document is explicitly held in memory to provide a kind of baseline for how fast one can theoretically get with the technology. Theoretically, because other aspects, like optimized asynchronous behavior over synchronous behavior, are not considered here. But keeping a value in memory and returning it should be very fast.
Delivering RFC2616 from file and via file read
The RFC2616 — by the way a very interesting document — is approximately 420 KiB in size and is considered a “big” file during these tests. The two endpoints provided by this test case serve the following purposes:
- One endpoint simply provides the file using the static serving functionality of the technology, if available. This gives an indication of how the static serving functionality performs
- The other endpoint provides the file read via the technology’s native file read functionality. For NodeJS, for example, code along the following lines is used:
It uses the asynchronous filesystem API of NodeJS to read a file and return it when ready. This is done this way to compare NodeJS against the other technologies, since NodeJS is single-threaded and claims such asynchronous tasks as its sweet spot.
These test cases have a dual purpose: they give insights into the performance of serving files statically versus reading them from disk. But they also serve as a comparison of all the other technologies against NodeJS, since NodeJS does not have a native multi-threading model while most other technologies do.
Delivering RFC7523 from file and via file read
These test cases are the same as the ones for RFC2616: two endpoints provide the file either via static serving or via the technology’s native file read functionality. The difference lies in the size of the document: this one is only 26 KiB and serves as a direct comparison to the other case. Since it is only about 6% of the size of RFC2616, one can relate the performance difference to the delivered file size.
With this test case the comparison to the bigger file can be conducted and a statement about the behavior depending on file size can be made.
These are the seven test cases, which broadly cover the usual use-cases of web applications: static serving, computational tasks and asynchronous operations for reading data from a file or database.
Measured values and operation procedure
Since this story is all about performance and comparing the performance of the selected technologies, the most important metric is the request count per second. For this measurement one wants the highest possible value, because the more requests per second, the faster the application in general.
Besides this rather simple metric, the RAM consumption of the compared technologies is important too, and the results should definitely be taken into consideration. Not only the speed of your service is relevant, but also its cost in the form of resource requirements and, ultimately, money. Generally, with rising cost, people are willing to accept some performance degradation, at least up to a certain level.
Since I am not only a developer but also heavily involved in DevOps, I know that the resources of a server are limited, and the less a service consumes, the better. For cloud deployments this rule applies as well: if your service gets into the gigabyte range of memory consumption, the prices for virtual machines rise. Even a Kubernetes cluster or something comparable will not help you, because the resources still have to be provided. So knowing the resource consumption of your service under certain loads is very important.
Another limited resource is the CPU. Although it is not as critical as memory consumption in the form of RAM, it is also important, especially if the service does some serious computation. This use-case is covered by the JWT computation test case. Be aware that for number-crunching services (video transcoding, …) an entirely different test setup is needed. But for the rest of the services, I/O operations like database access and file access are much more important. These are covered by the file read test cases presented before.
Related to the memory consumption during runtime, the size of the service’s docker image is compared as well. We are nowadays in the beneficial situation that hard drive space hardly matters from a capacity or financial perspective. Still, striving for a small image is important, since upload and transfer times to a docker registry, or any other transfer, do come into play if you run an infrastructure of many microservices. You will certainly get into situations where you need to debug certain parts fast, and those sizes will bother you then!
Finally, the last value measured during the tests is the latency. This value should be as low as possible, since it is basically the wait time of the client for the response. It is related to the requests per second, because there is a throughput limit which results in filled queues and waiting time. The values give a rough indication of how well the technology or framework is optimized for many concurrent requests.
The tool used to carry out the data acquisition is wrk:
“a modern HTTP benchmarking tool capable of generating significant load” (https://github.com/wg/wrk)
For a given amount of time it keeps the provided number of connections open to the system under test and distributes this connection count over the provided number of threads. Whenever a request finishes, a new one is issued so that the required number of open connections is maintained. After the test period, the requests per second and the latency are computed.
The memory and CPU consumption is collected directly via the docker stats command, which provides the values from docker’s internal statistics. Since they are not 100% accurate (especially the CPU usage), they point in a direction rather than give exact values.
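As an illustration, a single measurement run might be invoked like this. The thread, connection and duration values as well as the endpoint URL are example assumptions, not the exact commands used:

```shell
# 600 connections distributed over 12 threads for 45 seconds against one endpoint
wrk -t12 -c600 -d45s http://127.0.0.1:3000/jwt

# one-shot sample of the container statistics during the run
docker stats --no-stream --format "{{.Name}}: CPU {{.CPUPerc}} MEM {{.MemUsage}}"
```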
Execution environment and test setup
The tests have been executed on an Intel(R) Core(TM) i7–8700K CPU @ 3.70GHz with 6 cores and 12 threads, backed by 12 GB of main memory, on a 64-bit Debian Linux 10 with kernel 4.19.0–8-amd64. The docker version used is 19.03.8 (build afacb8b7f0).
As described above, the entire “test” consists of seven test cases which are run for each of the 14 technologies. Each of those individual tests is repeated multiple times for the different connection counts, with a duration of 30 or 45 seconds. The final results shown in this article are derived from the 45-second test runs. For every one of those tests the docker container is restarted, so that a more or less “clean” environment is ensured and no test benefits from the previous one.
Additionally, for every one of the 14 tested technologies, API end-to-end tests are executed after building the Docker container to check that the container works correctly and that all seven API endpoints are reachable and return the correct results. Furthermore, an end-to-end test ensures that the started container is actually the correct container for the technology, eliminating the possibility of confusing containers with each other.
These steps are executed sequentially by a script which also records the results from the wrk tool and the statistics output from docker stats. During a post-processing step the numbers are loaded into a database and the charts are generated from these results.
Take your guess!
Before going into the results I would like you to challenge your mind: take a moment and make your own prediction about which technology wins, or about how your favorite technology will behave. It is not only fun to challenge yourself; it will also test your understanding. I had fun coming up with a prediction and comparing it to the results.
This section is the main part of this story and shows, in the following, the comparison along the five big criteria of this test: requests per second, latency, memory, CPU and container size.
Result 1: Container size
The following table gives a direct overview of the sizes of the different images. While constructing the test cases, an effort was made to build the containers as small as possible, without investing a huge amount of time into shrinking them or applying “hacks”. Therefore, the numbers can be considered a good average:
It can be directly observed, that:
- The Go container is the smallest one, with a total size of 14 MiB, only ~7% of the size of the biggest container
- The biggest containers are the .NET Core ones. Although their size could be reduced by more aggressive shrinking attempts, the standard Microsoft containers are used here
- The size of the Java containers is surprising: they are only around ~100 MB when using OpenJRE images. The official Oracle containers take up much more space
- The PHP/Nginx container is the second smallest, with ~85 MB, although this seems like a lot for containing just a PHP binary, Nginx and PHP-FPM. The same applies to the Nginx/OpenResty container
- The NodeJS containers are around ~120 MB, which seems like a pretty good result considering the node_modules “hell” — see this famous image on reddit
Finally, it can be said that the tested technologies do not differ significantly in container size, with the exception of the Go container, which is tiny in comparison to the others. The rest land around an acceptable 120–200 MiB.
Result 2: CPU usage
The following diagram shows the CPU usage measured during the tests. Every technology has its own color, as shown in the legend at the bottom. On the x-axis, different connection counts are plotted. For example, “600” means that for the duration of the test 600 connections were kept open to the service at every point in time. The load therefore rises from left to right, with the lowest load on the left and the highest on the right. The y-axis shows the CPU usage of the docker container in percent, acquired by running “docker stats”. As stated before, the hardware on which the tests were carried out is equipped with six CPU cores, so the maximum any software can take is 600%: all cores executing the application at 100%, which is practically impossible since other tasks have to run as well.
The results show very clearly that NodeJS is a single-threaded technology, since it only takes up roughly 100% CPU, i.e. one core. Because it has no multi-threading capabilities by default, the load cannot be spread across multiple CPU cores.
A little more puzzling is the fact that Nginx/OpenResty and Python/Flask are also in this region, although they are not per se bound to this limitation. At least for OpenResty it can be assumed that this is configurable, since it is basically an Nginx with Lua support, and the Nginx with PHP behaves much better in this respect. But the exact reason has to be analyzed further.
As can be observed, and a little unexpectedly, the “winner” in this category is Java Spring Boot, since it uses the most resources. As further metrics will show, it consumes far more resources than the performance it delivers would justify. It is directly followed by ASP.NET Core with an almost equally high consumption.
A little more disappointing is the dip in the results of PHP, since PHP with the used FPM model is a perfect candidate to do well in this test: its execution model is the exact opposite of the single-threaded NodeJS approach. Since the standard configuration was used for the PHP/FPM container, there is room to optimize and tune the performance.
The other technologies and frameworks, like Go, .NET Core with Nancy or Tomcat plain, are in the midfield and do not attract attention but do utilize multiple CPU cores.
The following diagram shows, for the test case of generating a dynamic JWT token, the CPU usage per technology as well as the standard deviation of the results:
It is rather interesting to observe that NodeJS has a pretty small deviation. This is not a big surprise, since it has a defined maximum due to its single-threaded nature. It jumps straight up to 100% and then has not much play, since it is limited at the top and pushed up by the load.
The other measured technologies behave in a similar way: looking at ASP.NET Core, it has a pretty small deviation, which also makes sense with the previous diagram in mind, since its values were very consistent.
Only PHP and the two Java-based technologies have a huge deviation. This big variation in the measured values is not necessarily a bad thing, because it only means that the CPU utilization was not stable. It could mean that the technologies make very efficient use of the resources, adapting the utilization to the load. But since the tests generated a fairly consistent load, the values should have been more consistent as well. The reason for this big gap needs to be evaluated separately.
Result 3: Memory (RAM) usage
The next category which has been examined during the tests is the RAM memory usage of the different technologies. The following diagram shows the average RAM usage for the different connection counts for all test cases. This should give a rough estimate on how the different technologies behave from a general perspective:
As can be observed, there is a major “winner” in a negative sense: Java Spring Boot consumes up to 900 MiB of memory during its runtime, with a slightly rising tendency for bigger connection counts. It is followed, with some distance, by the plain Java Tomcat server, which also has a rising tendency for higher connection counts but uses “only” up to 500 MiB of RAM.
On the other side, Nginx/OpenResty consumes a maximum of 5.2 MiB of RAM and therefore only about 0.6% of the memory which Spring Boot consumes. Since both technologies deliver roughly the same result, the question arises where all this overhead comes from. Besides this, the trend of its memory consumption is mostly constant regardless of the measured connection count. If one thinks about it, this behavior makes perfect sense: Nginx is written in C, is highly optimized, and has no overhead generated by mechanisms like garbage collection.
Most of the NodeJS technologies behave roughly the same with regard to the trend: they do not rise or fall very much. They only have an offset compared to Nginx/OpenResty, which puts them in the memory consumption region of 80–120 MiB. Considering the poor results in the CPU measurements, due to the single-threaded nature of NodeJS, these are actually really good results. And they make sense, too: since the process is single-threaded, there is a practical limit on how much memory it will allocate. Surely it could allocate as much memory as available, but in practice, for this use-case, the allocated memory is limited, since processing a request only needs a limited amount of memory for the required data structures.
More interesting are the results of PHP, which stays around 25–30 MiB, and Go, which rises from 20 MiB to 105 MiB over the course of rising connection counts. At least for PHP, the argument of the practical memory limitation by the nature of the use-case applies as well. Furthermore, PHP scripts “die” after execution and their memory is freed. This makes for very efficient memory usage and reduces the risk of memory leaks. Not saying that memory leaks in PHP can’t be a (big) problem. (😄)
ASP.NET Core rises with a mostly constant slope of 50% and ends up in the same size region like Java Tomcat (500-ish MiB).
The following diagram shows the maximum and the minimum memory used to deliver the static string from memory. This serves as a control test: if you think about it, it should not consume much CPU or memory, since it just returns data from a memory region, which should be blazingly fast.
As can be observed in the diagram, Java Spring Boot has a huge memory peak which is way above all other memory peaks. Even its minimum is not near the maximum of all the others. Obviously this is a logical consequence of the average diagram shown before, but it stresses the point that this specific technology needs a generous amount of memory to perform even this “basic” task.
All other technologies behave normally and stay in the region of up to ~200 MiB. The plain Java Tomcat server is also a big improvement, since it only uses a quarter of the memory required by its big “brother” Spring Boot. But the current graphic is a bit misleading, since in all other test cases the plain Java Tomcat server is in this higher region as well. Have a look at the following diagram, which shows the maximum and minimum memory consumption for the test case of generating a JWT token:
Here it can be seen that both Java technologies are on the same level when it comes to memory consumption, at least for rising connection counts. All other technologies stay in the region of up to 200 MiB of main memory, which is acceptable for a small to mid-sized application.
The Nginx/OpenResty server with up to 5 MiB of memory consumption stays the definitive winner in this comparison so far.
The following diagram shows the memory behavior for the test case of reading the bigger file from disk:
As you can see, for this test the Java technologies do not have such a dramatic memory issue; they behave quite well. This comes from the fact that they are implemented to stream the data from the file into the response, so they do not need to load it into memory completely first. As a best guess, exactly this seems to be the problem for ASP.NET Core, since its memory consumption grows with the number of requests. This makes total sense: the file is loaded into memory for multiple requests and kept there for a little while, and therefore the consumed memory grows. If one wants to dig deeper into this topic, one could analyze the performance of reading files in .NET Core. For these tests the function System.IO.File.ReadAllText() has been used, which is a pretty default choice. Interestingly, .NET Core Nancy uses the same file read mechanism but does not show this issue.
Besides these implementation differences, one can take away from the memory tests that main memory (RAM) is an important resource. Care needs to be taken while developing applications to minimize memory consumption, because optimizing such services later on is a huge effort. So why not be aware of this beforehand and consider it?
Result 4: Latency
Before we come to the final measured value, the throughput in the form of requests per second, we will have a look at the latency in this section. It basically denotes the time a request takes on the communication side and how much overhead this generates.
The following diagram shows the latency in seconds for growing connection counts:
This diagram behaves exactly as intuition predicts: the more connections are open, the higher the latency, because more connections mean more requests to process, i.e. more work to do. Since the amount of work that can be done is fixed by the utilization of the CPU cores, wait times are the result. This also connects nicely to the CPU diagrams from before: this diagram is kind of an inverse view, meaning that where the CPU utilization is low, the latency is highest, and vice versa.
It can be observed that the NodeJS technology and frameworks end up at high latencies, because they can basically only utilize a single CPU core. Therefore the requests have to wait until there is enough computation time to process them.
These results already hint at how the throughput will behave: since Go is the framework with the lowest latency, its number of requests per second will be the highest. On the other hand, NodeJS Ts.ED should end up as the framework with the lowest number of requests per second, since its latency is the highest. We can verify these preliminary conclusions in the next section.
The following diagram shows the test case of fetching the big document via the technology’s native file read mechanism:
This diagram nicely shows the performance degradation, i.e. the rise of latency, with higher connection counts. Especially at 800 parallel connections it can be observed that the latency doubles in comparison to the previous connection count, which indicates that it rises roughly linearly with the number of connections. Besides this, the latency is not much different from the average values shown above.
The last diagram for the latency measurements shows the test case of delivering a non-existing page, i.e. providing the “404 Not Found” page. The results are not that different from the previously shown ones, but they highlight how “efficiently” those error routines are implemented in the different technologies:
It can clearly be seen that most of the technologies have a very “efficient” implementation for non-existing content. As the connection count grows, the “problematic” technologies stand out: especially NodeJS and the NestJS framework based on Express have a very high latency when it comes to error pages. This might be related to the fact that within these frameworks one can customize the exception handling and the response which is sent, using nice decorators. This usability advantage comes with a certain cost, expressed here as higher latency. The NodeJS frameworks are by nature slower when it comes to error handling, since they have to deal with the overhead of the exception mechanism at a higher language level. For PHP, for example, this is not a big problem, since the Nginx server handles these cases, and one can assume that this is thoroughly performance-optimized. Interesting is the fact that Nginx/OpenResty is slower, although it relies on the same Nginx server. So there might be some special code involved, but a detailed evaluation has to be performed to tell.
Result 5: Throughput in the form of requests per seconds
Finally we have arrived at the most interesting part, the information everyone has been waiting for up to this point: how many requests can be processed by each technology, and which one beats them all?
The following graphic shows the number of requests per second for different connection counts over a test duration of 45 seconds, averaged over all test cases (equally weighted):
The two technologies fighting for first place are ASP.NET Core and Go. Both reach a throughput of up to 65k requests per second. If one takes the CPU usage and memory consumption into account, Go is definitely the technology which provides the best results: it uses less RAM while utilizing the CPU well. ASP.NET Core has bigger memory requirements and utilizes the CPU more, but provides roughly the same magnitude of throughput. Therefore, it can be said that Go is more efficient at using the resources while providing the best performance. This is no big surprise, since Go compiles to native binaries, close to the performance of C, with only a small overhead from the language and its mechanisms.
The third in line is PHP. This is also no wonder, since PHP is a tried and tested technology which powers a huge part of the web infrastructure nowadays, and although it might seem “old-school” it is definitely still an important technology. Together with Nginx/OpenResty it uses a request-based execution approach, meaning that the resources are freed after each request. Furthermore, because of the FPM model, parallel execution of many requests is very easy, and usually no shared memory between those processes is needed. Over the course of growing parallel connections PHP behaves roughly constantly and does not vary much.
PHP is directly followed by the plain Java Tomcat server. Despite the fact that Java Tomcat was pointed out in earlier results as using many resources, it delivers a good throughput, especially when compared to Java Spring Boot, which is more on the lower end of the spectrum while consuming many resources for these results, as shown previously. The throughput of the Tomcat test rises while increasing the number of parallel connections. This is most likely related to the JIT compilation of the JVM, which boosts the performance by a good amount, delivering 1.6 times the initial performance at its peak.
The NodeJS technologies cluster together in the lower region with up to 17,000 requests per second. It can be seen that the plain NodeJS server, implemented directly using the http.createServer() function, sets the benchmark for the highest throughput achievable with NodeJS. This conclusion also matches intuition perfectly: all frameworks in NodeJS (Express, Fastify, NestJS, Ts.ED, …) are built on top of the native functionality and are therefore bound by its performance.
Fastify and NestJS-on-Fastify are very close to each other; they only differ by the overhead added by the NestJS framework. Almost the same story applies to Express and NestJS-on-Express: they are shifted down due to the larger overhead of the Express framework, but they stay roughly together, and the overhead added by NestJS is again about the same.
The lowest throughputs were achieved by Python, the Ts.ED NodeJS framework, and the embedded web server of .NET Core Nancy. They provide an even lower throughput while consuming relatively high amounts of CPU and memory, and therefore offer a poor cost-benefit ratio.
One of the strengths of NodeJS over other technologies is its ability to work asynchronously, whereas other technologies rely on multi-threading or run entirely synchronously (e.g. a PHP script). To assess this, the large file is read using each technology's native means; in the case of NodeJS this is a function callback provided by the native filesystem module. The numbers are much smaller for this case, since sending over this file so often definitely consumes resources. The following diagram shows the throughput reached for selected parallel connection counts:
Obviously the winner in this comparison is once again Go, which reaches at least double the performance of the fastest result among all the other technologies. Interestingly, third place is taken by a NodeJS framework: NestJS using the Fastify web framework. But there is a catch: all other NodeJS-based technologies are much lower in performance (less than half), and this measurement even exceeds the plain NodeJS server which was previously presented as an upper benchmark. It therefore seems that some other mechanism, like caching, bumps the performance. Furthermore, the values of plain Fastify are much smaller, although plain Fastify would be expected to be at least a bit faster than Fastify combined with the NestJS framework. This should be analyzed further.
Besides this, both Java technologies as well as Nginx/OpenResty and PHP provide solid results with high throughput. The Java technologies in particular provide a stable throughput, but, as we saw earlier, also use many resources to achieve it.
If the size of the file which is read and sent is now reduced, the following behavior can be observed:
The values mostly remain in the same relations, with the exception that the NodeJS technologies are much more performant and overtake, for example, PHP. This could indicate that the NodeJS applications are much more efficient for smaller payloads; this test case then acts as a kind of sweet spot which emphasizes that fact. The absolute numbers are roughly scaled by four and lie in a much higher region compared to the big file.
Finally, the following diagram shows the behavior of the technologies for the computation-intensive task of generating a JWT token:
Since computation is involved in this test, the results are vastly different. Besides the very performant Go, ASP.NET Core enters the competition and reaches second place. Both technologies are rather close to each other and maintain a big distance to the other frameworks.
Nginx/OpenResty as well as the Java-based technologies are also very fast in computing those JWT tokens.
They are followed directly by the NodeJS frameworks, where the plain NodeJS server sets the upper bound and most of them more or less reach it. The Ts.ED framework, which is an alternative to NestJS, performs very poorly. Its performance was not much better in the previous test cases either. It seems that its request processing is rather slow and involves a lot of logic compared to the other NodeJS-based frameworks. This also confirms the preliminary conclusion drawn in the latency comparison.
The end of the range is closed by PHP and Python. Interestingly, PHP performs very poorly here, although it behaved quite well up to this point. This might come from the fact that the JWT computation library used is not very performant.
These were the highlights of the conducted performance measurements. Keep on reading to get to all of the results and the raw data.
Reproducibility and more results
For reasons of space, this article obviously shows only the highlights of the test results. More diagrams, the original data and all the test cases and sources can be found in the Git repository:
- Find all resources, tests, codes and example charts in the root of the main repo
- Here you can find the raw results for the data shown in this story
In order to get a feeling for how reliable those numbers are, the tests have been executed multiple times, also with different durations. One execution of all tests, with a duration of 45 seconds and the shown connection counts, takes roughly 24 hours. When the executions are compared, they show equal results. Based on this it is assumed that the results are stable.
Final hints on the test setup and the results
Some important remarks on the results and test setup, which should be kept in mind when assessing the results, are discussed below:
As described, the tests are real-world tests rather than carefully crafted “lab” scenarios which ensure equal conditions. Although these tests have been designed with equal conditions in mind, they are only equal to a certain degree. But the intent of these tests is to model the real world, which is not perfect and has to deal with such “problems”. For example, the third-party libraries used have been picked by popularity rather than by comparing them and choosing equivalent ones, because in the real world a developer will most likely pick a favorite.
One could basically go into every technology or container and optimize it to deliver the best possible performance. But this is not the aim of these tests. They should provide a more high-level view of the topic and compare the technologies by their “default” configuration or behavior: basically, use them “as-is”.
The tests have not been executed on real server hardware, which could be seen as a flaw. They have been executed on advanced prosumer hardware. This might mean that the absolute numbers differ on real server hardware, since such CPUs are optimized for server use-cases. But for these tests the absolute values are not important; rather, the relative values are used to compare the different frameworks with each other. Still, keep in mind that server CPU architectures are different and might change the results.
There are countless frameworks and technologies out there which can be used to build a web service, some specialized and some general. These tests obviously cover only a small selection of them, but they should be the most common ones used nowadays for web development.
The tests might be extreme for real-world applications. But knowing the limits is always a good thing. That does not mean that the limits need to be exceeded or even reached. But to some developers it gives a certain amount of security to have headroom in case of any new requirement, which can come around the corner at any time in the real world.
Another point is the usage of the wrk tool. Although it seems to be a fairly useful tool, there might be a “better” tool for this task. Since these tests are meant to be compared with each other and are not meant to serve as absolute results in the first place, this is not a problem. The only requirement which follows from this is that all tests have to be carried out with the same tool and the same configuration, and this has been done.
In the results section, some assumptions were made on how a certain behavior arises in a certain technology. They are not grounded in a deeper analysis, but they are plausible guesses. The main focus of this article is the initial analysis of the performance of the technologies and test cases; analyzing the behavior more in depth is beyond the scope of this article.
Finally, the Nancy framework was deprecated while I was working on these tests. Despite this fact I decided to keep it in the tests, just to see the results.
If you ask for the winner, there is only one answer: there is no winner. The short reason is: “It depends.” This may seem a little unsatisfactory, but the technology you want to use for your next project heavily depends on the architecture and the use-case you want to realize. Besides the pure technical aspects there is also a business aspect: it is possible to write the most performant service in pure C, but the costs of development and the ability to, for example, integrate new functionality quickly will be huge compared to using a more high-level language. These aspects also need to be considered when selecting a technology for a project.
For my project, which caused me to start these tests, I used NodeJS (the NestJS framework on Express), and I will continue to use it since it brings so many benefits during development: ease of use and integration with code from common libraries used across the other services. It would have been possible to implement the use-case in Go or even Nginx/OpenResty, and part of the developer inside me would have really enjoyed doing that, but from an organizational and business perspective the current way makes much more sense. This should not be a plea for preferring the business aspect, but rather for keeping all aspects in focus when making such decisions. This is also the reason why I went with this more “freeform” approach to testing and comparing those technologies instead of carefully arranging a “lab” experiment: in the real world many more aspects come into play, and most of the time the technical performance difference between those technologies is not that important.
Nevertheless, it is pretty interesting to see the comparison of the different technologies, and I guess there is little doubt that Go is the winner of those performance comparisons, since it provides very good performance combined with a really tiny Docker image size, a moderate memory footprint and the capability to be executed on multiple cores.
The technology which impressed me with both its performance and its concept is the Nginx/OpenResty server, which seems perfect for smaller tasks with high performance requirements. Since there is much support for Nginx/OpenResty, it is surely possible to tune its performance further. This is definitely a takeaway for me from these tests, and I might use it in an upcoming project.
Another insight is that, as much as I like Java for its clean language constructs and its enterprise technology aspect, it consumes too many resources for my taste. This is not good at all, and I did not expect it.
Besides all this, building up the test infrastructure was a nice challenge and had its own sweet spots and takeaways. But that is a story for possibly another article. I worked on this entire “project” over the course of approximately four weeks of after-work time, and I'm now happy to publish this story. I hope you had fun and gained good insights while reading.
Thank you for reading!