Go Goroutines vs Node Cluster & Worker Threads — Part 1

Dan Casler
5 min read · Aug 14, 2019


UPDATE: You can view the individual results of each test here:

https://www.dropbox.com/s/kvyc1a48pfy6y3n/Benchmarks%20v1.6.xlsx?dl=0

About 30 days ago, I found myself in a frustrating spot with Node.js/JavaScript, and I decided to take a serious look at some of the alternative languages I could convert half a dozen services to.

Golang (Go) was my first choice, and most of the reviews had very positive things to say about it, particularly around its performance potential and concurrency.

During my research, I read many articles on Go, as well as Go vs. other languages. A common theme I noticed is people either bashing Node.js (Node) with little understanding of what they are talking about, or showcasing Go as being far more performant than Node. In the cases where Go was far more performant, Node was mostly being run at a massive handicap. To make an informed business decision, I want to know how the two actually perform against each other.

The following article is an example of such a lopsided comparison. Node.js is compared to Go in a scenario where Go is set up (by default) to use every available CPU thread, while Node runs single-threaded in a single process. I find this type of content misleading and detrimental to those who may be new to Node.

https://stressgrid.com/blog/benchmarking_go_vs_node_vs_elixir/

With that being said, I’m not here to bash these articles; however, I would like to outline my own results from testing Node.js vs Go. I’m also not a Node fanboi; in fact, I have been known to refer to using JS on a daily basis as being in a dysfunctional relationship. :)

I’ve decided to break this article into four parts as follows:

  1. Part 1: Vanilla HTTP servers returning OK string
  2. Part 2: Vanilla HTTP servers doing CPU intensive work
  3. Part 3: Vanilla HTTP servers doing CPU intensive work with SHA256 and RSA
  4. Part 4: Vanilla HTTP servers doing I/O with Neo4j, MongoDB and network requests

I’d like to point out that, by default, Go takes advantage of every available CPU thread; on my dev workstation with an Intel i7-5960X CPU, that means all 16 threads. A Node process is single-threaded; however, in the background, Node will use additional threads to execute asynchronous code.

In Go, you can configure the runtime to use only a single CPU thread:

runtime.GOMAXPROCS(1)

In Node, if you want to take advantage of more threads, you can use Worker Threads (https://nodejs.org/api/worker_threads.html) for scenarios where you want to break up a body of CPU intensive work across multiple threads. For example (hypothetical scenario alert), if you wanted to create 1 million SHA256 digests and sign each of them with RSA, you could divide that work amongst ten worker threads, each working on 100,000 digests/signatures.

Another option in Node, if you don’t need multiple threads working on a single task and simply want to increase a service’s throughput, is the cluster module (https://nodejs.org/api/cluster.html), which lets you create additional processes that all do the same thing and have Node automatically load balance requests between them.

The specifications for my development PC include:

  • Ubuntu 19.04
  • Intel i7-5960X @ 3.00 GHz
  • 8 CPU cores
  • 16 CPU threads
  • 64 GB Corsair Vengeance RAM
  • Sapphire Vega 64 GPU

The specifications for my testing machine include:

  • Ubuntu 19.04
  • Intel i7-2600K processor
  • 16 GB RAM

Part 1 — Vanilla HTTP Servers Returning OK String

My first test was a single process vanilla Node HTTP server vs a vanilla Go HTTP server, with the caveat that Go is set to use a single CPU thread (I can hear the “rrreeeeees” now).

runtime.GOMAXPROCS(1)

Using wrk on my testing machine: (https://github.com/wg/wrk)

wrk -t8 -c1000 -d300s http://192.168.0.14:4000

Go handled a total of 18,994,749 requests over 5 minutes and averaged 63,298.78 requests a second (r/s). Node handled a total of 12,624,381 requests over 5 minutes and averaged 42,067.40 requests a second. Go outperformed Node, while running on a single CPU thread.

The next thing I did was allow Go to use all 16 CPU threads, and I used Node’s cluster module to have 16 processes running at the same time. I personally don’t care whether people see this as a fair comparison, for whatever reason they might have. From a business standpoint it boils down to this: if I’m going to use a language/runtime, I want to use it at its full potential. In Go, that means Goroutines with access to all CPU threads; in Node, that means cluster or worker threads, depending on the scenario. Comparing both when they max out the hardware I’m using is what matters to me, whichever way they do it.

Just a note: by default, Go will spawn a Goroutine for every HTTP request it receives.

UPDATE: For clarity, I ran 3, 5 minute tests each, so 15 mins each in total.

Here are results for the same wrk command as above, averaged over three different tests for each:

  • Go: average of 113,773,953 total requests handled over 5 minutes, 378,829.42 average r/s
  • Node (cluster): average of 101,951,021 total requests handled over 5 minutes, 339,722.35 average r/s

You can find the code for these tests here:

I did not use worker threads for this part, as they do not offer a benefit when the goal is simply handling more requests per second.

Update 8/19/2019: I did some testing with the uWebSockets.js library and the numbers are very impressive. With only two Node.js processes, I was able to serve 120,112,693 total requests at 405,932 r/s over 5 minutes.

https://github.com/uNetworking/uWebSockets.js

Part 2 — Vanilla HTTP Servers Doing CPU Intensive Work

In Part 2, I am considering using a function like the following to see how well Go and Node perform when each request has to iterate 1,000 times and, for each iteration, calculate the Fibonacci Sequence 1,250 times, pushing the result of each Fibonacci Sequence into a slice/array. I’m open to other ideas here as well.

Fibonacci Sequence: https://gist.github.com/danielcasler/b15d0efe161a15eeb519294d210e3a8c
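One way to read that description is sketched below (a hypothetical shape only; the linked gist and the final Part 2 workload may differ, and the `fib(j % 30)` cap is my own assumption to keep results inside the safe-integer range):

```javascript
// Hypothetical sketch of the Part 2 workload: per request, iterate 1,000
// times, computing a Fibonacci number 1,250 times per iteration and pushing
// each result into an array.

// Iterative Fibonacci: fib(0) = 0, fib(1) = 1, ...
function fib(n) {
  let a = 0, b = 1;
  for (let i = 0; i < n; i++) [a, b] = [b, a + b];
  return a;
}

function cpuWork() {
  const results = [];
  for (let i = 0; i < 1000; i++) {
    for (let j = 0; j < 1250; j++) {
      results.push(fib(j % 30)); // assumed cap; the article's parameters may differ
    }
  }
  return results;
}

console.log(cpuWork().length); // 1,250,000 results per request
```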

Once that is done, I will do some crypto and I/O with databases and maybe mix them together.

Anyways, I hope you find some value in this post, and if you made it this far, thanks for reading.

One thing I should note is that Node cluster does not perform as well on Windows as it does on Linux.

EDIT: I forgot to mention that my total memory usage was 5,463 MB with Go, whereas it was 5,870 MB with Node.

CONCLUSION

Although Go outperformed Node in every test, with Node cluster I feel Node was certainly able to hold its own, and there wasn’t a drastic difference between the two (about 10% when it came to r/s).

Cheers,

Caz


