How to create a multi-threaded HTTP server in Julia

I’ve been super inspired by Julia lately, and as someone who likes pushing technologies to the edge of their performance, one of the first things I wanted to know was: how fast does HTTP.jl run? It turns out it’s pretty competitive when tuned right.

But one question loomed in my mind: how was a language that makes distributed computing so easy taking advantage of it in web serving? Sadly, it turns out it is not. Suppose you try spawning two servers in two different processes:

using Distributed
addprocs(1) # create a worker process
@everywhere using HTTP
@everywhere function runserver()
    HTTP.serve("0.0.0.0", 8001) do request::HTTP.Request
        return HTTP.Response("hello world")
    end
end
@async runserver() # run a server asynchronously on the current process
@spawnat 2 runserver() # run a server on the process we created (id 2)

You will quickly get a nastygram:

IOError: listen: address already in use (EADDRINUSE)
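The error is not specific to HTTP.jl: it comes from the operating system refusing a second listener on a port that is already taken. A minimal sketch using only Julia’s Sockets standard library reproduces it inside a single process. The port hint 8001 mirrors the example above, `listenany` is used so the sketch still works if that port happens to be busy, and the exact exception type and message may vary by platform.

```julia
using Sockets

# Grab a free port, starting the search at 8001 (the port used above).
port, first_server = listenany(ip"127.0.0.1", 8001)

# A second listener on the same address and port should be refused.
second_bind_failed = try
    second_server = listen(ip"127.0.0.1", port)
    close(second_server)
    false
catch err
    # On Linux this is typically an IOError mentioning EADDRINUSE.
    println("second listen failed: ", sprint(showerror, err))
    true
end

close(first_server)
```

Each port can have only one active listener, which is why running the same server function in two processes on the same port cannot work as-is.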

I did some research down into the low levels of the API, and it appears the best we can do is avoid the error.

As per the documentation, adding reuseaddr=true will quiet things down:

HTTP.serve("0.0.0.0", 8001, reuseaddr=true) do request::HTTP.Request
    return HTTP.Response("hello world")
end

But this only lets the last server to bind steal all future requests. From the documentation:

If reuseaddr=true, multiple threads or processes can bind to the same address without error if they all set reuseaddr=true, but only the last to bind will receive any traffic.

I wanted to find a solution that could allow even better performance, so I turned to a good old standby: nginx. This tool is considered a gold standard for exposing a variety of servers in a network to the outside world under a unified front. Why couldn’t it be used to expose a variety of servers on a single local machine? My setup would be simple: create one server process per CPU (my laptop only has two), listening on ports 8000 and 8001, and run nginx to share the load between them on port 8080. Assuming Linux schedules those processes across my CPU cores (a common thing to rely upon), we should be good. nginx runs as a background daemon process on most systems

sudo apt-get install nginx

and is easily configured by a configuration file at /etc/nginx/nginx.conf

Here’s my configuration:

events {
    worker_connections 10000;
}
http {
    upstream myproject {
        server 127.0.0.1:8000;
        server 127.0.0.1:8001;
    }
    server {
        listen 8080;
        server_name localhost;
        location / {
            proxy_pass http://myproject;
        }
    }
}

Pretty simple, yes? To apply the configuration, simply restart nginx:

sudo systemctl restart nginx

If you received no message, all is well. Let’s verify this is working by changing our server setup a bit:

using Distributed
addprocs(1)
@everywhere using HTTP
@everywhere function runserver(port, msg)
    HTTP.serve("0.0.0.0", port) do request::HTTP.Request
        return HTTP.Response(msg)
    end
end
@async runserver(8000, "server 1")
@spawnat 2 runserver(8001, "server 2")

Now we should get a different message depending on which server nginx decides to use. We can see these differences with curl (note: seeing them in a browser is difficult, because a browser will try to keep its connection open to the server it first hits).

richardanaya@penguin:~$ curl http://localhost:8080
server 1
richardanaya@penguin:~$ curl http://localhost:8080
server 2
richardanaya@penguin:~$ curl http://localhost:8080
server 1
richardanaya@penguin:~$ curl http://localhost:8080
server 2

By default nginx uses a round-robin strategy for sharing work, but there are other strategies one can choose as well.
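To make the default strategy concrete, here is a toy model of round robin in Julia. It only illustrates the idea and says nothing about how nginx is implemented. The `RoundRobin` type and `next_backend!` function are hypothetical names invented for this sketch, and the backend list mirrors the upstream block in the configuration above.

```julia
# Toy round-robin picker (hypothetical names, not nginx internals).
mutable struct RoundRobin
    backends::Vector{String}
    next::Int   # index of the backend that serves the next request
end

# Start with the first backend in the list.
RoundRobin(backends::Vector{String}) = RoundRobin(backends, 1)

# Return the backend for this request and advance to the following one,
# wrapping around to the start after the last backend.
function next_backend!(rr::RoundRobin)
    backend = rr.backends[rr.next]
    rr.next = rr.next == length(rr.backends) ? 1 : rr.next + 1
    return backend
end

rr = RoundRobin(["127.0.0.1:8000", "127.0.0.1:8001"])
picks = [next_backend!(rr) for _ in 1:4]
println(picks)
```

With two backends the picks simply alternate, which matches the alternating "server 1" / "server 2" responses seen from curl.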

One question you might have is: how does having nginx as a middleman affect performance? Let’s find out! We’ll use a very useful benchmarking tool called ApacheBench (ab):

sudo apt-get install apache2-utils

This tool lets you fire many requests at a single server and reports the distribution of response times:

ab -n <number of requests> -c <number of concurrent requests> <url>
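The distribution ab reports is a percentile table: each row gives the time within which that share of requests completed. As a rough sketch of how such a table can be derived from raw timings, here is a nearest-rank calculation in Julia. ab’s own method may differ in detail; the sample latencies are made up and `served_within` is a hypothetical helper name.

```julia
# Nearest-rank percentile: the smallest sample such that at least
# `pct` percent of all samples are less than or equal to it.
function served_within(latencies_ms::Vector{<:Real}, pct::Real)
    sorted = sort(latencies_ms)
    rank = max(1, ceil(Int, pct * length(sorted) / 100))
    return sorted[rank]
end

# Made-up latencies in milliseconds, purely for illustration.
samples = [12, 15, 11, 40, 13, 14, 90, 16, 12, 13]

for pct in (50, 90, 100)
    println(lpad(pct, 3), "%  ", served_within(samples, pct), "ms")
end
```

The 100% row is always the slowest single request, so it is far more sensitive to one-off hiccups than the 50% or 90% rows.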

So let’s first take some advice from the 10k connections example: turn off the logger and change the various system settings that allow high concurrency.

https://github.com/aj-monk/C10k.jl#os-settings

Please note these tests were run on my Chrome Pixelbook. Let’s first look at results without nginx by stress testing port 8000:

ab -n 50000 -c 100 http://0.0.0.0:8000/
 50%     29ms
 66%     32ms
 75%     34ms
 80%     35ms
 90%     41ms
 95%     74ms
 98%    257ms
 99%    304ms
100%    729ms (longest request)

Not bad. Now let’s see what nginx adds to this load by testing port 8080:

ab -n 50000 -c 100 http://0.0.0.0:8080/
 50%     31ms
 66%     37ms
 75%     41ms
 80%     44ms
 90%     52ms
 95%     59ms
 98%     69ms
 99%    101ms
100%   2216ms (longest request)

While this is not thorough benchmarking, I think we can say that adding an nginx load balancer doesn’t glaringly degrade performance. Given nginx’s reputation as a server built to solve the 10k concurrent connections problem, I’d guess this technique will scale for most people’s needs.

In summary, even though Julia currently lacks a multi-threaded server solution out of the box, we can easily take advantage of its process distribution features and a highly popular load balancing technology to get full CPU utilization for HTTP handling.