Let’s take a look at how we need to wire all the servers to get the most out of them. This article continues from the previous one in the series:
On-premises HA Kubernetes cluster
So why 9 servers? Well, so far we can separate the parts that make up the cluster into 3. We have the Storage, the Control…
We have at least 9 servers that communicate with each other and even on standby there’s a lot of data travelling to and from any of them. In the storage article above, I explain the various data that goes through the cluster.
The dedicated servers you can rent online usually come with a 1Gbps network card (and network uplink).
1Gbps is about 125 MB/s. And that’s the fastest our servers can go. That’s not good if we need to use it both for the storage cluster and to host different applications.
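Converting a link speed into usable throughput is just a division by 8. A minimal sketch of the conversion (theoretical maxima in decimal units, ignoring protocol overhead):

```python
def gbps_to_mb_per_s(gbps: float) -> float:
    """Convert a link speed in Gbps to throughput in MB/s (1 Gbps = 1000 Mbps, 8 bits per byte)."""
    return gbps * 1000 / 8

print(gbps_to_mb_per_s(1))   # 125.0 MB/s for a 1 Gbps uplink
print(gbps_to_mb_per_s(10))  # 1250.0 MB/s for a 10 Gbps uplink
```

Real-world throughput will land a bit below these ceilings once Ethernet, IP, and TCP overhead are accounted for.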
Fortunately, networks come in different flavours. Some providers offer 10Gbps networks or even 40Gbps. These obviously cost but the real issue is that it’s hard to find a provider that will set up the network the way you want it. Just having access to a 10Gbps network is not enough.
Let’s separate the network connection into groups.
Lane 1: Control plane nodes that need to keep each other updated
Lane 2: Worker nodes communicate to the Control plane nodes
Lane 3: Storage nodes that need to keep each other updated
Lane 4: Worker nodes save files to the Storage nodes
Lane 5: Worker nodes accept web requests and also serve files
Now let’s check what kind of traffic each of these “lanes” would supposedly have.
Lane 1: The most important communication here is between each ETCD replica
Keeping the ETCD replicas in sync doesn’t require huge payloads, but it is very frequent. Putting this traffic in its own VLAN, just to isolate it, should be enough.
Lane 2: Worker nodes need to know which containers to keep running, and they do that by checking in continuously with the Control Plane. This traffic is frequent but has small payloads.
For Lane 1 and Lane 2, we can use a 1Gbit uplink. That’s more than enough as long as we don’t start deploying applications on the Control Plane nodes.
Lane 3: So we have 3 servers that we want to use for storing persistent data and we decided to replicate the data 3 times. This means that we need to consider 2 things.
- Every time we store a file in this cluster, it will copy it 3 times.
- If one of the servers fails, the cluster will try to restore its desired state of having 3 replicas, so it will start copying all the data that was supposed to be on the failed server and redistribute it among the remaining ones.
One thing in our favour, though, is that writes are spread concurrently across the three servers, balancing the load and making them up to 3 times faster. The same applies to reads.
Now imagine having as little as 10GB stored when one of the servers fails: the cluster has to copy 10GB of data between the other two. With a 1Gbps network, at about 125 MB/s, even if we doubled the speed through parallelisation to 250 MB/s, the copy would take about 40 seconds. And the slower it is, the longer the servers stay under extra load.
Let’s see how fast it would be with a 10Gbps network. 10Gbps is around 1.25 GB/s, so copying the files would take about 8 seconds. That’s 5 times faster.
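The rebalancing arithmetic above can be sketched as a small helper. The two-stream parallelisation is an optimistic simplification, modelling the recovery traffic being split across the surviving replicas:

```python
def copy_time_seconds(data_gb: float, link_gbps: float, parallel_streams: int = 1) -> float:
    """Time to move data_gb gigabytes over a link, assuming ideal utilisation.

    parallel_streams models the rebalancing traffic being split across
    several surviving replicas (an optimistic simplification).
    """
    throughput_mb_s = link_gbps * 1000 / 8 * parallel_streams
    return data_gb * 1000 / throughput_mb_s

# Re-replicating 10 GB after a node failure:
print(copy_time_seconds(10, 1, parallel_streams=2))  # 40.0 s on a 1 Gbps network
print(copy_time_seconds(10, 10))                     # 8.0 s on a 10 Gbps network
```

In practice disk contention and replication protocol overhead stretch these numbers, but the ratio between the two networks holds.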
The maximum speed is the minimum between disk speed and network speed. So if our disks have a maximum write speed of 300 MB/s (enterprise level), that’s the write speed of a single server, and up to triple that, roughly 900 MB/s, for the cluster. With a 1Gbps network, the bottleneck is the network link of each node, at about 125 MB/s; with a 10Gbps network, the cluster can reach the disk-bound 900 MB/s, but never the 1.25 GB/s a single link could carry, let alone triple that.
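The “minimum of disk and network” rule can be written down as a simplified per-node model (the 300 MB/s disk figure is the example from above; real clusters lose some of this to replication and protocol overhead):

```python
def cluster_write_ceiling(disk_mb_s: float, link_gbps: float, replicas: int = 3) -> float:
    """Aggregate write ceiling: each replica is capped by min(its disk, its NIC)."""
    nic_mb_s = link_gbps * 1000 / 8
    return min(disk_mb_s, nic_mb_s) * replicas

print(cluster_write_ceiling(300, 1))   # 375.0 -> network-bound (125 MB/s per node)
print(cluster_write_ceiling(300, 10))  # 900.0 -> disk-bound (300 MB/s per node)
```

The jump from 1Gbps to 10Gbps moves the bottleneck from the NIC to the disks, which is exactly where we want it.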
At this point we’re sure that we need a dedicated line for the Storage servers and it needs to be faster than 1Gbps.
Lane 4: Worker nodes obviously need to store a lot of files, be it media like images and videos or just database files. These need to be persistent, so we save them to our secure replicated storage service.
All of the above considerations about speed apply to this scenario as well, which means we need a 10Gbps network towards the Storage servers to handle the data fast enough.
At this point we’d have something like this. A dedicated network for the storage servers to share the files between one another, and a dedicated network for the worker nodes to store files and read them back.
Lane 5: Lastly all the Worker nodes are supposed to be exposed to the internet for all the websites and web applications they have to serve.
Usually a 1Gbit connection is fine for this kind of traffic; it can handle both serving media and web pages. Considering an average of 600KB per page, a 1Gbit uplink can handle about 208 requests per second. We have 3 servers, each with its own uplink, which means we can serve up to about 625 requests per second.
That’s not bad if we were hosting a single website. But we have created a cluster that could potentially host hundreds of websites. And if we serve just a few images weighing 3–4 MB each, the requests-per-second rate drops drastically. The cost just wouldn’t be justified this way.
With a 10Gbit uplink instead, we’d handle up to 10 times more requests: about 2080 requests per second per server and roughly 6250 in total. This can be raised further, if necessary, by upgrading to a 40Gbit uplink.
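The capacity estimate above boils down to one division. A sketch, assuming decimal units and a fixed 600 KB average response size:

```python
def requests_per_second(link_gbps: float, page_kb: float = 600, servers: int = 1) -> float:
    """Rough ceiling on requests/s for a given uplink and average response size."""
    mb_per_s = link_gbps * 1000 / 8
    return mb_per_s * 1000 / page_kb * servers

print(round(requests_per_second(1)))              # ~208 req/s per server on 1 Gbps
print(round(requests_per_second(10, servers=3)))  # ~6250 req/s across 3 servers on 10 Gbps
```

Note how sensitive the ceiling is to response size: swap the 600 KB page for a 3 MB image and the same uplink serves 5 times fewer requests, which is the argument for offloading media below.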
To lower the pressure on the network even more, I’d suggest using distribution services for media.
These are a few:
- Imgix resizes images on the fly and acts as a CDN for all kinds of static assets. Very easy to configure.
- S3 with or without Cloudfront. S3 is a very good alternative to storing and distributing static assets, especially when combined with Cloudfront.
- Vimeo is a great option for both storing and distributing videos. Especially because it does all the encoding for you.
Articles in the series
On-premise HA Kubernetes cluster
Overcoming infrastructural limits while setting up a production Kubernetes Cluster
Storage on Kubernetes
Understanding how Storage works in distributed systems and Kubernetes. On-premise storage solutions
Thanks for having the patience to read this far.
Stay tuned for more :)