Scalability - Deep dive
In the previous post, we learned what scalability is and the different types of scalability.
What we discussed there is that we can’t solve our problem simply by throwing money at resources and increasing them. Horizontal scaling means accepting that we will eventually hit a ceiling where a single machine cannot be given any more computing power. So why not architect the system so that we never hit that ceiling, by using a bunch of cheaper machines instead?
The story was simpler when we had a single server, with one IP address and hostname associated with that particular machine, say “server-A”. Overnight our website grew fast, and on our way to horizontal scaling we got multiple such machines to serve our purpose, say “server-A, server-B and server-C”. Now that we have several servers with our code running on them, how do they relate to HTTP and DNS? That is, when Alice and Bob hit “www.popularwebsite.com”, which server will they end up on?
Well, when we have multiple servers, we want to distribute our inbound HTTP requests among all of them, i.e. “server-A, server-B and server-C”. We don’t want any one server to be burdened with incoming requests while the others just sit there idle. For this exact purpose, we have a “load balancer”. For now, consider it a black box that distributes the load (the incoming HTTP requests) among our web servers.
How does the load balancer achieve this? That is, when I hit “www.popularwebsite.com”, which of my 3 web servers will serve my request?
What we can do here is expose the public IP address of the load balancer (i.e. 184.108.40.206, as per the above picture) to DNS. The web servers/nodes are then not required to have public IP addresses of their own. Each has a private IP address that is known only to the load balancer and not to the outside world, so the nodes can’t be contacted directly by any bad guys. Moreover, the world is running out of IPv4 addresses, so public IPs are hard and costly to get. It is therefore a plus for these nodes to have private IP addresses known only to the load balancer, which directs the incoming requests to them.
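As a quick sanity check of this layout, here is a minimal sketch using Python's standard `ipaddress` module. The load balancer address is the one from the text; the three backend addresses are made up for illustration and are not from the original setup.

```python
import ipaddress

# Public address exposed to DNS (taken from the text above).
LOAD_BALANCER_PUBLIC_IP = "184.108.40.206"

# Hypothetical private addresses for the backend nodes (assumed for this sketch).
BACKEND_PRIVATE_IPS = {
    "server-A": "10.0.0.11",
    "server-B": "10.0.0.12",
    "server-C": "10.0.0.13",
}

def is_private(address):
    """Return True if the address falls in a private (non-routable) range."""
    return ipaddress.ip_address(address).is_private

lb_is_private = is_private(LOAD_BALANCER_PUBLIC_IP)
backends_are_private = all(is_private(ip) for ip in BACKEND_PRIVATE_IPS.values())

print(lb_is_private)         # False: reachable from the internet
print(backends_are_private)  # True: reachable only from inside the network
```

Only the load balancer needs an address the outside world can route to; the nodes behind it can live entirely in a private range.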
How does the load balancer decide which of these backend nodes gets the chance to serve the incoming request?
In the broad picture, when a request arrives at the load balancer, it can choose between server 1, server 2 and server 3 based on various factors, such as load (i.e. how busy is each server?).
All these web servers have the same content and the same code running on them, so we have achieved redundancy. The downside is that previously we needed only n units of disk space, and now we need n × (number of servers). This is the price we pay for redundancy and horizontal scaling.
Another approach is to have server 1 hold only image files, server 2 only HTML files, and server 3 only video files. Even in this scenario the load can be uneven, with more of it landing on, say, the video server, i.e. server 3.
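To make the "how busy is a server?" idea concrete, here is a minimal sketch of picking the least-loaded server. The request counts and the `pick_least_loaded` and `dispatch` helpers are assumptions for illustration; a real load balancer would measure load in its own way.

```python
# Hypothetical snapshot of how many requests each backend is handling right now.
active_requests = {"server-A": 12, "server-B": 3, "server-C": 7}

def pick_least_loaded(load_by_server):
    """Return the name of the server with the fewest in-flight requests."""
    return min(load_by_server, key=load_by_server.get)

def dispatch(load_by_server):
    """Choose a server for one new request and record the extra load on it."""
    server = pick_least_loaded(load_by_server)
    load_by_server[server] += 1
    return server

chosen = dispatch(active_requests)
print(chosen)  # server-B
```

The trade-off is that the load balancer now has to keep track of every server's state, which is more work than a scheme that needs no feedback from the backends.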
A simpler heuristic is plain round robin: the first time I hit the site I get server 1, the next time server 2, then server 3, and so on. The price we pay here is that one server may end up with more heavyweight users than the others. Round robin is by nature blind to load: it keeps sending requests to a server whenever its turn comes, even if that server is already overburdened, which is not good.
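Here is a minimal sketch of round robin and of the "heavyweight users" problem. The request costs are invented for illustration, and `make_round_robin` is a hypothetical helper, not part of any real load balancer.

```python
from itertools import cycle

servers = ["server-A", "server-B", "server-C"]

def make_round_robin(names):
    """Return a function that hands out server names in a fixed rotation."""
    rotation = cycle(names)

    def next_server():
        return next(rotation)

    return next_server

next_server = make_round_robin(servers)

# Hypothetical request costs: every third request happens to be a heavy one.
request_costs = [10, 1, 1, 10, 1, 1, 10, 1, 1]

work_per_server = {name: 0 for name in servers}
for cost in request_costs:
    work_per_server[next_server()] += cost

print(work_per_server)
# {'server-A': 30, 'server-B': 3, 'server-C': 3}
```

Every server received exactly three requests, yet server-A ended up with ten times the work, because the rotation never looks at what each request costs.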
Another drawback shows up when round robin is done at the DNS level: because of caching, a given client keeps sending its load to one particular server. The browser has no need to repeat the same DNS lookup every single time you click a link, since each lookup costs some milliseconds, so our OS and browser typically cache the responses to avoid these lookups.
A better approach is to let the load balancer itself decide which server each request goes to, using any heuristic, be it round robin or some element of randomness.
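The caching effect can be sketched with a toy resolver. Everything here is a stand-in: `dns_lookup` imitates a round-robin DNS server, and `CachingResolver` with its 300-second lifetime imitates an OS or browser cache; real caches follow the TTL set on the DNS record.

```python
import time

servers = ["server-A", "server-B", "server-C"]
_turn = 0

def dns_lookup(hostname):
    """Stand-in for a round-robin DNS server: each lookup returns the next server."""
    global _turn
    answer = servers[_turn % len(servers)]
    _turn += 1
    return answer

class CachingResolver:
    """Toy OS/browser cache that remembers an answer for ttl_seconds."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._cache = {}  # hostname -> (answer, expiry time)

    def resolve(self, hostname, now=None):
        now = time.monotonic() if now is None else now
        cached = self._cache.get(hostname)
        if cached and cached[1] > now:
            return cached[0]  # still fresh: no new lookup happens
        answer = dns_lookup(hostname)
        self._cache[hostname] = (answer, now + self.ttl_seconds)
        return answer

resolver = CachingResolver(ttl_seconds=300)
hits = [resolver.resolve("www.popularwebsite.com", now=t) for t in (0, 10, 20, 30)]
print(hits)  # ['server-A', 'server-A', 'server-A', 'server-A']
```

Four clicks, one lookup: the rotation on the DNS side never gets a chance to move this client to another server until the cached answer expires.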
One of the things that can go wrong here is sessions. Why so? Well, because session data tends to be specific to a given machine. On a Linux system, sessions are commonly saved as text files in /tmp. Let's say my session sits in /tmp on server 1; what if my next requests are sent to server 2 or server 3? We would be asked to log in again and again, with no idea why this keeps happening, and this scenario can fail badly on an e-commerce site.
How do we resolve this now?
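A small sketch of the problem, with a plain dictionary standing in for the session files on each machine. The `WebServer` class, the session id and the user name are all hypothetical.

```python
class WebServer:
    """Toy web server whose sessions live only on this machine."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # session_id -> user, local to this server

    def login(self, session_id, user):
        self.sessions[session_id] = user

    def handle(self, session_id):
        user = self.sessions.get(session_id)
        if user:
            return f"{self.name}: hello {user}"
        return f"{self.name}: please log in"

server1, server2 = WebServer("server1"), WebServer("server2")
server1.login("abc123", "alice")

print(server1.handle("abc123"))  # server1: hello alice
print(server2.handle("abc123"))  # server2: please log in
```

The cookie is the same in both requests; what differs is which machine happens to hold the matching session data.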
What if we had an external file server, like a big external drive connected to all the servers? This way all the servers can share state. Right now we have really good redundancy in our server model, but as soon as we introduce a database or file server for our sessions, what happens if that server dies? Looks like we have a single point of failure again.
We solved the problem of shared state, but we have sacrificed some robustness and some redundancy. How do we fix the latter now?
Before moving on to this question, let's discuss how to implement a load balancer.
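The same toy model can be rewired so that every server reads and writes one shared store. `SessionStore` is a hypothetical in-memory stand-in for the external file server or database, and the `alive` flag exists only to simulate that machine dying.

```python
class SessionStore:
    """Stand-in for the shared file server or database (hypothetical)."""

    def __init__(self):
        self._sessions = {}
        self.alive = True

    def _check(self):
        if not self.alive:
            raise ConnectionError("session store is down")

    def put(self, session_id, user):
        self._check()
        self._sessions[session_id] = user

    def get(self, session_id):
        self._check()
        return self._sessions.get(session_id)


class WebServer:
    def __init__(self, name, store):
        self.name = name
        self.store = store  # every server points at the same store

    def login(self, session_id, user):
        self.store.put(session_id, user)

    def handle(self, session_id):
        user = self.store.get(session_id)
        if user:
            return f"{self.name}: hello {user}"
        return f"{self.name}: please log in"


store = SessionStore()
server1, server2 = WebServer("server1", store), WebServer("server2", store)
server1.login("abc123", "alice")
print(server2.handle("abc123"))  # server2: hello alice

store.alive = False  # the shared store dies
try:
    server2.handle("abc123")
except ConnectionError as error:
    print(f"every server is affected: {error}")
```

Logging in on one server now works on all of them, but when the store goes down, every server fails together no matter how many web servers we have.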
Load balancers come as both software and hardware: software such as ELB (by AWS) and HAProxy, and hardware from vendors like Citrix and Cisco. Most of the hardware options are extremely expensive for what they do.
STAY TUNED for “How to implement a load balancer?”, coming up next in the new blog.