Behind the scenes of opening a web page

We all know that when we type a web address into our browser and press Enter, the webpage pops up in no time. Well, most of the time… But let’s look at what’s happening behind the scenes and how all that information gets retrieved.
Let’s first take a look at some components of the infrastructure that our data goes through.
First, to even open a webpage we need a computer. We refer to it as a client because it is the one making the request to open a webpage. Let’s say we want to open the https://www.holbertonschool.com webpage. We open our browser of choice, type the address into the address bar and press Enter. OK, so now what? We all know our houses have modems, routers and boxes of sorts, but we won’t go into detail about any of those here. We do know that most of the time they are provided to us by our Internet Service Provider (ISP), like Comcast, AT&T etc.
Since all those ISPs have to be able to pass information between each other, they are connected into a network we call the Internet. The Internet is the global system of interconnected computer networks that use the Internet protocol suite (TCP/IP) to link devices worldwide. We’ll discuss TCP/IP in more detail in a little bit.
So when we type our address and press Enter, the browser first has to find out where the website actually lives. Before the HTTPS request itself can be sent, a DNS request goes out through our ISP to a DNS server.
Now, what is a DNS request? Let’s step back for a second. The first thing we need to point out is that webpage names (DNS names, also called domain names) are just an easier way of accessing information. Behind each name there is an IP address (IP stands for Internet Protocol). So for example, if we say ‘let’s have bbq at Julija’s house’, that is easier to remember than the actual address, but once we actually want to go there we need that actual address so we know where specifically we are going. It’s similar with a DNS name and an IP address. DNS stands for Domain Name System, and a DNS server is the one that resolves our DNS name into an IP address.
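To make the name-to-address idea concrete, here is a minimal Python sketch of a lookup. The resolve function and the CACHE table are made up for illustration, and 203.0.113.10 is a reserved documentation address, not the real address of the site. On a cache miss the sketch falls back to the operating system's resolver, which is the part that would actually talk to DNS servers.

```python
import socket

# Hypothetical local cache. 203.0.113.10 is a reserved documentation
# address, NOT the real IP address of this website.
CACHE = {"www.holbertonschool.com": "203.0.113.10"}


def resolve(hostname, cache=CACHE):
    """Return an IP address for hostname, checking the cache first."""
    if hostname in cache:
        return cache[hostname]
    # Cache miss: ask the system resolver, which queries DNS servers for us.
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    ip_address = infos[0][4][0]  # sockaddr tuple starts with the IP string
    cache[hostname] = ip_address  # remember the answer for next time
    return ip_address
```

Real resolvers also respect how long an answer is allowed to stay cached, which this sketch leaves out.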
Hypertext Transfer Protocol (HTTP) is the protocol over which data is sent between your browser and the website that you are connected to.
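An HTTP/1.1 request is, at its simplest, a few lines of text, and the response starts with a status line. The sketch below builds a bare-bones GET request and reads the status line of a response. The function names build_request and parse_status_line are invented for this example, and a real browser sends many more headers than this.

```python
def build_request(host, path="/"):
    """Build a minimal HTTP/1.1 GET request as bytes."""
    lines = [
        f"GET {path} HTTP/1.1",  # method, path and protocol version
        f"Host: {host}",         # which website we want on that server
        "Connection: close",     # ask the server to close when done
        "",                      # an empty line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response):
    """Return (version, status code, reason) from raw response bytes."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason
```

For example, build_request("www.holbertonschool.com") produces the text a server would receive for the home page, and parse_status_line pulls out the familiar codes such as 200 or 404.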
The Transmission Control Protocol (TCP) is the protocol that complements IP, which is why the pair is commonly referred to as TCP/IP. TCP is the one that makes sure data gets through the network reliably and in the right order. Together, the TCP/IP suite provides end-to-end data communication, specifying how data should be packetized, addressed, transmitted, routed and received.
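To see a TCP connection in action without leaving your own machine, here is a small Python sketch that opens a TCP server on the loopback address, connects to it as a client, and sends a message back and forth. The function names are made up for this example, and the "server" just upper-cases whatever it receives, which is nothing like a real web server.

```python
import socket
import threading


def echo_once(server_sock):
    """Accept one connection and send the received bytes back upper-cased."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def tcp_round_trip(message):
    """Send message over a local TCP connection and return the reply."""
    # Port 0 asks the operating system to pick any free port.
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        worker = threading.Thread(target=echo_once, args=(server,))
        worker.start()
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
    return reply
```

The same create_connection call, pointed at a website's IP address and port 443 instead of the loopback address, is roughly where a browser's connection starts.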
Now let’s talk about the server side. To keep things simple, a server is also a computer, just a more stripped-down one. It usually doesn’t have a keyboard, screen and the other peripheral components desktop computers have. It’s there to respond to requests and provide the information stored on it that was actually requested. So how does it do that? It must have some sort of software that enables it to find and return the correct information.
So in order to work, a server has to have an operating system on it. And then we can also have a VM (virtual machine) where we have all of our other components, like: firewall, HTTPS/SSL, load-balancer, web server, application server, database, code base etc.
All of these components can be in their own VMs or on separate servers to make scaling easier, but let’s just talk about this at a high level.
The first point of contact on the server side would be the load-balancer. A load-balancer is a device (or a piece of software) that distributes network or application traffic across a number of servers. Load-balancers are used to increase the capacity and reliability of applications. We can have multiple load-balancers to reduce the risk of a SPOF (single point of failure). Multiple load-balancers are mostly set up as what we call a cluster. Based on how we set it up, once our load-balancer receives a request it decides which server to send that request to. We assume here that we have more than one server.
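One common way for a load-balancer to decide where a request goes is round-robin, where servers simply take turns. The post doesn’t say which strategy is used, so take this Python sketch as just one possible setup. The class name and the server names web-01 and web-02 are made up.

```python
import itertools


class RoundRobinBalancer:
    """Hand out servers in turn: first, second, ..., then start over."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        # cycle() repeats the list of servers forever, in order.
        self._cycle = itertools.cycle(servers)

    def pick(self):
        """Return the server that should handle the next request."""
        return next(self._cycle)


balancer = RoundRobinBalancer(["web-01", "web-02"])  # hypothetical servers
first_three = [balancer.pick() for _ in range(3)]
```

Other strategies, like sending each request to the server with the fewest open connections, follow the same shape: only the pick step changes.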
In order to keep our data secure we use firewalls and SSL (Secure Sockets Layer) certificates. SSL certificates enable us to encrypt data so that it can be transferred securely over HTTPS (the S at the end stands for ‘Secure’), which is a secure version of the above-mentioned HTTP. (Today the protocol actually in use is SSL’s successor, TLS, but the name SSL has stuck.) What the certificate does is provide the public key used to set up encryption, so that only the two ends of the connection can decrypt the data that is sent and received. We can use more than one SSL certificate. If we have only one, we usually put it on the load-balancer because that is the first point of contact on the server side, so if the data is compromised it won’t even get to the other components, and that way we can protect them.
A firewall is a network security system that monitors and controls the incoming and outgoing network traffic based on predetermined security rules. It’s a barrier between a trusted, secure internal network and another, outside network. So if we, let’s say, have a firewall set up on our server and our load-balancer is set up on another server, we can make a rule that our server accepts only traffic coming in from our load-balancer’s IP, because that is the only traffic we trust.
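The “accept traffic only from the load-balancer” rule can be sketched in a few lines of Python. This is only an illustration of the logic, since real firewalls are configured with their own tools rather than written like this. The address 10.0.0.5 and the name is_allowed are assumptions made up for the example.

```python
import ipaddress

# Hypothetical private address of our load-balancer.
LOAD_BALANCER_IP = ipaddress.ip_address("10.0.0.5")


def is_allowed(source_ip, allowed=(LOAD_BALANCER_IP,)):
    """Return True only if the traffic comes from an allowed address."""
    return ipaddress.ip_address(source_ip) in allowed
```

With this rule, is_allowed("10.0.0.5") lets the request through, while a request from any other address is dropped before it ever reaches the web server.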
So our load-balancer receives a request, makes sure it’s coming in securely over HTTPS, decrypts it using SSL and then sends that request to the web server. A web server is a program that uses HTTP or HTTPS to serve the files that form webpages to users, in response to their requests.
The web server checks the request and gets the content from the code base. If the content is only static, it sends it right back, and if there is dynamic content, it requests that content from the application server. An application server is software that processes the dynamic content and provides it back to the web server, which then sends it back as a response to the request. This dynamic content is usually some sort of information that needs to be queried from a database, so the application server is the one that communicates with the database (like MySQL).
A database is a collection of data that is organized to be easily accessed, managed and updated. Databases range from relational databases to cloud databases. To keep our data safe we usually have one main database, called the primary, and another one as a backup, called the replica, in case something goes wrong with the primary one. Usually they are set up so that the primary is read-write and the replica is read-only. So basically, if we add data to our primary, the primary will write the same data to the replica and then send an acknowledgment that the data was successfully written to the database. This is done through the application server too.
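The primary and replica flow described above can be sketched with two tiny in-memory classes. Real databases like MySQL handle replication themselves and in far more sophisticated ways, so the class and method names here are purely made up to show the order of events: write to the primary, copy to the replica, then acknowledge.

```python
class Replica:
    """Read-only copy: it changes only when the primary tells it to."""

    def __init__(self):
        self._data = {}

    def apply(self, key, value):
        """Store a change sent by the primary and confirm it."""
        self._data[key] = value
        return True

    def read(self, key):
        return self._data.get(key)


class Primary:
    """Read-write database that forwards every write to its replica."""

    def __init__(self, replica):
        self._data = {}
        self._replica = replica

    def write(self, key, value):
        """Write locally, copy to the replica, then acknowledge."""
        self._data[key] = value
        acknowledged = self._replica.apply(key, value)
        return acknowledged

    def read(self, key):
        return self._data.get(key)
```

After primary.write("user:1", "Julija") returns True, the same value can be read from the replica, which is what lets the replica take over if the primary goes down.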
Once all that information is gathered from the database and processed by the application server, it’s sent back to the web server, which then sends it back through the load-balancer and the Internet to the client.
This is a very brief overview of what happens from the moment you type in a web address until the page pops up on your screen. Crazy, right, that all this happens in a matter of seconds? In the next post we’ll look at it in more detail…
