How the Internet comes together!

In tech interviews, the age-old question that anyone can expect is “What happens when you type www.google.com?”. Every software engineer is expected to understand in depth how the Internet works: how computers contact each other and do things such as exchange email or display web pages. In this article I will try my best to explain my understanding of a slightly modified question: what happens when you type “holbertonschool.com” in your browser and press Enter?
Step 1: Type “holbertonschool.com” and press Enter.
Step 2: The typed URL is parsed and the browser checks whether the domain is in its cache. If it is not, the browser makes a library call (gethostbyname) that asks the operating system to resolve the name; the OS looks in its own cache and in the local hosts file. If the name is still not found, the router's cache is checked for the IP.
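The lookup order in Step 2 can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names, the sample hosts-file text and the 192.0.2.10 entry are made up, and a real browser and OS do considerably more than this.

```python
def parse_hosts(text):
    """Parse hosts-file text into a {hostname: ip} mapping."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)  # the first entry for a name wins
    return table


def resolve_locally(domain, browser_cache, hosts_table):
    """Return (ip, source) from local sources, or (None, "dns") on a miss."""
    domain = domain.lower()
    if domain in browser_cache:
        return browser_cache[domain], "browser cache"
    if domain in hosts_table:
        return hosts_table[domain], "hosts file"
    return None, "dns"  # nothing local: a DNS query is needed


# Hypothetical hosts-file content, for illustration only.
SAMPLE_HOSTS = """
127.0.0.1   localhost
# made-up entry
192.0.2.10  intranet.example  intranet
"""

hosts = parse_hosts(SAMPLE_HOSTS)
print(resolve_locally("localhost", {}, hosts))
print(resolve_locally("holbertonschool.com", {}, hosts))
```

The second call finds nothing locally, which is exactly the situation in which the lookup has to go out to a DNS server.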
Step 3: We still don't have the IP, so the next step is to send a request, usually over UDP, to the DNS server configured in the OS's Internet settings for the connection being used, passing the domain name. This DNS server typically belongs either to your company or to your local Internet Service Provider. This server, the DNS resolver, is now queried for the IP. The resolver goes through a process called recursion to find the IP corresponding to the domain name. It starts by querying one of the root DNS servers. There are 13 root servers, named A through M, replicated across more than 380 locations. All of them serve copies of the same root zone, which is managed by IANA (Internet Assigned Numbers Authority). The root servers hold the locations of all TLD (Top Level Domain) servers, such as those for .com, .de and .io. A root server does not know the IP address for ‘holbertonschool.com’, but it knows which servers handle .com, so it responds with a list of .com servers.
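To make the UDP request in Step 3 more concrete, here is a minimal sketch of how the bytes of a DNS query for an A record could be put together. The function names and the query ID 0x1234 are my own choices for illustration; the layout (a 12-byte header followed by one question) follows the standard DNS message format.

```python
import struct


def encode_name(domain):
    """Encode a domain as DNS labels: a length byte before each part, then a zero byte."""
    out = b""
    for label in domain.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(domain, query_id=0x1234):
    """Build a DNS query asking for the A (IPv4 address) record of `domain`."""
    # Header fields: ID, flags (0x0100 = recursion desired), 1 question,
    # 0 answers, 0 authority records, 0 additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question: encoded name, QTYPE 1 (A record), QCLASS 1 (Internet).
    question = encode_name(domain) + struct.pack("!HH", 1, 1)
    return header + question


packet = build_query("holbertonschool.com")
print(len(packet), packet.hex())
```

Sending it would mean opening a UDP socket and passing these bytes to port 53 of the configured DNS server; that part is left out so the sketch runs without a network.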
Step 4: Now the resolver queries the .com servers, which point it to the name servers responsible for ‘holbertonschool.com’; those name servers return the IP. Once the browser has the IP for ‘holbertonschool.com’, it opens a connection to that address and establishes an SSL session for HTTPS (Hypertext Transfer Protocol Secure, the secure version of HTTP) to ensure that the communication between server and client is private and encrypted. (Modern systems use TLS, the successor to SSL, but the older name is still widely used.) Similar to the TCP handshake (which is explained below), SSL establishes its connection with the client through an SSL handshake, which can be easily understood from the image below.

Step 5: One thing to note is that it's not just the SSL certificate that secures the data transmission. Another component, the firewall, prevents hackers from accessing your system. (More about firewalls can be found below.)
Now the browser has to fetch the web page. To do that, it sends an HTTP request to the server using the GET method. The browser passes some meta information in the form of headers to the server along with the URL “holbertonschool.com”. HTTP requests made from browsers are handled by special software running on the server known as a web server, for example Apache, Nginx or IIS.
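As a rough illustration of what such a GET request looks like on the wire, the sketch below builds the text of an HTTP/1.1 request. The function name and the User-Agent value are made up for this example; a real browser sends many more headers.

```python
def build_get_request(host, path="/", extra_headers=None):
    """Return the bytes of a minimal HTTP/1.1 GET request."""
    headers = {
        "Host": host,                         # required in HTTP/1.1
        "User-Agent": "example-browser/0.1",  # hypothetical client name
        "Accept": "text/html",
        "Connection": "close",
    }
    if extra_headers:
        headers.update(extra_headers)
    lines = [f"GET {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Each line ends with CRLF, and an empty line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_get_request("holbertonschool.com")
print(request.decode("ascii"))
```

The first line carries the method and path, and the headers that follow are the meta information mentioned above.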
Step 6: The web server then passes the information to a request handler (a program that processes the request on behalf of the web server, written for example in PHP or ASP.NET). If the page to be served is static, it is read as-is and an HTML response is created. This is then sent back to the browser. The web page is served in such a way that the HTML skeleton is received first, followed by the static components such as images, styling and JavaScript files. (That's the reason why we sometimes see a delay in the loading of images even after the web page has loaded!) These static files are also cached in the web browser so that they do not have to be transmitted the next time and can be served directly from the cache.
If the information to be displayed is dynamic (generated by code written in, for example, JavaScript, Ruby or PHP), the application server runs that code from the codebase, turns the result into a format the browser understands and sends it back to the client. If the data has to be retrieved from a database, the application server again acts as a mediator: it establishes a connection to the database, pulls the data and sends it back to the client.
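The difference between static and dynamic content can be sketched with a toy request handler. Everything here is hypothetical (the paths, the in-memory "files" and the function names); it only shows the branching idea, not how Apache, Nginx or any real application server is implemented.

```python
from datetime import datetime, timezone

# Hypothetical static content, kept in memory instead of on disk.
STATIC_FILES = {
    "/index.html": "<html><body>Welcome</body></html>",
}


def render_dynamic(path, now=None):
    """Build a page at request time; a real app might query a database here."""
    now = now or datetime.now(timezone.utc)
    return f"<html><body>Generated for {path} at {now.isoformat()}</body></html>"


def handle_request(path):
    """Return (status_code, body) for a requested path."""
    if path in STATIC_FILES:       # static: return the stored content unchanged
        return 200, STATIC_FILES[path]
    if path.startswith("/app/"):   # dynamic: run code to produce the page
        return 200, render_dynamic(path)
    return 404, "<html><body>Not Found</body></html>"


print(handle_request("/index.html"))
print(handle_request("/app/profile")[0])
print(handle_request("/missing")[0])
```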

The steps above are just a high-level explanation of what actually happens behind the scenes, in a matter of milliseconds, when you enter holbertonschool.com. There are several terms you should know well in order to understand these steps. I shall explain them in detail.
- DNS request
Domain Name System (DNS) can be described as the heart of how the Internet works. When someone wants to open a web page, they type in the URL, and DNS resolves it and gives back the IP address of the server hosting the page so that the browser can request it. In other words, DNS acts like a post office that accepts an address written in plain English (holbertonschool.com) and converts it to a series of numbers, the IP address (for example 123.43.23.121), which marks the location of a computer on the Internet, similar to the house number and street of where you live. For more details on the role DNS plays here, go back and read Steps 2 to 4.
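Here is a toy walk through the chain of referrals a resolver follows: a root server that points to the .com servers, which in turn point to the name server that actually holds the record for the domain. The server names and the address 192.0.2.44 are invented for this sketch, and the whole "network" is just a dictionary, so take it as an illustration of the idea rather than of real DNS software.

```python
# A made-up, in-memory stand-in for the DNS hierarchy.
SERVERS = {
    "root": {"referrals": {"com": "com-tld"}, "answers": {}},
    "com-tld": {"referrals": {"holbertonschool.com": "ns-holberton"}, "answers": {}},
    "ns-holberton": {"referrals": {}, "answers": {"holbertonschool.com": "192.0.2.44"}},
}


def resolve(domain, servers=SERVERS, max_hops=10):
    """Follow referrals from the root until some server answers.

    Returns (ip, list_of_servers_asked).
    """
    server = "root"
    asked = []
    for _ in range(max_hops):
        asked.append(server)
        zone = servers[server]
        if domain in zone["answers"]:
            return zone["answers"][domain], asked
        # Look for a referral whose suffix covers the requested domain.
        server = next(
            (target for suffix, target in zone["referrals"].items()
             if domain == suffix or domain.endswith("." + suffix)),
            None,
        )
        if server is None:
            raise LookupError(f"no server knows about {domain}")
    raise LookupError("too many referrals")


print(resolve("holbertonschool.com"))
```

Each hop knows a little more about the name than the previous one, which is why no single server has to store every address on the Internet.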
- TCP/IP
TCP (Transmission Control Protocol) breaks the data down into packets and reassembles them, whereas the Internet Protocol (IP) is responsible for ensuring that the packets are sent to the right destination. To establish any TCP connection, the client (your PC) first sends a SYN packet. The web server (for example, google.com) sends a SYN-ACK packet back. The client then answers with an ACK packet, concluding the three-way TCP connection establishment. (This is known as the TCP handshake.)
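The three packets of the handshake can be modelled as plain data to show how the sequence and acknowledgement numbers relate. The Segment class and the starting numbers 100 and 300 are assumptions of this sketch; real stacks pick random initial sequence numbers and do all of this inside the operating system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    flags: str
    seq: int
    ack: Optional[int] = None


def three_way_handshake(client_isn, server_isn):
    """Return the three segments exchanged to open a TCP connection."""
    # 1. The client proposes its initial sequence number.
    syn = Segment("SYN", seq=client_isn)
    # 2. The server acknowledges it (client_isn + 1) and proposes its own.
    syn_ack = Segment("SYN-ACK", seq=server_isn, ack=syn.seq + 1)
    # 3. The client acknowledges the server's number (server_isn + 1).
    ack = Segment("ACK", seq=syn.seq + 1, ack=syn_ack.seq + 1)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=100, server_isn=300):
    print(segment)
```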
TCP/IP is used because the Internet is a packet-switched network, where the information sent is broken into small packets, which may travel over different routes and are then reassembled at the receiving end. Fascinating, isn't it? Each of these packets is typically no larger than about 1500 bytes, for a number of reasons, one of which is hardware limitation. So you can imagine the number of packets being sent for a single web page request, the packets being transported through different routes and reassembled to display the web page! To detect errors that occurred during transmission, TCP calculates a ‘checksum’ as each packet is created and adds it to the header. The checksum is a number computed from the contents of the packet; the receiving end recomputes it and compares the two values. (Missing packets are detected separately, through sequence numbers and acknowledgements.)
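A small sketch of both ideas, splitting data into packets and protecting each one with a checksum, is given below. The checksum function is the 16-bit one's-complement sum used by the Internet protocols; real TCP also includes its header and some IP fields in the calculation, which I leave out. The 1460-byte payload size is an assumption (a 1500-byte packet minus typical 20-byte IP and TCP headers), and the tuple layout is my own.

```python
MAX_PAYLOAD = 1460  # assumed: 1500 bytes minus 20-byte IP and 20-byte TCP headers


def internet_checksum(data):
    """16-bit one's-complement checksum over `data`."""
    if len(data) % 2:
        data += b"\x00"                  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                   # fold the carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def segment(data, size=MAX_PAYLOAD):
    """Split data into (offset, payload, checksum) tuples."""
    return [
        (offset, data[offset:offset + size], internet_checksum(data[offset:offset + size]))
        for offset in range(0, len(data), size)
    ]


def reassemble(segments):
    """Verify every checksum, then join the payloads in offset order."""
    out = bytearray()
    for offset, payload, checksum in sorted(segments, key=lambda s: s[0]):
        if internet_checksum(payload) != checksum:
            raise ValueError(f"corrupted segment at offset {offset}")
        out += payload
    return bytes(out)


page = bytes(range(256)) * 20            # 5120 bytes standing in for a web page
pieces = segment(page)
print(len(pieces), reassemble(list(reversed(pieces))) == page)
```

Even when the pieces arrive in reverse order, the offsets let the receiver put them back together, and a single flipped bit in a payload makes its checksum comparison fail.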
For computers to use TCP/IP, they need a special software interface known as sockets. See Step 3.
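To show what using a socket looks like in practice, here is a minimal sketch in which a client and a tiny server talk over TCP on the same machine (127.0.0.1), so no Internet access is needed. The function names and the "reply in upper case" behaviour are invented for the example; the operating system performs the TCP handshake when the client connects.

```python
import socket
import threading


def answer_once(server_sock):
    """Accept one connection and send the received bytes back in upper case."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def loopback_round_trip(message):
    """Send `message` to a throwaway local server and return its reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        worker = threading.Thread(target=answer_once, args=(server,))
        worker.start()
        # create_connection opens a TCP socket and connects it to the server.
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
    return reply


print(loopback_round_trip(b"hello"))
```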

- Firewall
Every time a computer is connected to a network, it faces potential danger from hackers. A firewall comes as a savior: it lets people inside the network reach the Internet while preventing crackers, hackers and others on the Internet from gaining access to the corporate or personal network and causing damage. Firewalls are hardware and software combinations built using routers, servers and a variety of other hardware.
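One common way a firewall decides what to let through is a list of rules checked in order, with everything else denied. The sketch below assumes that model; the rule set, the tuple layout and the function name are made up for illustration and do not describe any particular firewall product.

```python
from ipaddress import ip_address, ip_network

# Hypothetical rules: (action, allowed source network, destination port).
RULES = [
    ("allow", "0.0.0.0/0", 443),   # HTTPS from anywhere
    ("allow", "0.0.0.0/0", 80),    # HTTP from anywhere
    ("allow", "10.0.0.0/8", 22),   # SSH only from the internal network
]


def decide(src_ip, dst_port, rules=RULES):
    """Return the action of the first matching rule, or "deny" if none match."""
    for action, network, port in rules:
        if dst_port == port and ip_address(src_ip) in ip_network(network):
            return action
    return "deny"


print(decide("203.0.113.5", 443))
print(decide("203.0.113.5", 22))
print(decide("10.1.2.3", 22))
```

An outside address can reach the web ports but not SSH, while a machine on the internal network can.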
- HTTPS/SSL
HTTPS (Hypertext Transfer Protocol Secure) is HTTP over SSL: the SSL connection is established first, and then normal HTTP data is exchanged over it. In other words, HTTPS is the protocol for secure communication, and it prevents hackers from reading the information that visitors send or receive over the Internet.
To learn more about it, go back to Step 4.
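The "secure connection first, HTTP afterwards" order can be seen in Python's standard ssl module, which implements TLS, the modern successor to SSL. The two function names below are my own, and the second function needs Internet access, so it is only defined here and not called.

```python
import socket
import ssl


def make_client_context():
    """Create a context that verifies the server's certificate and host name."""
    return ssl.create_default_context()


def fetch_tls_version(host, port=443, timeout=5.0):
    """Connect to `host`, perform the handshake and return the protocol version."""
    context = make_client_context()
    # The TCP connection comes first ...
    with socket.create_connection((host, port), timeout=timeout) as tcp_sock:
        # ... then the handshake wraps it in an encrypted channel.
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
            return tls_sock.version()


context = make_client_context()
print(context.check_hostname, context.verify_mode == ssl.CERT_REQUIRED)
```

With a working connection, calling fetch_tls_version("holbertonschool.com") would return a string naming the negotiated protocol version; HTTP requests would then be written to the wrapped socket.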
- Load-balancer
Servers should be designed to accept concurrent requests from a large audience and respond to them without crashing. To serve a large number of requests, the content is usually distributed across multiple servers. A load balancer is placed in front of the servers to distribute the incoming traffic so that the load on each server is reduced, hence the name! It helps with high availability, reduces the load on each server and helps in serving responses faster. The load balancing algorithm determines which of the active backend servers receives each request. A few of the commonly used algorithms are Round Robin, Least Connections and Source (which picks the server based on the client's IP address).
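The three algorithms named above can each be sketched in a few lines. The class names, the server names and the use of a CRC32 hash for the Source strategy are choices made for this example; real load balancers add health checks, weights and much more.

```python
import itertools
import zlib


class RoundRobin:
    """Hand out servers one after another, wrapping around at the end."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self, client_ip=None):
        return next(self._cycle)


class LeastConnections:
    """Pick the server that currently has the fewest open connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self, client_ip=None):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1


class SourceHash:
    """Always send the same client IP to the same server."""

    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip):
        return self.servers[zlib.crc32(client_ip.encode()) % len(self.servers)]


backends = ["web-1", "web-2", "web-3"]   # hypothetical server names
rr = RoundRobin(backends)
print([rr.pick() for _ in range(4)])
```

Round Robin is the simplest, Least Connections adapts to slow requests, and Source keeps a returning visitor on the same server, which helps when session data lives on that server.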
- Web server
A web server stores and delivers the content for a website to clients that request it. The most common type of client is a web browser program, which requests data from your website when a user clicks on a link or downloads a document on a page displayed in the browser.
A web server communicates with a web browser using the Hypertext Transfer Protocol (HTTP). The content of most web pages is encoded in Hypertext Markup Language (HTML). The content can be static (for example, text and images) or dynamic (generated at the time of the request).
- Application server
An application server consists of web server connectors, runtime libraries, database connectors and so on, which are needed to deploy, configure and manage code and to connect these components together on a server.
