What Really Happens when Browsing a Webpage
Interviewers often ask, “What happens when you type www.google.com in your browser?” to gauge a developer’s knowledge of computer networking. Here I will instead examine what happens when you type holbertonschool.com, or any web search in general.
When you start typing in the address bar, most browsers try to autocomplete the address before you finish typing it. The autocomplete algorithm generally prioritizes entries from your history and bookmarks. As soon as you press ENTER, the browser has the following information contained in the URL (Uniform Resource Locator): (1) Protocol “http” (2) Host “holbertonschool.com” (3) Resource “/”, the main index page.
If no protocol or valid domain name is provided, the browser proceeds to feed the text in the address box to its default web search engine.
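The decision described above can be sketched roughly as follows. This is a simplified illustration, not how any particular browser does it: the function name interpret_address_bar, the search endpoint search.example, and the "host contains a dot" validity check are all assumptions made for the example.

```python
from urllib.parse import urlsplit, quote_plus

# Hypothetical search endpoint, used only for illustration.
SEARCH_URL = "https://search.example/?q="


def interpret_address_bar(text: str) -> dict:
    """Classify address-bar input as a URL to load or a search query."""
    text = text.strip()
    search = {"action": "search", "url": SEARCH_URL + quote_plus(text)}

    # Text with spaces cannot be a domain name, so treat it as a search.
    if not text or " " in text:
        return search

    # Assume plain HTTP when the user did not type a protocol.
    candidate = text if "://" in text else "http://" + text
    try:
        parts = urlsplit(candidate)
    except ValueError:
        return search

    host = parts.hostname or ""
    # Crude validity check (an assumption): a domain has at least one dot.
    if "." not in host:
        return search

    return {
        "action": "navigate",
        "protocol": parts.scheme,
        "host": host,
        "resource": parts.path or "/",  # an empty path means the index page
    }


if __name__ == "__main__":
    print(interpret_address_bar("holbertonschool.com"))
    print(interpret_address_bar("what is dns"))
```

Real browsers use much richer heuristics (known TLD lists, intranet host names, history), so treat the check above as a placeholder.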
The purpose of the Domain Name System (DNS) is to resolve domain names into IP addresses. The following steps are taken to resolve the name.
Browser: The browser first checks its own cache to see if the domain is already known.
OS: If it is not within the browser’s cache, the OS will then check the local hosts file to see if it is there. Note that the location of this file varies by OS.
Resolver: If the OS cannot resolve the domain name, it queries the resolver, which is generally operated by your ISP. The resolver checks its own cache first. If the name is not cached there either, the resolver goes to the root server.
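Root-server: The root server tells the resolver where to find the top-level domain (TLD) servers for .com. Note that the root server contacted here is 1 of 13 named root servers, each of which is operated as many physical instances around the world.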
TLD: The resolver then queries the .com TLD server. The TLD server does not hold the IP address itself; instead, it directs the resolver to the name servers for holbertonschool.com.
Name-server: When the domain is purchased, the domain registrar reserves the name and communicates its name servers to the TLD registry. A name server is a specialized server that answers DNS queries about a domain, such as the website’s IP address, the mail server’s IP address, and so forth. It has the information needed to resolve the IP address for holbertonschool.com. Note that there are generally multiple name servers for a specific domain, and each of them knows how to resolve any name under holbertonschool.com.
Once the domain name is resolved into an IP address, the resolver returns the answer to the OS, which stores the IP for later use as necessary. The IP address is 220.127.116.11.
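To make the lookup order concrete, here is a toy simulation of the resolution chain. Nothing in it touches the network: the server names, the tables, and the address 192.0.2.10 (taken from a range reserved for documentation) are made up for the example, and the caching on the way back is simplified.

```python
# All data below is invented for illustration; 192.0.2.10 comes from a
# documentation-only address range and is not the site's real address.
ROOT_SERVER = {"com": "tld-com"}  # root: TLD -> TLD server
TLD_SERVERS = {"tld-com": {"holbertonschool.com": "ns-holberton"}}
NAME_SERVERS = {"ns-holberton": {"holbertonschool.com": "192.0.2.10"}}


def resolve(domain, browser_cache, hosts_file, resolver_cache):
    """Return (ip, steps), where steps lists every place that was consulted."""
    steps = []

    # Steps 1-3: browser cache, then the OS hosts file, then the resolver cache.
    lookups = (
        ("browser", browser_cache),
        ("os", hosts_file),
        ("resolver", resolver_cache),
    )
    for name, table in lookups:
        steps.append(name)
        if domain in table:
            return table[domain], steps

    # Step 4: the root server points at the TLD server for ".com".
    steps.append("root")
    tld = domain.rsplit(".", 1)[-1]
    tld_server = ROOT_SERVER[tld]

    # Step 5: the TLD server points at the domain's name server.
    steps.append("tld")
    name_server = TLD_SERVERS[tld_server][domain]

    # Step 6: the name server knows the actual IP address.
    steps.append("name-server")
    ip = NAME_SERVERS[name_server][domain]

    # Cache the answer on the way back so the next lookup is short.
    resolver_cache[domain] = ip
    browser_cache[domain] = ip
    return ip, steps


if __name__ == "__main__":
    browser, hosts, resolver = {}, {}, {}
    print(resolve("holbertonschool.com", browser, hosts, resolver))
    print(resolve("holbertonschool.com", browser, hosts, resolver))
```

The second call stops at the browser cache, which is the whole point of caching at each level.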
Note: will expand on ARP later
The browser also checks its preloaded HSTS (HTTP Strict Transport Security) list, a list of websites that have requested to be contacted through HTTPS only. If a website is on the list, the request is sent over HTTPS; otherwise, HTTP is used. In the case of holbertonschool.com, it is on the HSTS list.
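In its simplest form, the HSTS check is a set lookup. The sketch below assumes a tiny stand-in list and a hypothetical helper named choose_scheme; a real preload list is far larger and also handles subdomains, which is left out here.

```python
# Tiny stand-in for the preload list that ships with the browser.
HSTS_PRELOAD = {"holbertonschool.com"}


def choose_scheme(host: str, preload=HSTS_PRELOAD) -> str:
    """Pick HTTPS for hosts on the preload list, plain HTTP otherwise."""
    # Normalize case and a trailing dot so equivalent spellings match.
    normalized = host.lower().rstrip(".")
    return "https" if normalized in preload else "http"


if __name__ == "__main__":
    for host in ("holbertonschool.com", "example.org"):
        print(host, "->", choose_scheme(host))
```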
Once the domain name is resolved, a connection needs to be opened between the client and server. This is done through TCP/IP (Transmission Control Protocol/Internet Protocol), and it begins with TCP’s “three-way handshake”: the client sends a SYN, the server replies with a SYN-ACK, and the client answers with an ACK.
Once the connection is established between the client and server, packets with streams of bytes can be transferred between the two.
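The handshake is easier to follow with the sequence numbers written out. The sketch below only models the three segments as data; it opens no sockets, and the Segment class and the initial sequence numbers passed in are assumptions for the example (real stacks pick initial sequence numbers unpredictably).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Segment:
    sender: str
    flags: Tuple[str, ...]
    seq: int
    ack: Optional[int] = None  # None when the ACK flag is not set


def three_way_handshake(client_isn: int, server_isn: int):
    """Return the three segments exchanged to open a TCP connection."""
    # 1. Client -> server: SYN carrying the client's initial sequence number.
    syn = Segment("client", ("SYN",), seq=client_isn)

    # 2. Server -> client: SYN-ACK with the server's own initial sequence
    #    number, acknowledging the client's SYN (which consumes one number).
    syn_ack = Segment("server", ("SYN", "ACK"), seq=server_isn, ack=syn.seq + 1)

    # 3. Client -> server: ACK acknowledging the server's SYN.
    ack = Segment("client", ("ACK",), seq=syn.seq + 1, ack=syn_ack.seq + 1)

    return [syn, syn_ack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(client_isn=100, server_isn=300):
        print(segment)
```

After the third segment both sides have confirmed each other's starting sequence number, and data can flow in either direction.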
Note: will expand on the OSI Model
The firewall is a security system that monitors and controls incoming and outgoing network traffic based on predetermined security rules. Firewalls can be a combination of hardware and software that isolates an internal network from the internet at large, allowing some packets and blocking others. There are several goals for a firewall: (1) All traffic from outside to inside, and vice versa, passes through the firewall. (2) Only authorized traffic, as defined by the local security policy, is allowed to pass. (3) The firewall itself is immune to penetration.
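A packet filter built on "predetermined security rules" can be sketched as a first-match rule list with a default of deny. The Rule class, the rule set, and the addresses (all from documentation-only ranges) are invented for this example; real firewalls also match on protocol, direction, and connection state.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str          # "allow" or "deny"
    source: str          # source network in CIDR notation
    port: Optional[int]  # destination port, or None for any port


# Hypothetical local security policy, evaluated top to bottom.
RULES = [
    Rule("deny", "203.0.113.0/24", None),  # block one network entirely
    Rule("allow", "0.0.0.0/0", 443),       # HTTPS from anywhere
    Rule("allow", "0.0.0.0/0", 80),        # HTTP from anywhere
]


def filter_packet(src_ip: str, dst_port: int, rules=RULES) -> str:
    """Return the action of the first matching rule, or "deny" if none match."""
    address = ip_address(src_ip)
    for rule in rules:
        source_matches = address in ip_network(rule.source)
        port_matches = rule.port is None or rule.port == dst_port
        if source_matches and port_matches:
            return rule.action
    # Only explicitly authorized traffic may pass.
    return "deny"


if __name__ == "__main__":
    print(filter_packet("198.51.100.7", 443))
    print(filter_packet("203.0.113.5", 443))
    print(filter_packet("198.51.100.7", 22))
```

Denying by default is what enforces goal (2): anything the policy does not name is dropped.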
SSL (Secure Sockets Layer), since superseded by TLS (Transport Layer Security), is the security technology for establishing an encrypted link between a web server and a browser. This link ensures that all data passed between the web server and browser remains private and intact. To create an SSL connection, a server requires an SSL Certificate. When requesting an SSL certificate, the site owner is prompted to answer a series of questions about the identity of the website and company. The web server then creates two cryptographic keys, a private one and a public one. The Certification Authority (CA) is responsible for issuing the Certificate. The CA acts as a trusted third party and uses checks such as “domain validation” to authenticate the recipient of the certificate. When a browser connects to a secure site, it retrieves the site’s SSL Certificate and checks that (1) it has not expired, (2) it has been issued by a Certification Authority the browser trusts, and (3) it is being used by the website for which it was issued. If any one of these checks fails, the browser displays a warning to the end user.
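The three browser-side checks map onto three conditions. The sketch below works on a simplified, made-up Certificate record and a made-up trusted CA name; it does not parse real certificates or verify signatures, which a real browser also has to do.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Certificate:
    subject: str         # the host name the certificate was issued for
    issuer: str          # the CA that issued it
    not_after: datetime  # expiry date


# Hypothetical stand-in for the browser's trusted CA store.
TRUSTED_CAS = {"Example Trusted CA"}


def check_certificate(cert, hostname, now, trusted=TRUSTED_CAS):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if now > cert.not_after:                      # check (1): expiry
        problems.append("expired")
    if cert.issuer not in trusted:                # check (2): trusted issuer
        problems.append("untrusted issuer")
    if cert.subject.lower() != hostname.lower():  # check (3): right website
        problems.append("hostname mismatch")
    return problems


if __name__ == "__main__":
    cert = Certificate(
        subject="holbertonschool.com",
        issuer="Example Trusted CA",
        not_after=datetime(2030, 1, 1, tzinfo=timezone.utc),
    )
    now = datetime(2025, 1, 1, tzinfo=timezone.utc)
    problems = check_certificate(cert, "holbertonschool.com", now)
    print("warning: " + ", ".join(problems) if problems else "certificate ok")
```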
Assuming you did not get blocked by the firewall and that holbertonschool.com has a valid SSL certificate and is served over HTTPS (which it is), you will reach the load balancer. A load balancer can be hardware or software that distributes the workload across multiple individual systems, or groups of systems, so that no one system is overburdened. The advantages of a load balancer include (1) increased application performance because of faster responses, (2) no single point of failure if a server crashes, (3) scalability, and (4) reliability. Load-balancing algorithms include weighted scheduling, round-robin, and least-connection-first scheduling.
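Two of the algorithms named above, round-robin and least connections, fit in a few lines each. The class names and the server names web-01 and web-02 are made up for the example, and health checks and weights are left out.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in a fixed rotation."""

    def __init__(self, servers):
        self._rotation = cycle(servers)

    def pick(self):
        return next(self._rotation)


class LeastConnectionsBalancer:
    """Send each request to the server with the fewest open connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        # On a tie, min() returns the server that was listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call this when a connection to the server is closed.
        self.active[server] -= 1


if __name__ == "__main__":
    rr = RoundRobinBalancer(["web-01", "web-02"])
    print([rr.pick() for _ in range(3)])

    lc = LeastConnectionsBalancer(["web-01", "web-02"])
    lc.pick()
    lc.pick()
    lc.release("web-02")
    print(lc.pick())
```

Round-robin ignores how busy each server is, while least connections adapts when some requests take much longer than others.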
Web server, Application server, and Database
There can be multiple web servers sitting behind the load balancer to handle the HTTP request. The web server acts as the driver, facilitating communication between the application server, database, and code base. The web server generally handles requests for both static and dynamic content. Static content it can serve directly; when dynamic content is requested, the request is passed to the application server, which dynamically generates the content from information retrieved from the database server. Once that is done, the result is sent back to the web server as a complete HTML document, which is then sent back to the client.
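The division of labor between the three tiers can be sketched with plain functions. Everything here is a stand-in: the paths, the file contents, and the student names are invented, and dictionaries take the place of the file system and the database server.

```python
# Stand-ins for files on disk and for the database server.
STATIC_FILES = {"/logo.svg": "<svg></svg>"}
DATABASE = {"students": ["Ada", "Linus"]}


def application_server(path):
    """Build an HTML page from database rows, or return None if unknown."""
    if path == "/students":
        items = "".join(f"<li>{name}</li>" for name in DATABASE["students"])
        return f"<html><body><ul>{items}</ul></body></html>"
    return None


def web_server(path):
    """Serve static content directly; pass everything else along."""
    if path in STATIC_FILES:
        return 200, STATIC_FILES[path]

    # Dynamic content: ask the application server to generate the page.
    body = application_server(path)
    if body is None:
        return 404, "<html><body>Not Found</body></html>"
    return 200, body


if __name__ == "__main__":
    for path in ("/logo.svg", "/students", "/missing"):
        print(path, web_server(path))
```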
Once the response is sent from the server side, the browser displays the content returned. The browser’s job is to present the web resource you choose by requesting it from the server and displaying it in the browser window. That resource is generally an HTML document; however, it can be other resources as well. There is more to this process, but in short, the browser now successfully displays the content retrieved from the server. I will expand on the topics discussed above as I dive deeper into the network layers. Please stay tuned.
Computer Networking: A Top-Down Approach by Kurose and Ross
what-happens-when: An attempt to answer the age-old interview question "What happens when you type google.com into…" (github.com)