How it works: visiting a website from your browser

What happens when you type a URL and hit enter?

Yosef Samuel
CodeX
10 min read · Sep 12, 2021


When you type the address of a website, such as https://www.holbertonschool.com, into the address bar of a browser, several steps take place in the background, even though the site appears within seconds.

Step 1: URL

The first step, of course, is to type the URL of the website into the address bar and hit Enter, something you probably do dozens of times a day.

What is a URL?

URL stands for Uniform Resource Locator. A URL is simply the address of a unique resource on the web. It is composed of different parts, some mandatory and others optional. Let’s consider this example: https://www.holbertonschool.com/methodology

This URL has several parts, each with a different purpose. Let’s discuss the main ones:

  • Scheme (protocol): tells the browser which protocol to use to access the resource. In the example it is HTTPS (Hypertext Transfer Protocol Secure), currently the most common scheme on the web.
  • Domain name: indicates which web server is being requested; in the example it is ‘www.holbertonschool.com’. An IP address can be used instead, but this is rare because it is less convenient. The domain name can be divided further into the following parts:
  1. Subdomain: indicates which particular part of the website the browser is requesting (‘www’ in the example).
  2. Second-level domain: the name of the website. It helps people know they’re visiting a certain brand’s site. For instance, people who visit “holbertonschool.com” know they’re on Holberton School’s website without needing any more information.
  3. Top-level domain: indicates the category the domain is registered under. For example, “.com” was originally intended for commercial entities and “.edu” for academic institutions.
  • Port: the number that identifies which service on the web server the request is sent to. It is usually omitted when the server uses the standard port of the protocol (in the example it is omitted, so the standard HTTPS port, 443, is used; for HTTP it is 80).
  • File path (path): identifies the specific page or resource to load from the web server. In the example, ‘/methodology’ is the path. If no path is specified (i.e. only a domain name is entered), the browser loads the default page, which usually helps you navigate to the other pages of the website.

The discussion above covers only the fundamental parts of a URL. A URL can have additional parts, such as query parameters and an anchor (fragment), that are not covered here. For more about URLs, see the references at the end of this article.
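
To make the parts concrete, here is a minimal sketch that splits the example URL with Python’s standard urllib.parse module. The helper name describe_url and the DEFAULT_PORTS table are assumptions made for this illustration, and splitting the host on dots is a simplification that would mislabel multi-part suffixes such as “.co.uk”.

```python
from urllib.parse import urlsplit

# Standard ports mentioned in the article; used when the URL omits the port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe_url(url):
    """Return the main parts of a URL as a dictionary (illustrative helper)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")  # naive: assumes a single-label top-level domain
    return {
        "scheme": parts.scheme,
        "domain": host,
        "subdomain": ".".join(labels[:-2]),
        "second_level": labels[-2] if len(labels) >= 2 else "",
        "top_level": labels[-1],
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path or "/",
    }


if __name__ == "__main__":
    for name, value in describe_url("https://www.holbertonschool.com/methodology").items():
        print(f"{name}: {value}")
```

For the example URL the port entry comes out as 443, because the URL omits the port and the scheme is HTTPS.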

Step 2: DNS

Computers and other network devices use IP addresses to identify each other on the Internet. URLs are human-friendly, while IP addresses are computer-friendly.

So, when you enter the URL https://www.holbertonschool.com, the browser first has to ask DNS servers to look up the domain name www.holbertonschool.com and find its corresponding IP address. Without the IP address, your computer has no way of knowing which server to contact.

What is DNS?

DNS stands for Domain Name System. DNS is the system that translates domain names into IP addresses; it is often described as the phone book of the Internet.

How does DNS work?

To understand the DNS resolution process, you need to know about the four types of DNS servers involved in loading a web page. Apart from the initial request, your computer is not involved in the resolution process.

1. DNS resolver: usually operated by your ISP (Internet Service Provider), but it can also be operated by your wireless carrier or a third-party provider. The resolver knows which other DNS servers to ask in order to answer the query “What is the IP address of www.holbertonschool.com?”. It also keeps a cache of the IP addresses of frequently requested domain names. Every resolver must know one thing: where to find the root servers. The resolver, also called a recursor or recursive DNS name server, can be thought of as a librarian who is asked to find a particular book somewhere in a library.

2. Root name servers: the first step in the resolution of any domain name. A root server can be thought of as an index in a library that points to different racks of books; it serves as a reference to other, more specific locations. The root servers know where to find the name servers for every kind of top-level domain: the original generic TLDs (.com, .net, .org), country code TLDs (.no, .et, .uk), internationalized TLDs (ccTLDs written in the countries’ local characters), infrastructure TLDs and the newer generic TLDs (.hot, .pizza, .app, …).

3. TLD name servers: a top-level domain (TLD) name server can be thought of as a specific rack of books in a library. The TLD is the last part of a domain name, that is, the label that follows the last dot of a fully qualified domain name. For example, in the domain name www.holbertonschool.com, the top-level domain is com. Most top-level domains are coordinated by the Internet Corporation for Assigned Names and Numbers (ICANN).

4. Authoritative name server: provides the original and definitive answers to DNS queries. This is where the domain administrator has configured the DNS records for the domain. This final name server can be thought of as a dictionary on a rack of books.

Steps in a DNS lookup

  1. You type the URL, https://www.holbertonschool.com, and hit Enter.
  2. First, your browser checks whether it already knows the IP address of the domain by looking in its own cache and then in the OS’s cache. If the address is not found there, the OS calls the resolver.
  3. When the resolver receives the request, it checks its own cache first. If the address is not cached there, the resolver has to ask the authoritative DNS hierarchy for the answer. To reach the authoritative server, the resolver first sends a request to a root name server.
  4. The root server responds to the resolver with the address of the TLD name server (in this case, the one for .com). The resolver stores this information for future reference.
  5. The resolver then sends a request to the TLD server.
  6. The TLD server responds with the names of the domain’s authoritative name servers together with their IP addresses (these address records are called glue records). The resolver stores the addresses. The figure below shows the authoritative servers of holbertonschool.com and the IP address of one of the servers.
  7. The resolver sends a query to the authoritative name server.
  8. The name server responds with the IP address of www.holbertonschool.com.
  9. The resolver responds to your OS with the domain’s IP address.
  10. The OS provides it to the browser.
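
The lookup steps above can be sketched as a toy simulation. Nothing here talks to real DNS: the three dictionaries stand in for the root, TLD and authoritative servers, the server names are invented, and the IP address is a made-up value from a range reserved for documentation.

```python
# Toy stand-ins for the DNS hierarchy. All names and addresses are made up.
ROOT_SERVER = {"com": "tld-com"}  # TLD -> TLD name server
TLD_SERVERS = {"tld-com": {"holbertonschool.com": "ns-holberton"}}
AUTHORITATIVE = {"ns-holberton": {"www.holbertonschool.com": "192.0.2.10"}}


class Resolver:
    """A simplified recursive resolver with a cache (no TTLs, no errors)."""

    def __init__(self):
        self.cache = {}    # hostname -> IP address
        self.queries = []  # which servers were asked, in order

    def resolve(self, hostname):
        hostname = hostname.lower().rstrip(".")
        # Step 3: check the resolver's own cache first.
        if hostname in self.cache:
            return self.cache[hostname]
        labels = hostname.split(".")
        tld = labels[-1]
        domain = ".".join(labels[-2:])
        # Steps 3-4: ask a root server where the TLD server is.
        self.queries.append("root")
        tld_server = ROOT_SERVER[tld]
        # Steps 5-6: ask the TLD server for the authoritative name server.
        self.queries.append(tld_server)
        name_server = TLD_SERVERS[tld_server][domain]
        # Steps 7-8: ask the authoritative name server for the address.
        self.queries.append(name_server)
        address = AUTHORITATIVE[name_server][hostname]
        self.cache[hostname] = address
        return address


if __name__ == "__main__":
    resolver = Resolver()
    print(resolver.resolve("www.holbertonschool.com"))
    print(resolver.queries)
```

A second call to resolve() for the same name is answered from the cache and adds nothing to the query log, which is why cached lookups are so much faster.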

Without DNS, you would have to remember lists of IP addresses instead of website names or URLs. For a more fun and detailed description of the DNS resolution process, see the references at the end of this article.

Step 3: Browser sends a connection request to the website

Once your browser has the IP address of the website, it starts to set up a TCP connection. The setup is accomplished with a three-way handshake (also known as SYN, SYN-ACK, ACK). The handshake lets two computers that want to exchange information agree on the parameters of the transmission before any data, such as the browser’s request, is sent.

Before talking about the three-way handshake process, let’s first discuss the meanings of the messages used to negotiate and start a session.

  1. SYN (Synchronize): used to initiate a connection. It tells the other side which sequence number the sender will start from, so that both devices can synchronize their sequence numbers.
  2. ACK (Acknowledgment): confirms to the other side that a segment, such as the SYN, has been received.
  3. SYN-ACK (Synchronize-Acknowledgment): a single segment that carries the responder’s own SYN together with an ACK of the SYN it just received.
  4. FIN (Finish): used to terminate a connection.

The Three-Way Handshake Process

Step 1: Your browser (the client) asks the web server to open a connection. It sends a segment with the SYN flag set, which tells the server the sequence number the client will start from.

Step 2: The server responds to the client’s request with a segment that has both the SYN and ACK flags set. The ACK acknowledges the segment the server received, and the SYN tells the client the sequence number the server will start from.

Step 3: Your browser responds to the web server with an ACK, and the two sides now have an established, reliable connection.

Three-way handshake

The initial sequence number is chosen at random; it marks the beginning of the sequence numbers for the data that the sender will transmit.
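
The exchange of sequence numbers can be sketched with a small simulation. This is only a model of the three segments, not real networking: the function name and the dictionary layout are assumptions for this illustration, and in practice the operating system performs the handshake for the browser.

```python
import random

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn=None, server_isn=None):
    """Return the three handshake segments as dictionaries (simulation only)."""
    # Each side picks a random initial sequence number (ISN) unless one is given.
    x = random.getrandbits(32) if client_isn is None else client_isn
    y = random.getrandbits(32) if server_isn is None else server_isn

    # Step 1: client -> server, SYN carrying the client's ISN.
    syn = {"flags": {"SYN"}, "seq": x}
    # Step 2: server -> client, SYN-ACK: the server's ISN plus an ACK of x + 1.
    syn_ack = {"flags": {"SYN", "ACK"}, "seq": y, "ack": (syn["seq"] + 1) % SEQ_SPACE}
    # Step 3: client -> server, ACK of y + 1; the connection is now established.
    ack = {"flags": {"ACK"}, "seq": syn_ack["ack"], "ack": (syn_ack["seq"] + 1) % SEQ_SPACE}
    return [syn, syn_ack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(client_isn=100, server_isn=300):
        print(segment)
```

With the fixed starting numbers 100 and 300, the server acknowledges 101 and the client acknowledges 301: each side acknowledges the other’s initial sequence number plus one.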

Step 4: SSL/TLS certificate

After the browser and the server have set up a connection, the browser checks whether the server provides security. Before we talk about how your browser checks this, let us talk about SSL/TLS.

What is SSL?

The acronym “SSL” stands for Secure Sockets Layer. SSL is a standard security technology for creating an encrypted network link between a server and a client, ensuring that all data passed between them stays private and secure. TLS, which stands for Transport Layer Security, is the successor to SSL; the two names are often used interchangeably.

While exploring the Internet, you may have noticed that some websites use HTTP and others use HTTPS (which our example uses). The difference between the two is SSL/TLS: the ‘S’ in HTTPS stands for “secure”. A website that supports HTTPS has an SSL certificate, and the communication between your computer and its web server is encrypted.

Why is SSL needed? SSL/TLS is a protocol used by applications to communicate securely across a network; it prevents tampering with and eavesdropping on web browsing, email, messaging and other traffic. Any information transmitted between the client and the server is protected by encryption.

Now, let us talk about how your browser checks for security. The browser downloads the web server’s certificate, which contains the public key of the web server. This certificate is signed with the private key of a trusted certificate authority. The public keys of the major certificate authorities come preinstalled in your browser. Your browser uses this public key to verify that the web server’s certificate was indeed signed by the trusted certificate authority.

After your browser has verified and authenticated the server, it generates a shared symmetric key that will be used to encrypt the traffic on this connection. The browser encrypts the generated key with the public key of the web server and sends it to the server. This ensures that only the server can decrypt the key, since only the server has the matching private key.
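
In Python, a client can perform the same kind of certificate check with the standard ssl module. The sketch below is an illustration under assumptions, not how any particular browser is implemented: the two function names are invented for this example, and negotiated_tls_version needs network access, so it is defined but not called.

```python
import socket
import ssl


def make_verifying_context():
    """Build a client-side TLS context that verifies server certificates."""
    # create_default_context() loads the system's trusted CA certificates and
    # enables both certificate verification and hostname checking.
    return ssl.create_default_context()


def negotiated_tls_version(host, port=443, timeout=5.0):
    """Connect to host, run the TLS handshake, and return the protocol version.

    Raises ssl.SSLCertVerificationError if the certificate is not trusted.
    """
    context = make_verifying_context()
    # The TCP three-way handshake happens inside create_connection().
    with socket.create_connection((host, port), timeout=timeout) as tcp:
        # wrap_socket() performs the TLS handshake, including the certificate check.
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            return tls.version()


if __name__ == "__main__":
    context = make_verifying_context()
    print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

Calling negotiated_tls_version("www.holbertonschool.com") on a machine with Internet access would return a string naming the negotiated protocol version; the exact value depends on the server and on the local OpenSSL build.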

Step 5: Browser downloads website data

Next, your browser sends a request to the website asking for its data. The request includes some additional information, such as which browser you’re using and the purpose of the connection.

The server receives this request, and then generates a response in a particular format. It sends this response back to your browser.

Your browser receives the response, and uses it to render the website you requested.
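
As a rough illustration of that request and response format, the sketch below builds the text of an HTTP/1.1 GET request and parses a hand-written sample response. The function names, the User-Agent value and the sample response are assumptions made for this example; a real browser sends many more headers.

```python
def build_get_request(host, path="/", user_agent="example-browser/1.0"):
    """Return the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",              # which website is being requested
        f"User-Agent: {user_agent}",  # tells the server which browser is asking
        "Connection: close",
        "",                           # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw):
    """Split a raw HTTP response into (status, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(status), reason, headers, body


if __name__ == "__main__":
    print(build_get_request("www.holbertonschool.com", "/methodology").decode("ascii"))
    sample = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<h1>Hi</h1>"
    print(parse_response(sample))
```

The browser uses the status code and headers to decide how to handle the body, for example rendering it as HTML when the Content-Type header says text/html.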

Step 6: That’s it?

Once your browser displays the website, its work might not be done. If you click a link, the steps begin all over again. If you send some information to the page, the server uses it to perform an action. And, depending on the website, your browser might keep interacting with the server in the background.

schema illustrating the flow of the request created when you type https://www.holbertonschool.com in your browser and press Enter
When everything comes together

One Last thing…honorable mentions

The following are important components of the web serving and hosting process.

Load Balancer

Websites must serve hundreds of thousands, if not millions, of simultaneous requests from users and must return the correct text, images, videos or application data in a fast and reliable manner. Meeting this high demand generally requires adding more servers and distributing the load across them.

A load balancer sits in front of the servers and routes client requests across all the servers capable of responding. It distributes the workload across multiple individual systems, or groups of systems, to reduce the load on any single one. This ensures the reliability, efficiency and availability of the service provided by the servers.

Load Balancer

The following are the main functions of a load balancer:

  • Distribute requests or network load efficiently across multiple servers
  • Ensure high availability and reliability by sending requests only to servers that are online
  • Provide the flexibility of scaling up and scaling down per demand.

A load balancer can be implemented in hardware or in software. For more on load balancing, see the references at the end of this article.
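
One simple way to distribute requests is round robin: hand each new request to the next server in the list and skip any server that is offline. The sketch below assumes that strategy; the class name and the server names are invented for illustration, and real load balancers offer other algorithms and detect failures through health checks.

```python
class RoundRobinBalancer:
    """Hand out servers in turn, skipping those marked offline."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._online = set(self._servers)
        self._next = 0  # index of the server to try next

    def mark_offline(self, server):
        self._online.discard(server)

    def mark_online(self, server):
        if server in self._servers:
            self._online.add(server)

    def pick(self):
        """Return the next online server, or raise if none is available."""
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if server in self._online:
                return server
        raise RuntimeError("no servers online")


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
    print([balancer.pick() for _ in range(4)])
    balancer.mark_offline("web-2")
    print([balancer.pick() for _ in range(3)])
```

Scaling up or down then amounts to changing the list of servers the balancer knows about, without the clients noticing.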

Firewall

Firewalls are hardware, software, or a combination of both that filter all traffic coming into and going out of a server. SSL/TLS is a crucial step in securely transmitting data across the Internet, but it does not account for the trustworthiness of the source. This is where firewalls come in: they use a combination of packet filters, application gateways, circuit-level gateways and proxy servers to make sure that packets do not contain malicious content.
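
Of the techniques listed, a packet filter is the easiest to sketch: it compares each packet against an ordered list of rules and applies the first one that matches. The example below is a deliberately simplified model; the Rule fields, the rule set and the addresses (taken from documentation-only ranges) are assumptions for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str                 # "allow" or "deny"
    network: str                # source network in CIDR notation
    port: Optional[int] = None  # destination port; None matches any port


def decide(rules, source_ip, dest_port, default="deny"):
    """Return the action of the first rule matching the packet, else default."""
    source = ipaddress.ip_address(source_ip)
    for rule in rules:
        in_network = source in ipaddress.ip_network(rule.network)
        port_matches = rule.port is None or rule.port == dest_port
        if in_network and port_matches:
            return rule.action
    return default


if __name__ == "__main__":
    rules = [
        Rule("deny", "203.0.113.0/24"),   # block a (made-up) hostile network
        Rule("allow", "0.0.0.0/0", 443),  # HTTPS from anywhere else
        Rule("allow", "0.0.0.0/0", 80),   # HTTP from anywhere else
    ]
    print(decide(rules, "198.51.100.7", 443))
    print(decide(rules, "203.0.113.9", 443))
    print(decide(rules, "198.51.100.7", 22))
```

Denying by default means that any traffic no rule explicitly allows, such as the request to port 22 in the last line, is dropped.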

Database

A database is a collection of information that is organized so that it can be easily accessed, managed and updated. Most modern websites perform complicated operations on data or present dynamic data. These operations should be handled by a separate database, which is typically implemented on a separate server. The database stores and retrieves data, manages updates, provides simultaneous access from web servers, provides security, ensures the integrity of the data and handles data backup. Database Management Systems (DBMSs) are used to manage the data on the database server.
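
As a small illustration of storing and retrieving data through a DBMS, the sketch below uses SQLite, which ships with Python’s standard library. The courses table and the function names are hypothetical, and an in-memory SQLite database stands in for what would normally be a separate database server.

```python
import sqlite3


def open_demo_database():
    """Create an in-memory database with one example table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE courses (id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
    )
    return conn


def add_course(conn, title):
    """Insert a row inside a transaction and return its generated id."""
    with conn:  # commits on success, rolls back if an exception is raised
        cursor = conn.execute("INSERT INTO courses (title) VALUES (?)", (title,))
    return cursor.lastrowid


def list_courses(conn):
    """Return all course titles in insertion order."""
    return [row[0] for row in conn.execute("SELECT title FROM courses ORDER BY id")]


if __name__ == "__main__":
    connection = open_demo_database()
    add_course(connection, "Methodology")
    add_course(connection, "Networking")
    print(list_courses(connection))
    connection.close()
```

The NOT NULL constraint is a tiny example of the database protecting the integrity of its data: an insert without a title is rejected rather than stored.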

Web hosting with Database System

Application server

An application server is a server specially designed to run applications; in other words, it is a server that hosts applications. Its primary job is to enable interactions between end-user clients and server-side application code (often called business logic) to generate and deliver dynamic content, such as transaction results, decision support or real-time analytics.

A web server is designed to serve web pages and is not well suited to running demanding web applications. An application server, on the other hand, provides the processing power and memory to run these demanding web applications. It also provides the environment needed to run specific applications.
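
To give a feel for what “dynamic content” means, here is a minimal sketch of server-side application code written against WSGI, the standard Python interface between a server and application code. The response text is invented for this example; a real application would run its business logic where this one only formats a string.

```python
from datetime import datetime, timezone


def application(environ, start_response):
    """A tiny WSGI application: the response is computed for every request."""
    path = environ.get("PATH_INFO", "/")
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    body = f"You asked for {path} at {now}\n".encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]


if __name__ == "__main__":
    # Serve the application locally with the standard library's reference server.
    from wsgiref.simple_server import make_server

    with make_server("127.0.0.1", 8000, application) as server:
        server.serve_forever()
```

Unlike a static page, the body here differs on every request, which is the kind of work an application server exists to do.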

References

https://tecadmin.net/authoritative-non-authoritative-dns-server/
https://www.cloudflare.com/learning/dns/what-is-dns/
https://www.guru99.com/tcp-3-way-handshake.html
https://developer.mozilla.org/en-US/docs/Glossary/TCP_handshake
https://developer.mozilla.org/en-US/docs/Learn/Common_questions/What_is_a_URL
https://howdns.works/
https://blog.hubspot.com/marketing/parts-url
https://themeisle.com/blog/what-is-a-website-url/
https://www.techdim.com/what-is-application-server/
https://phoenixnap.com/blog/web-server-vs-application-server/
https://cheapsslsecurity.com/blog/what-is-ssl-tls-handshake-understand-the-process-in-just-3-minutes/?utm_source=AboutSSL&utm_medium=couponpage&utm_campaign=cheapcouponpage&utm_content=/how-ssl-certificate-work/
https://www.nginx.com/resources/glossary/load-balancing/
https://www.oreilly.com/library/view/web-database-applications/0596005431/ch01.html
