What Happens When You Search for a URL?

Introduction

Published in The Startup · 7 min read · Sep 30, 2020

The question in the title is one of the most frequently asked interview questions. A wide variety of people search for it on the web, including students, graduates preparing for interviews, and technology enthusiasts.

Some are satisfied with the answers they find and others are not. This article explains, as simply as possible, what happens end to end when a URL is requested.

The key takeaways suit all kinds of readers, ranging from a basic understanding of the network to its more advanced applications.

One note before starting: this blog assumes readers have a basic understanding of computer networks, which helps in grasping the concepts clearly and taking away good insights.

A Beginner's Approach to the Concept

To a beginner, the whole internet is like a black box: entering a URL simply fetches the desired data.

Software engineers and other computer science professionals, however, sit on the other side of the system (that is, they are expected to understand how the so-called black box works) and need end-to-end knowledge to meet requirements and provide services to users.

Now that the whole world is a global village and every person is one click away, it is the need of the hour to explore this technology and stay updated. The thought of connecting the whole world is itself a beautiful one, and it should not take much of your time to learn what happens in the background.

Basic Terminologies

Every Uniform Resource Locator (URL) names the protocol, the server, and the path needed to fetch the required file from the server. Just as every house in your area has an address, every system on the Internet has certain kinds of addresses.
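To make the parts of a URL concrete, here is a minimal sketch using Python's standard `urllib.parse` module. The URL itself is hypothetical and chosen only so that every component is present.

```python
from urllib.parse import urlsplit

# Hypothetical URL used only for illustration.
url = "https://www.example.com:8443/docs/index.html?lang=en#intro"

parts = urlsplit(url)

print(parts.scheme)    # protocol to speak with the server
print(parts.hostname)  # name that DNS must resolve to an IP address
print(parts.port)      # port number of the server process
print(parts.path)      # file or resource requested from the server
print(parts.query)     # optional parameters sent with the request
print(parts.fragment)  # position inside the page, handled by the browser
```

The host name and the port are the two pieces that the rest of this article is concerned with: one leads to an IP address, the other to a process.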

There are three types of Addresses associated with a system.

IP Address : Used to identify the network the system belongs to, and the system within that network.
MAC Address : Used to identify a unique network interface (and so the system) on the local network.
Port Number : Used to identify the process that requests or serves the file.

Each has its own responsibilities and follows its own set of protocols.

In computer networks, communication between two systems typically happens in a client-server fashion, i.e. one system requests information and the other provides it, following a set of predefined request-response protocols.

There is also the Domain Name System (DNS), a distributed, hierarchical system that stores the IP address for a particular domain. It is an application layer protocol and is more complicated than described here, but this simple explanation is enough for now.
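As a rough illustration of asking DNS for an address, the sketch below wraps the standard `socket.getaddrinfo` call, which hands the question to the operating system's resolver. The helper name `resolve` is made up for this example.

```python
import socket

def resolve(host, port=80):
    """Return the distinct IP addresses the OS resolver reports for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})

# A literal IP address comes back unchanged, without any DNS traffic.
print(resolve("127.0.0.1"))
# A real name, e.g. resolve("example.com"), needs a working network connection.
```

The caller never sees which caches or servers were consulted; that is exactly the part the following steps open up.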

The model above is the TCP/IP model, a more compact counterpart of the OSI reference model. Keeping it in mind, we will discuss what happens when we click a URL and which action triggers which layer of the protocol stack.

STEP I

When we search for a URL, we are actually triggering the Application Layer of the protocol stack, which includes many protocols such as HTTP, DNS, FTP, and SMTP.

Our system looks for the DNS records (mainly the IP address) in four places:

Local Storage or Browser Cache : The cache kept by the browser the user is running, for a better user experience.
OS Cache : If the DNS records are not in the browser cache, a system call is made to the operating system, which keeps its own cache.
Router Cache : If the records are not found above, they are not cached anywhere on the system itself. The router often keeps its own DNS cache, where the address may be found.
ISP Cache or Gateway : If the records are not in the router, they may be cached in the ISP's network. The ISP's DNS resolver, like a default gateway or proxy server, acts as a cache shared by everyone on that network.
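The order of these four lookups can be sketched as a simple chain of dictionaries. This is only a model: the cache contents are hypothetical, the addresses come from ranges reserved for documentation, and real caches live in different programs and machines.

```python
def lookup(domain, caches):
    """Check each cache in order; return (cache_name, ip) or (None, None)."""
    for name, records in caches:
        if domain in records:
            return name, records[domain]
    return None, None

# Hypothetical cache contents, nearest cache first.
caches = [
    ("browser", {}),
    ("os", {"example.org": "192.0.2.10"}),
    ("router", {}),
    ("isp", {"example.com": "192.0.2.20"}),
]

print(lookup("example.org", caches))   # found in the OS cache
print(lookup("example.com", caches))   # found only at the ISP
print(lookup("example.net", caches))   # missing everywhere: full resolution needed
```

The nearer the cache that answers, the less network traffic the lookup costs.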

If the DNS records are not present in any of the above, the resolver queries other DNS servers across networks for the domain. Because DNS is a distributed system, the request is usually answered quickly. DNS mainly uses UDP packets for these queries.

But you may ask: how do we know where to start, when there can be billions of URLs?

The catch is that DNS is a hierarchical system. There is only a small, well-known set of root servers at the top of the hierarchy (13 named root server addresses), and our system is set up with a default DNS server address at configuration time. We can use these to track down the IP address corresponding to the URL.

The above image depicts how the IP address is fetched for a particular domain: each server contacted, starting from the root server, gives the address of the server for the next level of the domain, so the search space shrinks at every step.
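The way the hierarchy narrows the search can be imitated with a toy model. Everything here is an assumption made for illustration: the server names, the referral table, and the final address are invented, and no real DNS messages are sent.

```python
# Hypothetical referral data: each server knows only the next level down.
SERVERS = {
    "root": {"com": "tld-com"},
    "tld-com": {"example.com": "ns-example"},
    "ns-example": {"www.example.com": "192.0.2.30"},
}

def resolve_iteratively(name):
    """Follow referrals from the root down; return (answer, servers_contacted)."""
    labels = name.split(".")
    server = "root"
    contacted = []
    # Ask about progressively longer suffixes: "com", "example.com", ...
    for i in range(len(labels) - 1, -1, -1):
        suffix = ".".join(labels[i:])
        contacted.append(server)
        answer = SERVERS[server].get(suffix)
        if answer is None:
            return None, contacted      # this server has no referral or record
        server = answer                 # referral to the next server, or the final answer
    return server, contacted

print(resolve_iteratively("www.example.com"))
print(resolve_iteratively("www.example.net"))
```

Three questions are enough to find the address in this model, however many names exist elsewhere in the tree.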

STEP II

Once DNS resolution is done, we have the IP address of the system, and the network it lies in, where the requested information is present.

Our goal now is to reach that system and establish a connection between it and our system, as a client-server connection.

The Internet is a packet-switched network, so at the network layer request packets travel over a connectionless path. To move packets from one network to another, each router takes the bitwise AND of the destination IP address and the subnet mask of each network it knows, to determine which network the destination belongs to.
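The bitwise AND with the subnet mask is easy to try out. The sketch below uses Python's standard `ipaddress` module to turn dotted addresses into integers; the helper names and the sample addresses are chosen for this example.

```python
import ipaddress

def network_address(ip, mask):
    """Bitwise AND of address and mask, as a dotted-quad string."""
    value = int(ipaddress.IPv4Address(ip)) & int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(value))

def same_network(ip_a, ip_b, mask):
    """True if both addresses share a network prefix under the given mask."""
    return network_address(ip_a, mask) == network_address(ip_b, mask)

print(network_address("192.168.1.37", "255.255.255.0"))
print(same_network("192.168.1.37", "192.168.1.200", "255.255.255.0"))
print(same_network("192.168.1.37", "192.168.2.5", "255.255.255.0"))
```

If the two results of the AND match, the destination is on the same network and can be delivered directly; otherwise the packet is handed to a router.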

Once a packet reaches the target network, we have to find the system itself, but we don't have its MAC address, so the ARP protocol is used to get the MAC address from the IP address. We then have the respective MAC address.

Then, since we have the port number, we can reach the right application/process directly. With the IP address and port number together, our socket address is complete.

In practice, this entire connection setup is handled on our system by a separate process/thread.

The connection is established using TCP, a reliable protocol in the Transport Layer, which sits above the Network Layer, Data Link Layer, and Physical Layer. So connection establishment triggers the Transport Layer, and the layers below it are triggered too, because they perform all of the operations above.

Since this step is about connection establishment, all the elements of the Transport Layer come into the picture, and transport primitives such as SEND, RECEIVE, CONNECT, LISTEN, and BIND play their parts. To achieve this, we use socket programming at the end systems.
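Here is a small sketch of those primitives using Python's standard `socket` module, with both ends running on the local machine. The upper-casing server is invented for the example; a real web server would speak HTTP instead.

```python
import socket
import threading

def serve_once(listener):
    """Accept one connection, reply with the data in upper case, then close."""
    conn, _addr = listener.accept()          # pairs with the client's CONNECT
    with conn:
        data = conn.recv(1024)               # RECEIVE
        conn.sendall(data.upper())           # SEND

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))              # BIND; port 0 lets the OS pick a free port
listener.listen(1)                           # LISTEN
port = listener.getsockname()[1]

worker = threading.Thread(target=serve_once, args=(listener,))
worker.start()

with socket.create_connection(("127.0.0.1", port)) as client:  # CONNECT
    client.sendall(b"hello")                 # SEND
    reply = client.recv(1024)                # RECEIVE

worker.join()
listener.close()
print(reply)
```

The server side runs in its own thread here only so that one script can play both roles.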

STEP III

Once established, the connection is full-duplex: the client and the server can both send and receive, carrying out their request and response operations.

Since this is happening on the web, these are not just any request and response: they are an HTTP request and an HTTP response, as the HTTP protocol in the Application Layer plays its part.

As part of the HTTP request and response, the server typically sends HTML, CSS, and JavaScript files to the client. The client's request is satisfied if the requested file is present; otherwise it is denied, for example with a 404 Not Found status.

If the required file/data is available, it is displayed on the screen.
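An HTTP request and response are just structured text sent over the connection. The sketch below builds a minimal GET request and parses a response; the helper names and the sample response are written by hand for illustration and skip most of what real HTTP handling needs.

```python
def build_get_request(host, path="/"):
    """Build the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",   # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)

def parse_response(raw):
    """Split a raw HTTP response into (status_code, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    status_code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        key, _, value = line.partition(":")
        headers[key.strip().lower()] = value.strip()
    return status_code, headers, body

# Hypothetical response text, written by hand for illustration.
sample = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><body>Hello</body></html>"
)

print(build_get_request("www.example.com", "/index.html"))
print(parse_response(sample))
```

The status code is what tells the browser whether the file was found (200) or not (for example 404).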

In terms of protocols:

Connection establishment is triggered in the Transport Layer.
HTTP is triggered in the Application Layer.
Continuous packet switching over a connectionless path is triggered in the Network Layer.

STEP IV

Once the data transfer is done, releasing the connection symmetrically is equally important; otherwise it can lead to abrupt or undesirable outcomes. The responsibility for a successful connection release again lies with the Transport Layer, which in TCP uses segments carrying the FIN and ACK flags.
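A symmetric release can be observed from a program, even though the FIN and ACK segments themselves are sent by the operating system. In this sketch on the local machine, each side finishes sending in turn, and the other side sees an empty read as the sign that its peer is done. The message contents are invented.

```python
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

client = socket.create_connection(listener.getsockname())
server, _addr = listener.accept()

client.sendall(b"last request")
client.shutdown(socket.SHUT_WR)   # client is done sending: TCP sends a FIN

first = server.recv(1024)         # the data sent before the FIN
eof = server.recv(1024)           # b"" signals that the peer has finished

server.sendall(b"last response")  # the other direction is still open
server.close()                    # server finishes too and sends its own FIN

answer = client.recv(1024)
client_eof = client.recv(1024)    # b"" again: both directions are now closed

client.close()
listener.close()
print(first, eof, answer, client_eof)
```

Because each direction is closed separately, no data that was already sent is lost during the release.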

STEP V

This is the last step and the most important one. It may seem trivial to the user, but this is what happens internally once the connection is released. There are various levels of caches/buffers, such as the ISP cache, the local (browser) cache, and the OS cache.

The system stores the required records or the URL in the respective cache, based on the caching algorithm or the priority of the request.

This is how the default gateway/proxy servers hold the records: if anyone else on the same network requests them, they can be fetched quickly.

This is everything that happens in the background when we click a URL. It may seem trivial to a normal user, but from a software engineer's perspective it makes a lot of sense.
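One common caching rule, assumed here for illustration, is to keep a record only for a limited time to live (TTL), which is how DNS records are normally cached. The class below is a made-up minimal sketch with an injectable clock so that expiry can be shown without waiting.

```python
import time

class TTLCache:
    """Tiny record cache whose entries expire after a time to live (seconds)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}   # name -> (value, expiry_time)

    def put(self, name, value, ttl):
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]   # stale: force a fresh lookup
            return None
        return value

# A fake clock makes the expiry visible without waiting.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("example.com", "192.0.2.20", ttl=300)
print(cache.get("example.com"))   # still fresh
now[0] = 301.0
print(cache.get("example.com"))   # expired, so None
```

Other policies, such as evicting the least recently used entry, fit the same `put`/`get` shape.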

References:

The images are taken from Google Images.
Key insights were taken from GeeksforGeeks.