What happens when I type something in the browser’s address bar?

Published in

Expedia Group Technology

11 min read · Apr 21, 2019

A recurring interview question is: what happens when you type “google.com” in the browser’s address bar? More often than not, the candidate doesn’t know the answer or gives a very incomplete one (nervousness might play a part here).

The answer to the question is not an easy one, mostly because if you want to give the most complete possible response, you can go on forever talking about it.

This post aims to give a general answer to the question, with resources that describe each item in more detail. In my opinion, it should be more than enough to answer the question with distinction.

1. A user types in the browser address bar

When a user types something on the keyboard, the signals are sent to the operating system, which takes charge of sending an event to the browser. On receiving that event, the browser triggers its auto-complete functions. For instance, if a user types “go”, the address bar will most probably suggest “google.com” before they finish typing it. The suggestions are based on the user’s search history, bookmarks, cookies and popular searches on the web. They are presented in the dropdown below the address bar, which opens once the suggestion algorithms find matches for the typed string (e.g. on desktop, Chrome will show up to 6 suggestions).
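As a rough illustration of the suggestion step, here is a minimal Python sketch. The history data, the ranking by visit count and the `suggest` function are assumptions made up for this example; real browsers combine many more signals.

```python
def suggest(typed, history, limit=6):
    """Return up to `limit` visited hosts that start with the typed text.

    `history` maps a host to a visit count (made-up data); the most
    visited matches come first.
    """
    prefix = typed.strip().lower()
    if not prefix:
        return []
    matches = [host for host in history if host.lower().startswith(prefix)]
    matches.sort(key=lambda host: (-history[host], host))
    return matches[:limit]


history = {"google.com": 42, "golang.org": 7, "github.com": 30, "gov.uk": 1}
print(suggest("go", history))  # hosts starting with "go", most visited first
```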

2. Is the input a search term or a URL?

The input can be either a search term or a URL. If what is typed in the address bar doesn’t include a protocol (http:// or https://) or a valid domain name, the browser interprets it as a search term and feeds it to the browser’s default search engine. On the other hand, if the browser decides it’s a valid URL, it checks its scheme and, if none is present, prepends http:// to the input. A request is then created assuming the default port 80, the GET method and no basic auth.
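The decision can be sketched in a few lines of Python. This is only a simplified heuristic under my own assumptions: `SEARCH_URL`, `looks_like_domain` and `resolve_input` are hypothetical names, and a real browser applies far more elaborate rules.

```python
from urllib.parse import quote_plus, urlsplit

# Hypothetical search endpoint; a real browser uses the configured engine.
SEARCH_URL = "https://www.google.com/search?q={}"


def looks_like_domain(text):
    """Very rough check: no spaces, at least one dot, no empty labels."""
    if " " in text or "." not in text:
        return False
    host = text.split("/", 1)[0]
    return all(label for label in host.split("."))


def resolve_input(text):
    """Turn address bar input into the URL the browser would request."""
    text = text.strip()
    if urlsplit(text).scheme in ("http", "https"):
        return text                      # already a full URL
    if looks_like_domain(text):
        return "http://" + text          # prepend the default scheme
    return SEARCH_URL.format(quote_plus(text))  # treat as a search term


print(resolve_input("google.com"))
print(resolve_input("what is dns"))
```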

2.1. HSTS (HTTP Strict Transport Security)

My last statement is not completely true: before sending anything, the browser checks its preloaded HSTS list (HTTP Strict Transport Security, see: RFC 6797).

As you know, http:// is insecure. An attacker can hijack the connection and manipulate it to carry out a whole range of malicious attacks. A user can be redirected to g00gle.com, where they are at the mercy of the attacker, who can, for example, intercept passwords.

HSTS is a security standard that provides a mechanism for web sites to declare themselves accessible only via secure connections. If a website is in the HSTS list, the browser will send its request via HTTPS instead of HTTP, using port 443. Otherwise the request is sent via HTTP.

Of course, a website can still use the HSTS policy without being in the preloaded list. The browser will make a first HTTP request, which is typically redirected to HTTPS, and the HTTPS response carries a Strict-Transport-Security header instructing the browser to only send HTTPS requests from then on. That first HTTP request can still leave the user exposed to a downgrade attack, which is why the preloaded HSTS list is nowadays shipped with the browser.
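Both mechanisms can be sketched as follows. The preload set and the host names are made up, and the function names are mine; the header directives (`max-age`, `includeSubDomains`) are the ones defined in RFC 6797.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical stand-in for the preload list shipped with the browser.
PRELOADED_HSTS = {"example-bank.test"}


def parse_hsts_header(value):
    """Parse the value of a Strict-Transport-Security response header."""
    policy = {"max_age": 0, "include_subdomains": False}
    for directive in value.split(";"):
        name, _, arg = directive.strip().partition("=")
        name = name.lower()
        if name == "max-age":
            policy["max_age"] = int(arg.strip('"'))
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
    return policy


def upgrade_if_hsts(url, known_hosts):
    """Rewrite http:// to https:// when the host is a known HSTS host.

    Explicit ports are ignored here for brevity.
    """
    parts = urlsplit(url)
    if parts.scheme != "http" or parts.hostname not in known_hosts:
        return url
    return urlunsplit(("https", parts.hostname, parts.path, parts.query, parts.fragment))


print(upgrade_if_hsts("http://example-bank.test/login", PRELOADED_HSTS))
print(parse_hsts_header("max-age=31536000; includeSubDomains"))
```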

You can check if a site is included in the HSTS list at https://hstspreload.org — google.com is not included 😱.

Resources:

Mozilla wiki about the HSTS preloaded list: Firefox HSTS list
Safari about protecting against HSTS abuse

3. DNS Lookup

A browser can’t open a webpage without knowing the IP address of the server that hosts it. (Do you want to know your IP address 😎? See: https://whatismyipaddress.com/)

The IP address behind a domain name (example: google.com) is found through a process called DNS lookup. DNS (Domain Name System) is a distributed database that maps domain names to the IP addresses they point to. For example, at the time of writing www.google.com can be reached by typing http://172.217.23.14 in your address bar.

The browser performs the following steps, in order, to find that address. If a step fails to find it, the lookup proceeds to the next one.

1. Browser checks if the domain is in the local caches

Browser Cache: the IP can be in the browser cache if the site was recently visited (you can check Chrome’s DNS Cache at chrome://net-internals/#dns)
DNS Cache: based on the TTL settings, an IP might exist in the operating system’s DNS cache from any previous visits. (good resource: ttl settings)
Hosts File: an associated IP could be defined in the hosts file on the user’s machine. The browser calls a library function such as gethostbyname (or its modern replacement, getaddrinfo) to do the lookup in this step (interesting resource: modify hosts file on windows)

2. Browser makes a request to the DNS server configured in your system’s network settings (DNS settings), which is typically your local router or your Internet Service Provider’s recursive DNS server. The IP address may be cached there from a previous visit by any other user who requested the same website.

3. On a cache miss, the request proceeds to the Root DNS servers. Each top-level domain (e.g. .com, .pt, .es) has its own name servers, called Top Level DNS servers. The root servers’ job is to refer our request to the name servers that correspond to the requested top-level domain (TLD).

4. The request is routed to the relevant Top Level DNS server for the corresponding domain (e.g. .com if we want to find google.com). These servers direct the request to where it can find the information it is looking for, more specifically the authoritative name servers.

5. The Authoritative DNS Servers contain the information about the domain. When the request reaches them, they are asked for the address record of the domain name (A, or AAAA for IPv6), which contains the IP address of the server where the website is hosted.

Image credits: https://medium.com/@kamranahmedse/dns-in-one-picture-d7f4783db06a
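The lookup order above can be simulated with plain dictionaries. Everything in this sketch is made up for illustration (server names, tables, the 192.0.2.x documentation address); no real DNS query is sent.

```python
# Made-up delegation data standing in for real name servers.
ROOT_SERVERS = {"com": "tld-com"}                         # root refers to a TLD server
TLD_SERVERS = {"tld-com": {"example.com": "ns-example"}}  # TLD refers to an authoritative server
AUTHORITATIVE = {"ns-example": {"www.example.com": "192.0.2.10"}}

browser_cache = {}
os_cache = {}
hosts_file = {"intranet.local": "10.0.0.5"}
resolver_cache = {}


def recursive_lookup(name):
    """What happens on a cache miss: root, then TLD, then authoritative."""
    labels = name.split(".")
    tld_server = ROOT_SERVERS[labels[-1]]                          # step 3
    auth_server = TLD_SERVERS[tld_server][".".join(labels[-2:])]   # step 4
    return AUTHORITATIVE[auth_server][name]                        # step 5


def resolve(name):
    """Return (ip, source), trying each place in the order described."""
    lookups = (
        ("browser cache", browser_cache),
        ("OS cache", os_cache),
        ("hosts file", hosts_file),
        ("resolver cache", resolver_cache),
    )
    for source, table in lookups:
        if name in table:
            return table[name], source
    ip = recursive_lookup(name)
    # Remember the answer so the next lookup is served from a cache.
    resolver_cache[name] = os_cache[name] = browser_cache[name] = ip
    return ip, "authoritative server"
```

Calling `resolve("www.example.com")` twice shows the effect of caching: the first answer comes from the authoritative server, the second from the browser cache. A real resolver would also honour each record’s TTL, which this sketch leaves out.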

Resources:

how your browser finds websites
comics on how DNS works (BEST 😍)
Interesting resource to read about the Address Resolution Protocol: ARP

4. The Browser opens a TCP socket connection with the server

All right, now that the browser has the destination server’s IP address, it takes the port number of the corresponding protocol (80 for HTTP, 443 for HTTPS), calls the system library function socket and requests a TCP socket stream: AF_INET/AF_INET6 and SOCK_STREAM.
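In Python the same system call is exposed through the standard `socket` module. The sketch below only creates the socket and does not connect anywhere; `DEFAULT_PORTS` and `open_tcp_socket` are names I made up, and the address is a documentation-only one.

```python
import socket

DEFAULT_PORTS = {"http": 80, "https": 443}


def open_tcp_socket(ip_address):
    """Create (but do not connect) a TCP stream socket for the address."""
    # A colon in the address means IPv6, otherwise assume IPv4.
    family = socket.AF_INET6 if ":" in ip_address else socket.AF_INET
    return socket.socket(family, socket.SOCK_STREAM)


sock = open_tcp_socket("192.0.2.10")
print(sock.family, sock.type, DEFAULT_PORTS["https"])
sock.close()
```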

The request is then passed to the Transport Layer of the OSI Model (4th layer), where TCP segments are created and the destination port is added to the TCP header. The transport layer is responsible for reliable end-to-end communication over a network.

The request is sent to the Network Layer (3rd layer of the OSI Model), where the destination IP address and the source IP address of the requesting device are added to the segment to form a packet.

The Network layer selects and manages the data transfer between nodes in a network. It is responsible for the logical connection setup, data forwarding, routing and delivery error reporting.

Next, the packet is delivered to the Data Link Layer (2nd layer), where a frame header is added to the packet, including the MAC addresses of the network interface and the local router.

After this step, the packet is ready to be transmitted through the physical layer, going through all sorts of nodes until it reaches the destination server.
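The layering described above boils down to wrapping data in successive headers. Here is a toy model of that encapsulation; the class names, fields and addresses are simplifications I chose, and real headers carry many more fields.

```python
from dataclasses import dataclass


@dataclass
class Segment:          # transport layer (4): ports + payload
    src_port: int
    dst_port: int
    payload: bytes


@dataclass
class Packet:           # network layer (3): IP addresses + segment
    src_ip: str
    dst_ip: str
    segment: Segment


@dataclass
class Frame:            # data link layer (2): MAC addresses + packet
    src_mac: str
    dst_mac: str
    packet: Packet


def encapsulate(payload, src_port, dst_port, src_ip, dst_ip, src_mac, router_mac):
    """Wrap the payload layer by layer, from transport down to data link."""
    segment = Segment(src_port, dst_port, payload)
    packet = Packet(src_ip, dst_ip, segment)
    return Frame(src_mac, router_mac, packet)


frame = encapsulate(b"GET / HTTP/1.1", 50000, 443,
                    "10.0.0.2", "192.0.2.10",
                    "aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb")
print(frame.packet.dst_ip, frame.packet.segment.dst_port)
```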

4.1. Establish a connection

Before any data transmission begins, the client and the server need to exchange a set of parameters. To bootstrap the connection, the server must have a passive open socket bound to a port, listening for connections. The connection between the two is then established through a process called the three-way handshake (SYN, SYN/ACK, ACK).

All packets are sent and received using the TCP connection flow illustrated in the image below:

credit to: https://www.ibm.com/support/knowledgecenter/en/SSB23S_1.1.0.12/gtps7/s5tcpcf.html
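You can watch both sides of this on your own machine. In the sketch below (the function name and the message are mine) a server does the passive open on the loopback interface and a client connects to it; the handshake itself is carried out by the operating system during `connect`, so the code never sees the SYN and ACK packets directly.

```python
import socket


def loopback_handshake(message=b"hello"):
    """Open a TCP connection to ourselves and send one message over it."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.settimeout(5)
    try:
        # Passive open: bind to a port and listen (port 0 = any free port).
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        address = server.getsockname()

        # Active open: returns once SYN, SYN/ACK, ACK have been exchanged.
        client = socket.create_connection(address, timeout=5)
        conn, peer = server.accept()
        with client, conn:
            conn.settimeout(5)
            client.sendall(message)
            received = conn.recv(1024)
        return received, peer
    finally:
        server.close()


data, peer = loopback_handshake()
print(data, peer[0])
```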

5. Performing the TLS Handshake

The Transport Layer Security (TLS) Handshake Protocol is a cryptographic protocol responsible for authentication and key exchange, used to establish or resume secure sessions between a server and an application such as a browser. HTTPS is nothing more than HTTP over TLS. The way the TLS protocol establishes a session is illustrated in the following image:

image credit: https://hpbn.co/transport-layer-security-tls/
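From an application’s point of view the handshake is usually a single library call. A minimal sketch with Python’s standard `ssl` module follows; `open_tls_connection` is a helper name I made up, and it is only defined here, not called, because calling it needs network access.

```python
import socket
import ssl

# Default client settings: verify the certificate chain and the host name.
context = ssl.create_default_context()


def open_tls_connection(host, port=443):
    """Open a TCP connection and run the TLS handshake on top of it."""
    raw = socket.create_connection((host, port), timeout=5)
    # wrap_socket performs the handshake and checks the certificate for `host`.
    return context.wrap_socket(raw, server_hostname=host)


print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

With a network connection available, `open_tls_connection("www.google.com").version()` would report the negotiated protocol version.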

Here’s the output of the TLS Handshake performed by the command:

curl -v "https://www.google.com"

Resources:

Nice article about TLS and TCP: nuts-and-bolts
Microsoft: TLS Handshake Protocol
Networking 101: TLS

6. The Browser sends an HTTP request to the server

The Hypertext Transfer Protocol (HTTP) is the web’s communications backbone.

As soon as the connection is established we can start transferring data between a server and a client. The browser will send a GET request asking for a webpage, say google.com.
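On the wire, an HTTP/1.1 GET request is just a few lines of text. The sketch below builds one by hand; the function name and the User-Agent value are placeholders of mine, and a real browser sends many more headers.

```python
def build_get_request(host, path="/", user_agent="example-browser/1.0"):
    """Build the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",       # request line: method, path, version
        f"Host: {host}",              # mandatory in HTTP/1.1
        f"User-Agent: {user_agent}",
        "Accept: text/html",
        "Connection: close",
        "",                           # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


print(build_get_request("google.com").decode("ascii"))
```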

The request carries information such as the method, the path and headers like the browser identification (User-Agent), etc. Here’s an example of the headers set on a GET request done in Chromium to google.com:

The server will respond with a message stating the status code, which will be 200 if everything is right. If the browser includes sufficient information (conditional headers such as If-None-Match or If-Modified-Since), the web server may determine that the cached version of the file has not been modified since the last retrieval. In that case, the server will respond with a status code 304 Not Modified and no payload, and the HTML (or a PDF, image, etc.) will be retrieved from the browser’s cache. Here’s an example of the response headers retrieved from the previous GET to google.com, stating a 200 status code:

If you’ve never seen HTTP requests & responses in your browser, open your dev tools (inspector) in Chrome or any other browser and navigate to the network tab.
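The choice between 200 and 304 can be sketched from the server’s side. This assumes validation via an ETag header and the If-None-Match request header; the way the tag is computed here (a truncated SHA-256 of the body) is an arbitrary choice of mine.

```python
import hashlib


def make_etag(body):
    """Derive a validator from the content (arbitrary scheme for this sketch)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(request_headers, body):
    """Return (status, headers, payload) for a GET of `body`."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        # The client's cached copy is still valid: no payload needed.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Length": str(len(body))}, body


page = b"<html>hi</html>"
status, headers, _ = respond({}, page)
print(status)                                                  # first visit
print(respond({"If-None-Match": headers["ETag"]}, page)[0])    # revalidation
```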

7. How the server handles the request

An HTTPD (HTTP Daemon) is a piece of software that runs in the background on the web server, listening for clients’ HTTP requests and handling those requests and responses on the server side. It is normally Apache or nginx on Linux and IIS on Windows, and the application behind it can be written in a multitude of programming languages and runtimes, such as PHP, Ruby, Node, etc.

The server will break down the request by request method (such as GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, or TRACE), domain and path. If we type a URL in the address bar, it will always be a GET. After reading the request, its headers and cookies to analyse what is being requested, the server can also rewrite the request if needed (mod_rewrite Apache module).
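Here is a minimal sketch of that first breakdown, assuming a well-formed HTTP/1.1 request; `parse_request` is my own helper and not part of any server, and real servers are far more defensive.

```python
def parse_request(raw):
    """Split a raw HTTP/1.1 request into method, path, host and headers."""
    head, _, body = raw.partition(b"\r\n\r\n")
    request_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return {
        "method": method,
        "path": target,
        "version": version,
        "host": headers.get("host"),
        "headers": headers,
        "body": body,
    }


raw = b"GET /index.html HTTP/1.1\r\nHost: google.com\r\nCookie: a=1\r\n\r\n"
request = parse_request(raw)
print(request["method"], request["host"], request["path"])
```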

The server will then assemble the content for the response payload that corresponds to the request. In the case of google.com, it will fall back to the index file in the root / path. It will then parse the file and stream the output to the client.

7.1. The server response

The server response to the GET request for the google.com URL is an HTML document. The response includes the status code and headers describing, among other things, caching of the page, compression type and cookies.

Server response (headers and content) for google.com

The response status code is important, since it tells the client the status of the response. Any 2xx code indicates success, 3xx indicates the client will be redirected, 4xx indicates an error on the client side and 5xx indicates an error on the server side.
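The classes are determined by the first digit alone, which a client can check with one integer division. The helper below is a sketch of mine; the 1xx informational class is included for completeness, and `http.HTTPStatus` from the standard library supplies the reason phrases.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code):
    """Map an HTTP status code to its class using the first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


for code in (200, 304, 404, 503):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```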

8. How a client (browser) handles the response

Now that the browser has received the response from the server, the HTML, CSS and JS will be parsed. The browser will then check the HTML tags and send out GET requests for any external resources. These can be assets such as CSS stylesheets, JavaScript files, images, etc. After the parsing is done, the rendering process starts, allowing the HTML to be shown in the browser.

8.1. HTML Parsing

The HTML parsing process starts with a stream of Unicode characters being passed from the Network layer to the tokenisation stage of the parsing model. The data is usually passed in 8kB chunks. The responsibility of the parsing model is to construct the DOM (Document Object Model), a tree of DOM elements and node attributes that represent the web page document.

You can further read about the parsing algorithm at W3C parsing HTML documents.

As a note, keep in mind that parsing an HTML document is not trivial. Because of the nature of the task at hand, the language is very permissive. For example, the HTML could contain an opening <p> tag without a closing </p> tag. This invalid syntax will be fixed and the browser will simply go on with its tasks. The process is also reentrant, meaning that while the tree construction stage is handling a token, the tokeniser might be resumed and cause more tokens to be emitted and processed before the initial token has been fully processed. This behaviour can be triggered by dynamic code such as a document.write() call.

<script>  
   document.write('<p>'); 
</script>
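To get a feel for how permissive tokenisation is, you can feed markup with unclosed <p> tags to the tokeniser in Python’s standard library. Note the assumption: `html.parser` is not a browser parser and does not build a DOM or repair the tree; it only shows that the token stream is produced without complaint. `TokenLogger` and `tokenise` are names I made up.

```python
from html.parser import HTMLParser


class TokenLogger(HTMLParser):
    """Record the start tag, end tag and text tokens that are emitted."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("start", tag))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        if data.strip():
            self.tokens.append(("text", data.strip()))


def tokenise(html):
    parser = TokenLogger()
    parser.feed(html)
    parser.close()
    return parser.tokens


# Two <p> tags, neither of them closed.
for token in tokenise("<div><p>first<p>second</div>"):
    print(token)
```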

8.2. Fetching resources

The browser will start fetching external resources as soon as it hits a recognisable tag such as <link>. The browser can also be told about those resources ahead of time by including a <link> tag inside the <head> tag, with a rel attribute set to dns-prefetch, preconnect, prefetch or prerender, a technique for optimising webpages (resource hints).
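A simplified scanner for both kinds of tags could look like the sketch below. It only handles <link>, <script src> and <img src>, the class name and URLs are made up, and it uses Python’s `html.parser` rather than anything a browser ships.

```python
from html.parser import HTMLParser

RESOURCE_HINTS = {"dns-prefetch", "preconnect", "prefetch", "prerender"}


class ResourceScanner(HTMLParser):
    """Collect resource hints and the external resources to fetch."""

    def __init__(self):
        super().__init__()
        self.hints = []      # (rel, href) pairs from <link rel="...hint...">
        self.to_fetch = []   # URLs of stylesheets, scripts and images

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link":
            rel = (attributes.get("rel") or "").lower()
            if rel in RESOURCE_HINTS:
                self.hints.append((rel, attributes.get("href")))
            elif rel == "stylesheet":
                self.to_fetch.append(attributes.get("href"))
        elif tag in ("script", "img") and attributes.get("src"):
            self.to_fetch.append(attributes["src"])


PAGE = """
<head>
  <link rel="preconnect" href="https://fonts.example">
  <link rel="stylesheet" href="/main.css">
  <script src="/app.js"></script>
</head>
<body><img src="/logo.png"></body>
"""

scanner = ResourceScanner()
scanner.feed(PAGE)
print(scanner.hints)
print(scanner.to_fetch)
```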

8.3. CSS

When the browser finds a CSS stylesheet, either embedded or external, it parses it using the CSS lexical and syntax grammar. The result is later used to style the layout and painting of the page. Each CSS file is parsed into a Stylesheet object, part of a structure called the CSSOM (CSS Object Model).

CSSOM construction, just like DOM construction, is considered render-blocking.

8.4. Render Tree

Both the CSSOM and the DOM are needed to create a render tree that will contain all the information the browser needs to create pixels on a page. The following image shows the way the render tree is constructed.

render tree construction, credits: https://developers.google.com/web/fundamentals/performance/critical-rendering-path/render-tree-construction
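The idea can be sketched with a toy DOM and a toy CSSOM. In this simplified model (my own assumption, not how a browser stores things) styles are looked up by tag name only, and a node with `display: none` is left out of the render tree together with its children.

```python
from dataclasses import dataclass, field


@dataclass
class DomNode:
    tag: str
    children: list = field(default_factory=list)


def build_render_tree(node, cssom):
    """Combine a DOM node with its styles, skipping invisible subtrees."""
    style = cssom.get(node.tag, {})
    if style.get("display") == "none":
        return None                      # not rendered, nor are its children
    children = []
    for child in node.children:
        rendered = build_render_tree(child, cssom)
        if rendered is not None:
            children.append(rendered)
    return {"tag": node.tag, "style": style, "children": children}


dom = DomNode("body", [DomNode("p"), DomNode("span"), DomNode("div", [DomNode("p")])])
cssom = {"p": {"color": "red"}, "span": {"display": "none"}}
tree = build_render_tree(dom, cssom)
print([child["tag"] for child in tree["children"]])
```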

8.5. JavaScript

When the browser finishes the parsing, it marks the document as interactive and starts executing the JavaScript scripts that are set as deferred. Scripts can also be set as async; in that case the script is downloaded in parallel with the HTML parsing and executed as soon as it is available, and the parsing is paused only while the script runs.

If a <script> tag without async or defer is found, the DOM construction will be paused until the script has been fetched and executed. However, before executing the script the browser will wait for any pending CSSOM construction to end. The reason is that JS execution can modify the DOM and access or modify the CSSOM.

9. Rendering

Now that the render tree is constructed the browser can proceed to the rendering phase. The render tree is used to create the layout of visible elements in a page.

In the layout stage, the browser will calculate the dimensions of each element and then their position in the viewport. It begins at the root of the render tree and traverses it.
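A toy version of that traversal, for block boxes stacked vertically, might look like this. The dictionary shape, the `content_height` field and the fixed 800px width are assumptions for the sketch; real layout engines handle many more cases (inline content, floats, margins and so on).

```python
def layout(box, x=0, y=0, width=800):
    """Assign x, y, width and height to a box and its children.

    Children are treated as blocks stacked one below the other; a box's
    height is its own `content_height` plus the height of its children.
    """
    box["x"], box["y"], box["width"] = x, y, width
    cursor = y
    for child in box.get("children", []):
        layout(child, x, cursor, width)
        cursor = child["y"] + child["height"]
    box["height"] = (cursor - y) + box.get("content_height", 0)
    return box


root = {
    "tag": "body",
    "children": [
        {"tag": "h1", "content_height": 20},
        {"tag": "p", "content_height": 30},
    ],
}
layout(root)
print(root["height"], [(c["tag"], c["y"], c["height"]) for c in root["children"]])
```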

When the layout stage is complete, the paint stage will begin. For this process the browser will use the GPU or the CPU to convert the render tree to pixels on the screen.

9.1. Critical Rendering Path

You might have heard of this concept before if you work in the field, and guess what, I just described the critical rendering path in the last two sections 🙃 (8 and 9).

The critical rendering path is the minimum set of steps that the browser has to take from the moment it receives the first byte of HTML to the moment it renders pixels on the screen for the first time.

Writing this blog post has been a learning process for me. I hope that those who reach the end will learn something too.

Written by Hugo Queirós