What happens when…

you type holbertonschool.com in your browser and press Enter

This short essay is mostly for me to state what I have learned so far at Holberton School in my journey into networking.

1- DNS resolution

I type holbertonschool.com to reach the website because a name is easier for me to remember, but the Internet relies on IP addresses (numbers). So we need a mapping between the two; that is the role of the Domain Name System (DNS). How does the browser go from one to the other?

  0. The browser looks in its own cache for the IP address matching the domain name. If it is there, the lookup is done and the browser sends the request to that IP.
  1. Otherwise, the browser asks the operating system, which checks its own cache. If the answer is not there, go to the next step.
  2. The OS asks a resolver server. In my case, my ISP runs resolvers, which are mostly caches of <domain name>:<IP> mappings. Not there?
  3. The resolver asks a root server. The root server does not know the final answer, but it refers the resolver to the Top Level Domain (TLD) servers in charge of the .com part of the address.
  4. The TLD servers in turn refer the resolver to the authoritative name servers for holbertonschool.com (the ones declared for the domain, typically through its registrar). Those servers hold the DNS records that map the domain name to the IP addresses of its servers. Finally we have the information. The resolver, the OS and the browser each cache it for future use, for as long as the time to live (TTL) set by the domain administrators allows.
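
The lookup order above can be sketched in a few lines of Python. This is only a toy model to fix the idea, not how a real browser or resolver is written: the names TTLCache and resolve are made up for this example, and the authoritative_lookup function stands in for the whole root, TLD and authoritative chain.

```python
import time


class TTLCache:
    """A tiny cache whose entries expire after their TTL (in seconds)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (ip, expiry time)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # stale entry: forget it
            return None
        return ip

    def put(self, name, ip, ttl):
        self._entries[name] = (ip, self._clock() + ttl)


def resolve(name, caches, authoritative_lookup):
    """Check each cache in order (browser, OS, resolver).

    On a miss everywhere, ask the authoritative source (hypothetical
    callable returning (ip, ttl)) and store the answer in every cache.
    Returns the IP and the label of the layer that answered.
    """
    for label, cache in caches:
        ip = cache.get(name)
        if ip is not None:
            return ip, label
    ip, ttl = authoritative_lookup(name)
    for _, cache in caches:
        cache.put(name, ip, ttl)
    return ip, "authoritative"
```

Called twice in a row with caches labelled "browser", "os" and "resolver", the first call goes all the way to the authoritative source and the second is answered by the browser cache, until the TTL runs out.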

dig is a command-line tool for DNS lookups.

$ dig holbertonschool.com
; <<>> DiG 9.9.5-3ubuntu0.13-Ubuntu <<>> holbertonschool.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62644
;; flags: qr rd ra; QUERY: 1, ANSWER: 8, AUTHORITY: 0, ADDITIONAL: 1
; EDNS: version: 0, flags:; udp: 4096
;holbertonschool.com. IN A
holbertonschool.com. 60 IN A
holbertonschool.com. 60 IN A
holbertonschool.com. 60 IN A
holbertonschool.com. 60 IN A
holbertonschool.com. 60 IN A
holbertonschool.com. 60 IN A
holbertonschool.com. 60 IN A
holbertonschool.com. 60 IN A
;; Query time: 33 msec
;; WHEN: Tue Apr 18 03:35:48 UTC 2017
;; MSG SIZE rcvd: 176

2- Connecting to a website

Now that we have the IP address, the browser can send a request.

Typing holbertonschool.com in a browser nowadays is usually translated to http://holbertonschool.com (or even https://holbertonschool.com).

This request follows HTTP, an application-layer protocol. This means the browser will send the request to the resolved IP on port 80 (the default for HTTP).
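
As a rough illustration of that completion step, here is a small Python sketch using the standard urllib.parse module. The function name expand_address and the DEFAULT_PORTS table are assumptions of mine; real browsers apply many more rules (search terms, HSTS lists and so on), and this sketch only knows the http and https schemes.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes discussed here.
DEFAULT_PORTS = {"http": 80, "https": 443}


def expand_address(typed, default_scheme="http"):
    """Return (scheme, host, port) for what was typed in the address bar."""
    if "://" not in typed:
        # No scheme given: assume the default one, as the browser does.
        typed = f"{default_scheme}://{typed}"
    parts = urlsplit(typed)
    # Use the explicit port if there is one, else the scheme's default.
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.scheme, parts.hostname, port
```

For example, expand_address("holbertonschool.com") gives ("http", "holbertonschool.com", 80), while the https:// form gives port 443.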

Underneath, the request relies on the Transmission Control Protocol (TCP) and the Internet Protocol (IP). The request I send is divided into packets. IP is the internet layer: it is responsible for addressing those packets, tagging each one with my IP and the holbertonschool.com IP so it can be routed. TCP is the transport layer: it opens the connection with a 3-way handshake (SYN, SYN-ACK, ACK) and is responsible for the reliability of the transmission, numbering the segments so they can be reordered, acknowledged and retransmitted if lost.
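
To picture the "divided into packets" part, here is a simplified Python sketch of cutting a request into numbered pieces and putting them back together. It is a toy model, not real TCP: the function names are made up, the sequence numbers simply start at 0 (real TCP starts from a random initial value), and there are no acknowledgements or timers.

```python
def split_into_segments(payload: bytes, size: int):
    """Cut the payload into (sequence_number, chunk) pairs.

    As in TCP, the sequence number is the byte offset of the chunk.
    """
    return [(offset, payload[offset:offset + size])
            for offset in range(0, len(payload), size)]


def reassemble(segments):
    """Rebuild the payload from segments that may arrive out of order
    or more than once (for example after a retransmission)."""
    by_offset = {}
    for seq, chunk in segments:
        by_offset.setdefault(seq, chunk)  # keep the first copy, drop duplicates
    return b"".join(by_offset[seq] for seq in sorted(by_offset))
```

The point of the numbering is that the receiver can rebuild the exact request even if the pieces arrive shuffled or duplicated.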

traceroute is a command-line program that shows the hops between my machine and the server IP.

$ traceroute holbertonschool.com
traceroute to holbertonschool.com (, 30 hops max, 60 byte packets
1 ( 0.259 ms 0.096 ms 0.194 ms
2 homeportal (***.***.***.***) 5.432 ms 8.349 ms 10.254 ms
3 108-68-104-2.lightspeed.sntcca.sbcglobal.net ( 33.019 ms 39.960 ms 46.038 ms
4 ( 41.547 ms 41.823 ms 41.183 ms
5 ( 50.240 ms 62.409 ms ( 47.339 ms
6 ( 49.758 ms 25.600 ms 33.788 ms
7 ( 33.875 ms 29.883 ms 29.950 ms
8 ( 45.516 ms * *
9 * * *
10 ( 36.145 ms 40.323 ms 41.153 ms
11 * * *
12 * * *
13 * * *
14 server-52-84-239-88.sfo5.r.cloudfront.net ( 34.302 ms 35.666 ms 24.780 ms

3- On the host server

Following the HTTP URL, I reach port 80. Hopefully there is a web server behind the IP address, which means a program on that machine is always listening on specific ports.

At this point I may hit a firewall. The role of the firewall is to filter incoming and outgoing traffic based on IPs, ports and protocols. Assuming I look legit, I can go on.
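
A firewall rule set can be pictured as an ordered list where the first match wins. The Python sketch below is only my mental model, using the standard ipaddress module; the Rule class, the decide function and the example addresses are hypothetical, and real firewalls (iptables, ufw, cloud security groups) have their own syntax and many more criteria.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    action: str    # "allow" or "deny"
    source: str    # CIDR block, e.g. "0.0.0.0/0" for any IPv4 address
    protocol: str  # "tcp" or "udp"
    port: int


def decide(rules, source_ip, protocol, port, default="deny"):
    """Return the action of the first matching rule, else the default."""
    addr = ip_address(source_ip)
    for rule in rules:
        if (addr in ip_network(rule.source)
                and rule.protocol == protocol
                and rule.port == port):
            return rule.action
    return default
```

With a rule allowing TCP on ports 80 and 443 from anywhere, a web request passes, while a connection to any port not listed falls through to the default and is denied.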

Then, Holberton School adds another layer of security: it requires the connection between my browser and the server to be encrypted, and so uses HTTPS. The server answers requests on port 80 (HTTP) with a redirect to the HTTPS URL, which my browser then opens on port 443. Using HTTPS means there is another handshake, in which keys are exchanged to encrypt and decrypt the data sent between the browser and the server. It is done through the Transport Layer Security (TLS) protocol, the successor of the Secure Sockets Layer (SSL).
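
The extra handshake can be seen from Python with the standard socket and ssl modules. This is a minimal sketch under my own assumptions: the function names are made up, the timeout value is arbitrary, and negotiated_tls needs a working network connection, so what it returns depends on the server at the time of the call.

```python
import socket
import ssl


def make_client_context():
    """Default client settings: verify the certificate and the host name."""
    return ssl.create_default_context()


def negotiated_tls(host, port=443, timeout=5.0):
    """Open a TCP connection, run the TLS handshake on top of it, and
    return the protocol version and cipher suite that were agreed on."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as tcp:
        # wrap_socket performs the TLS handshake; server_hostname is used
        # for SNI and for checking the certificate against the host name.
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]
```

Note the order: the TCP connection is opened first, and the TLS handshake happens inside it before any HTTP data is sent.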

To look at that step in more detail, we can use curl.

$ curl -sIL holbertonschool.com
HTTP/1.1 301 Moved Permanently
Content-Length: 0
Connection: keep-alive
Date: Mon, 17 Apr 2017 09:49:54 GMT
Location: http://www.holbertonschool.com/
HTTP/1.1 301 Moved Permanently
Cache-Control: no-cache
Content-Type: text/html; charset=utf-8
Date: Tue, 18 Apr 2017 04:01:52 GMT
Location: https://www.holbertonschool.com/
HTTP/1.1 200 OK
Cache-Control: max-age=0, private, must-revalidate
Content-Length: 37570
Content-Type: text/html; charset=utf-8
Date: Tue, 18 Apr 2017 04:01:52 GMT
ETag: W/"9657c48c48addb7d4c196da6865fcf0e"
Server: nginx/1.6.2

In the output above we actually see 2 redirections: one from http://holbertonschool.com to the www subdomain http://www.holbertonschool.com, and then an HTTPS redirect to https://www.holbertonschool.com.
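
What curl -L does with those Location headers can be sketched in Python. To keep the example offline, the network is replaced by a plain dictionary: follow_redirects and RESPONSES are names I made up, and the table only copies the status codes and Location headers from the curl output above.

```python
REDIRECT_CODES = (301, 302, 303, 307, 308)


def follow_redirects(url, responses, max_hops=10):
    """Walk a chain of redirects and return (final status, URLs visited).

    `responses` maps a URL to (status code, headers); it stands in for
    the network in this sketch.
    """
    chain = [url]
    for _ in range(max_hops):
        status, headers = responses[url]
        if status not in REDIRECT_CODES:
            return status, chain
        url = headers["Location"]  # where the server sends us next
        chain.append(url)
    raise RuntimeError("too many redirects")


# The three responses from the curl output, keyed by requested URL.
RESPONSES = {
    "http://holbertonschool.com/": (301, {"Location": "http://www.holbertonschool.com/"}),
    "http://www.holbertonschool.com/": (301, {"Location": "https://www.holbertonschool.com/"}),
    "https://www.holbertonschool.com/": (200, {}),
}
```

Starting from http://holbertonschool.com/, the function visits the three URLs in order and stops on the 200. The max_hops limit is there so that a redirect loop cannot run forever.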

This describes a simple connection. However, the IP address I have been given may be that of a load balancer. This kind of server does not handle requests itself; it forwards them to a pool of servers, using an algorithm to decide which server will actually serve my request. In that case, depending on the setup, the handshake and encryption exchange can be done with that server rather than with the load balancer.
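
One of the simplest algorithms a load balancer can use is round robin, where each new request goes to the next server in the pool. I do not know which algorithm Holberton School actually uses; the Python sketch below, with its made-up class name and backend names, only illustrates the idea.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand each incoming request to the next backend in the pool."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("the pool needs at least one backend")
        # cycle() repeats the list of backends forever, in order.
        self._next_backend = cycle(list(backends))

    def route(self, request):
        """Pick the backend for this request and return (backend, request)."""
        backend = next(self._next_backend)
        return backend, request
```

With a pool of two hypothetical backends, "web-01" and "web-02", successive requests alternate between them. Other strategies (least connections, hashing the client IP) would only change how route picks the backend.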

This remains a high-level approach, which hopefully is correct. I note in particular that typing a simple host name in my browser can generate 2 requests: a first one to get an IP, and a second one to reach that IP.

Resources I want to keep: