What happens when you type www.holbertonschool.com and hit enter.
Since the internet is a world-wide network of computers (a network of networks), each computer must know how to ‘communicate’ with other computers. To do so, each computer connected to the internet carries its own unique address in the form XXX.XXX.XXX.XXX, where each XXX is a number between 0 and 255 (255 being the maximum value representable by an unsigned 8-bit byte). This is known as an IP (Internet Protocol) address, or more precisely an IPv4 address.
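As a small illustration of the address format described above, the sketch below uses Python's standard ipaddress module to split a dotted address into its four 8-bit numbers. The helper name octets is made up for this example.

```python
import ipaddress
from typing import List


def octets(address: str) -> List[int]:
    """Split a dotted IPv4 address into its four 8-bit numbers."""
    # IPv4Address raises a ValueError subclass if any part is outside 0..255.
    ip = ipaddress.IPv4Address(address)
    # .packed is the 4-byte binary form; iterating bytes yields integers.
    return list(ip.packed)


parts = octets("54.192.119.70")
print(parts)
print(all(0 <= part <= 255 for part in parts))
```

An address such as 256.1.1.1 is rejected, because 256 does not fit in a single byte.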
Because IP addresses are not particularly descriptive and are difficult to memorize, the internet allows a user to refer to a computer by a name, known as a domain name, rather than by a string of numbers.
In this simplified picture, a computer has one IP address at any given time. The DNS record that maps a domain name to an IPv4 address is called an A record. A computer may also have several host names that serve as aliases, defined by CNAME records. For example, www.holbertonschool.com can have a CNAME record stating that www.holbertonschool.com points to holbertonsf.com. The use of host name aliases makes it easier for service providers to migrate services to new machines without interrupting service.
The networks that make up the internet are connected through ISPs (Internet Service Providers) such as AT&T, Comcast, Time Warner, Verizon, etc. When you connect to an ISP, you become a member of its network, which in turn is connected to other ISPs.
The .com part of www.holbertonschool.com is known as the top-level domain. Other common top-level domains include .net, .org, .biz, .gov, etc.
Domain Name System
So, how do we get the proper IP address when we enter www.holbertonschool.com in our browser? This is where DNS (Domain Name System) services come into play. DNS is a comprehensive translation system that maps user-friendly domain names to IP addresses. Each ISP gets a large number of IP addresses from the Internet Assigned Numbers Authority (through regional internet registries). One of the major functions of DNS is mapping site names to IP addresses and then storing and maintaining this data. A second function is to distribute this data across a vast network of servers.
Once a user enters www.holbertonschool.com in their browser, the browser checks its cache of DNS records, which are kept for some fixed duration. This is the first place where a DNS query can be resolved. Next, the OS consults the local /etc/hosts file (on Unix systems) and its own cache, and if a record for the domain is not found, the router checks its cache.
If a record for the domain name is not found in the browser’s, OS’s, or router’s cache, the query goes to the ISP’s DNS resolver as a ‘recursive query’, meaning that the resolver must complete the recursion and return either an IP address or an error.
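The lookup order described above can be sketched as a chain of caches with the recursive resolver as the last resort. This is only a toy model: the dictionaries stand in for the browser cache, the hosts file and the router cache, and the function name lookup is hypothetical.

```python
from typing import Callable, Dict, List, Optional


def lookup(name: str,
           caches: List[Dict[str, str]],
           recursive_resolver: Callable[[str], Optional[str]]) -> str:
    """Check each cache in order, then fall back to the recursive resolver."""
    for cache in caches:
        if name in cache:
            return cache[name]
    # A recursive query must end in either an address or an error.
    address = recursive_resolver(name)
    if address is None:
        raise LookupError(f"cannot resolve {name}")
    caches[0][name] = address  # remember the answer in the first cache
    return address


browser_cache: Dict[str, str] = {}
hosts_file = {"localhost": "127.0.0.1"}
router_cache: Dict[str, str] = {}
# Stand-in for the ISP's resolver: a plain dictionary lookup.
isp_records = {"www.holbertonschool.com": "54.192.119.70"}

first = lookup("www.holbertonschool.com",
               [browser_cache, hosts_file, router_cache],
               isp_records.get)
print(first)
```

After the first call the answer sits in the browser cache, so a second lookup for the same name never reaches the resolver.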
Root servers
The ISP’s resolver starts by querying one of the root DNS servers for the IP address of www.holbertonschool.com. There are 13 root server clusters in over 380 locations around the world. They are managed by 12 different organizations that report to the IANA (Internet Assigned Numbers Authority). The root servers do not hold the final answer themselves; they refer the resolver to the name servers responsible for each top-level domain, including generic top-level domains (gTLDs) such as “.com”, “.net” and “.org”.
When holbertonschool.com was registered, the registrar (let’s say that it’s Gandi) told the registry that runs the ‘.com’ name servers which name servers are responsible for that domain. When someone on the internet tries to go to holbertonschool.com and their ISP’s DNS servers do not know about it, the ISP’s DNS server looks at its root hints to find which root name servers to talk to. The root server refers it to the “.com” name servers, and the ISP’s DNS server then asks a “.com” name server where holbertonschool.com is. The “.com” name server answers with the NS records for holbertonschool.com, which delegate the domain to a set of name servers (here, Gandi’s). Finally, the ISP’s DNS server queries those name servers, and the authoritative DNS server, which is responsible for handling all queries for the domain, responds with the requested record.
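The chain of referrals (root, then top-level domain, then authoritative server) can be modelled with a few dictionaries. Everything here is an assumption for illustration: the server names tld-com and ns-registrar are made up, and real DNS exchanges binary messages over the network rather than dictionary lookups.

```python
from typing import Dict, List, Tuple

# Root zone: which server handles each top-level domain.
ROOT: Dict[str, str] = {"com": "tld-com"}

# What each (hypothetical) name server knows.
SERVERS: Dict[str, Dict[str, str]] = {
    # The TLD server only knows who is responsible for the domain (NS record).
    "tld-com": {"holbertonschool.com": "ns-registrar"},
    # The authoritative server holds the actual address (A record).
    "ns-registrar": {"www.holbertonschool.com": "54.192.119.70"},
}


def resolve(name: str) -> Tuple[str, List[str]]:
    """Follow referrals from the root down to the authoritative server."""
    labels = name.split(".")
    trail = ["root"]

    tld_server = ROOT.get(labels[-1])
    if tld_server is None:
        raise LookupError(f"unknown top-level domain in {name}")
    trail.append(tld_server)

    domain = ".".join(labels[-2:])
    authoritative = SERVERS[tld_server].get(domain)
    if authoritative is None:
        raise LookupError(f"{domain} is not registered")
    trail.append(authoritative)

    address = SERVERS[authoritative].get(name)
    if address is None:
        raise LookupError(f"no record for {name}")
    return address, trail


address, trail = resolve("www.holbertonschool.com")
print(address, trail)
```

The trail shows that the resolver talks to three different servers, and only the last one actually knows the address.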
The protocol stack
Each computer needs a protocol to communicate on the internet. These protocols are built into a computer’s OS (Linux, Windows, etc.). The protocol suite used on the internet is known as TCP/IP. The TCP/IP stack can be abstracted into the following layers:
- Application Protocols Layer: specific to applications such as email, FTP, WWW, etc.
- Transmission Control Protocol Layer: TCP directs packets to a specific application on a computer using a port number.
- Internet Protocol Layer: IP directs packets to a specific computer using an IP address.
- Hardware Layer: converts the binary data contained in packets to network signals and back (network card, etc.).
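One way to picture the layers listed above is as wrappers: each layer adds its own addressing information in front of the data handed down to it, and the receiving side peels the wrappers off in reverse order. The sketch below uses made-up text headers purely for illustration; real TCP and IP headers are binary and carry many more fields.

```python
from typing import Tuple


def encapsulate(message: bytes, port: int, ip: str) -> bytes:
    """Wrap an application message in toy TCP and IP headers."""
    segment = f"TCP dst_port={port}|".encode("ascii") + message  # transport layer
    packet = f"IP dst={ip}|".encode("ascii") + segment           # internet layer
    return packet


def decapsulate(packet: bytes) -> Tuple[str, str, bytes]:
    """Strip the toy headers again, outermost first."""
    ip_header, segment = packet.split(b"|", 1)
    tcp_header, message = segment.split(b"|", 1)
    return ip_header.decode("ascii"), tcp_header.decode("ascii"), message


packet = encapsulate(b"GET / HTTP/1.1", 80, "54.192.119.70")
print(packet)
print(decapsulate(packet))
```

The IP header gets the packet to the right computer, and the TCP header gets the message to the right application on that computer.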
Time to initiate connection!
www.holbertonschool.com’s IP address is 54.192.119.70. We can verify this by pinging the website in a terminal (ping www.holbertonschool.com).
So now that we are connected to the internet and have a unique IP address, it’s time to initiate a request to the holbertonschool.com web server where the website’s files are located. Let’s say you’re connected to the internet through your ISP — the message must be translated from binary code to electronic signals, transmitted over the internet and then translated back into binary code.
Let’s balance that load
Many websites have what are called load balancers, which are devices (hardware or software) that distribute traffic across a number of servers. This is the case for holbertonschool.com’s IP address, 54.192.119.70: when we connect to that IP address, we are actually connecting to the load balancer.
The load balancer handles the request and forwards it to the most suitable server. There are several different methods for distributing client requests across a group of servers, one of which is round-robin load balancing. The round-robin method simply goes down a list of servers in a group and forwards a client request to each server in turn.
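Round-robin selection is simple enough to sketch in a few lines. The class name and the server names web-01 to web-03 are hypothetical; a real load balancer would also track server health and open connections.

```python
from itertools import cycle
from typing import List


class RoundRobinBalancer:
    """Hand out servers from a fixed list, one after another, forever."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        # cycle() repeats the list endlessly in its original order.
        self._servers = cycle(servers)

    def pick(self) -> str:
        return next(self._servers)


balancer = RoundRobinBalancer(["web-01", "web-02", "web-03"])
assignments = [balancer.pick() for _ in range(5)]
print(assignments)  # ['web-01', 'web-02', 'web-03', 'web-01', 'web-02']
```

The fourth request wraps around to the first server again, so over time each server receives the same number of requests.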
HTTPS, SSL and Security
Because www.holbertonschool.com operates via the secure version of HTTP known as HTTPS (Hypertext Transfer Protocol Secure), all communication between a browser and the website is encrypted. HTTPS is widely used today and is important in protecting highly confidential online transactions like banking and shopping forms.
www.holbertonschool.com uses the SSL (Secure Sockets Layer) protocol to encrypt communication (modern deployments use its successor, TLS, although the name SSL is still widely used). SSL uses what is known as an asymmetric Public Key Infrastructure (PKI) system. An asymmetric system uses two separate keys to encrypt communications: a public key and a private key. Content encrypted with the public key can only be decrypted with the private key, and vice versa. The private key is protected and, in theory, accessible only to its owner.
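The public/private key relationship can be demonstrated with textbook RSA on very small numbers. This is a classroom sketch only: the primes below are far too small to be secure, real systems add padding and use keys thousands of bits long, and nothing here is specific to how holbertonschool.com is configured.

```python
# Toy RSA key generation with tiny primes (insecure, for illustration only).
p, q = 61, 53
n = p * q                  # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent (modular inverse, Python 3.8+)


def encrypt(message: int) -> int:
    """Encrypt with the public key (e, n)."""
    return pow(message, e, n)


def decrypt(ciphertext: int) -> int:
    """Decrypt with the private key (d, n)."""
    return pow(ciphertext, d, n)


message = 65
ciphertext = encrypt(message)
print(ciphertext != message, decrypt(ciphertext) == message)

# "Vice versa": a value transformed with the private key is recovered
# with the public key, which is the basis of digital signatures.
signed = pow(message, d, n)
print(pow(signed, e, n) == message)
```

Anyone may know e and n, but recovering the message requires d, which only the key owner holds.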
The SSL handshake
When a user requests an HTTPS connection to a webpage, the web server initially sends its SSL certificate to the user’s browser. The certificate contains the public key required to begin the session. Now that the browser has a copy of the public key, the browser and web server carry out the SSL handshake, which involves generating shared secret session keys to establish a unique secure connection.
With HTTPS, all communication is securely encrypted — this means that if somebody managed to break into the connection, they would not be able to decrypt any of the data which passes between the browser and the website.
More security through firewalls
As holbertonschool.com’s web servers are connected to a network with internet access, there are many ways they can be compromised. Therefore, it is necessary to take extra precautions to secure data. A web application firewall is another layer in securing a site from online threats.
A firewall is a system that provides network security by filtering incoming and outgoing network traffic based on a set of user-defined rules. Generally, the role of a firewall is to mitigate unwanted network communications while allowing all legitimate communication to flow freely.
Packet filtering, or stateless, firewalls work by inspecting individual packets in isolation. As such, they are unaware of connection state and can only allow or deny packets based on individual packet headers.
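A stateless filter can be sketched as a list of rules checked against the headers of each packet on its own, with no memory of earlier packets. The Packet and Rule classes and the example rules below are assumptions made for this illustration, not the syntax of any particular firewall.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_port: int
    protocol: str


@dataclass(frozen=True)
class Rule:
    action: str                      # "allow" or "deny"
    protocol: Optional[str] = None   # None means "match anything"
    dst_port: Optional[int] = None
    src_ip: Optional[str] = None

    def matches(self, packet: Packet) -> bool:
        return ((self.protocol is None or self.protocol == packet.protocol)
                and (self.dst_port is None or self.dst_port == packet.dst_port)
                and (self.src_ip is None or self.src_ip == packet.src_ip))


def decide(packet: Packet, rules: List[Rule], default: str = "deny") -> str:
    """Return the action of the first matching rule, looking only at this packet."""
    for rule in rules:
        if rule.matches(packet):
            return rule.action
    return default


rules = [
    Rule("deny", src_ip="203.0.113.9"),           # block one source entirely
    Rule("allow", protocol="tcp", dst_port=443),  # HTTPS
    Rule("allow", protocol="tcp", dst_port=80),   # HTTP
]
print(decide(Packet("198.51.100.7", 443, "tcp"), rules))
print(decide(Packet("198.51.100.7", 22, "tcp"), rules))
```

Because rules are evaluated in order, the deny rule for the blocked source wins even when a later rule would allow the port.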
Stateful firewalls are able to determine the connection state of packets, which makes them much more flexible than stateless firewalls. They work by collecting related packets until the connection state can be determined before any firewall rules are applied to the traffic.
Application firewalls go one step further by analyzing the data being transmitted, which allows network traffic to be matched against firewall rules that are specific to individual services or applications. These are also known as proxy-based firewalls.
GET Request
The browser sends a GET request to the web server according to the specification of HTTP (Hypertext Transfer Protocol). A web server program typically listens on port 80 for HTTP (port 443 for HTTPS) and receives some meta information in the form of headers that the browser sends in its GET request. The ‘User-Agent’ header describes the browser, the ‘Accept-Encoding’ header lists the content encodings (such as gzip) that the browser will accept in the response, and the ‘Connection’ header tells the server whether to keep the TCP connection open. The GET request also contains what are called cookies, which are pieces of information stored on the client’s side, containing information from previous browsing sessions on the same website.
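To make the headers above concrete, here is a minimal sketch that assembles the text of an HTTP/1.1 GET request. The function name, the User-Agent value and the cookie are made up for the example; a real browser sends many more headers.

```python
from typing import Dict, Optional


def build_get_request(host: str, path: str = "/",
                      cookies: Optional[Dict[str, str]] = None) -> bytes:
    """Assemble a plain-text HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "User-Agent: example-browser/1.0",   # hypothetical browser name
        "Accept-Encoding: gzip, deflate",    # encodings the client can decode
        "Connection: keep-alive",            # ask to reuse the TCP connection
    ]
    if cookies:
        pairs = "; ".join(f"{name}={value}" for name, value in cookies.items())
        lines.append(f"Cookie: {pairs}")
    # Header lines end with CRLF; an empty line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_get_request("www.holbertonschool.com",
                            cookies={"session": "abc123"})
print(request.decode("ascii"))
```

A GET request has no body, so the request ends right after the blank line that closes the headers.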
Web server
HTTP requests are handled by software known as a web server (Nginx, Apache, etc.). Let’s say that ours is Nginx. A web server is a computer system that processes requests via HTTP. Its primary function is to store, process and deliver web pages to clients. The pages delivered are most often HTML documents, which may include images, style sheets and scripts in addition to text content.
When a user visits www.holbertonschool.com, the web server passes the request to a request handler, a program written to handle requests; common languages and frameworks include PHP, Ruby, ASP.NET, etc.
The moment the GET request is received, Nginx (our web server) prepares the environment to execute the index.php file. This PHP program generates an HTML response and sends it back to the browser via TCP/IP according to HTTP guidelines.
Receiving the HTTP response and displaying HTML
The browser receives the HTTP response and displays the HTML content to the user. Rendering of HTML content is conducted in phases. The browser first renders an HTML skeleton and then sends multiple GET requests to fetch other linked content such as images, JavaScript code and CSS files. To improve efficiency, a browser will cache these static files so that they don’t have to be fetched every time a user visits the site.
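The step where the browser discovers which extra files to fetch can be approximated with Python's standard html.parser module. The class name and the sample page are hypothetical, and real browsers look at many more tags and attributes than the three handled here.

```python
from html.parser import HTMLParser
from typing import List


class ResourceCollector(HTMLParser):
    """Collect URLs of images, scripts and style sheets found in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.resources: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("img", "script") and attributes.get("src"):
            self.resources.append(attributes["src"])
        elif (tag == "link" and attributes.get("rel") == "stylesheet"
              and attributes.get("href")):
            self.resources.append(attributes["href"])


page = """
<html>
  <head>
    <link rel="stylesheet" href="style.css">
    <script src="app.js"></script>
  </head>
  <body><img src="logo.png" alt="logo"><p>Hello</p></body>
</html>
"""

collector = ResourceCollector()
collector.feed(page)
print(collector.resources)
```

Each collected URL corresponds to one additional GET request, unless the file is already in the browser's cache.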
User interaction and dynamic content
Suppose a visitor to holbertonschool.com is interested in applying and creates an account to start the application process. This time the browser sends a POST request rather than a GET request, and the form data (username and password) travels in the request body for the server to process. A POST request submits data to be processed to a specified resource. As opposed to GET requests, POST requests are not cached, do not remain in browser history and have no restrictions on data length.
The POST request
The POST request tells the web server, through its headers, exactly what kind of content it is sending. The web server passes this data to the application server to be processed using programming logic in the codebase and, if needed, writes to, edits, or deletes records in the database (let’s assume it’s MySQL).
The application server then sends the resulting HTML content back to the web server, and the load balancer returns the HTML response to the client’s browser.
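As a rough sketch of what such a request looks like on the wire, the code below encodes form fields the way an HTML form does by default and puts them in the request body. The /apply path and the field values are invented for the example.

```python
from typing import Dict
from urllib.parse import parse_qs, urlencode


def build_post_request(host: str, path: str, form: Dict[str, str]) -> bytes:
    """Assemble an HTTP/1.1 POST request carrying URL-encoded form data."""
    body = urlencode(form).encode("ascii")
    headers = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        # Content-Type states what kind of content the body holds.
        "Content-Type: application/x-www-form-urlencoded",
        # Content-Length tells the server how many body bytes to read.
        f"Content-Length: {len(body)}",
    ]
    return ("\r\n".join(headers) + "\r\n\r\n").encode("ascii") + body


request = build_post_request("www.holbertonschool.com", "/apply",
                             {"username": "ada", "password": "s3cret pass"})
head, _, body = request.partition(b"\r\n\r\n")
print(head.decode("ascii"))
print(body.decode("ascii"))

# The server side reverses the encoding to recover the fields.
print(parse_qs(body.decode("ascii")))
```

Unlike a GET request, the data is not part of the URL, which is one reason form submissions do not show up in browser history.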
