Web Infrastructure 101

Published in The Startup · 8 min read · Aug 26, 2019

A straightforward behind-the-scenes of what happens when you type holbertonschool.com and hit Enter.

Not long ago, all the little day-to-day tasks made sense. Everything people did had a purpose and was easy to understand: killing a deer for food, drawing a bucket of water from the well to drink, cutting wood to build shelter.

Today… well, today things are a bit more complex. Most of the simple tasks we perform, we do without knowing what’s actually going on. I’m sure you can think of some — driving a car, printing a document, charging your phone, making a phone call, and so on.

In this article, we’ll take a look behind the scenes of one of these tasks: browsing the Internet! Have you ever thought of what really happens when you type a site such as https://www.holbertonschool.com and hit ‘Enter’? Let’s get started!

DNS (Domain Name System)

Think of DNS like one huge phonebook (approximately 333.8 million domain names as of 2018). If you were to try and remember all your contacts’ phone numbers, you would have an extremely hard time. The same goes for domain names! What’s easier to remember: holbertonschool.com or 99.84.216.49?

The Internet is made up of millions of devices, each reachable through an IP address (in the familiar IPv4 form, four numbers from 0 to 255 separated by dots, such as 8.8.8.8). Every device with an active Internet connection has an IP address, including your smartphone and computer. Websites are no different: the domain name you know maps to the IP address of a server that hosts the site. The whole purpose of the Domain Name System is to make sure you don’t have to remember every single IP address.
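To make the "four numbers separated by dots" idea concrete, here is a minimal sketch using Python's standard ipaddress module. The helper name describe_ipv4 is made up for this illustration. It shows that a dotted address is just a friendlier way of writing a single 32-bit number, and that each of the four parts has to fit in the range 0 to 255.

```python
import ipaddress

def describe_ipv4(text: str) -> dict:
    """Break a dotted IPv4 address into its four octets and its 32-bit integer form.

    describe_ipv4 is a hypothetical helper written for this article.
    """
    # IPv4Address raises a ValueError if any part is outside 0-255
    addr = ipaddress.IPv4Address(text)
    octets = [int(part) for part in text.split(".")]
    return {"octets": octets, "as_integer": int(addr)}

info = describe_ipv4("8.8.8.8")
print(info["octets"])      # [8, 8, 8, 8]
print(info["as_integer"])  # 134744072
```

Something like 400.400.400.400 is rejected with a ValueError, because 400 does not fit in a single byte.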

DNS is a protocol within the set of standards for how computers exchange data on the internet and on many private networks, known as the TCP/IP protocol suite.

Your computer uses a DNS server to look up the website you’re trying to access. The proper term for this is DNS name resolution, and it usually involves a resolver run by your ISP (Internet Service Provider). The process of finding the correct address can be quite complicated and usually goes through several different stages, but all this happens in a matter of milliseconds without you even knowing. We could dig very deep into this process but that would take up the rest of this blog post 😃. If you’re interested in learning more about DNS, I would recommend this lovely cartoon which simplifies everything!
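If you would like to watch name resolution happen, the sketch below asks the operating system's resolver for the addresses behind a hostname using socket.getaddrinfo from Python's standard library. The function name resolve is my own, and the example looks up localhost so that it runs without a network connection; swapping in a real domain name is an assumption about your setup.

```python
import socket

def resolve(hostname: str) -> list[str]:
    """Ask the operating system's resolver for the IP addresses behind a hostname."""
    results = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP
    return sorted({sockaddr[0] for *_, sockaddr in results})

# "localhost" resolves on the machine itself, so no DNS server is contacted here
print(resolve("localhost"))
```

Calling resolve("holbertonschool.com") on a connected machine would go through the full lookup described above, and the addresses returned can change over time.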

TCP/IP

Transmission Control Protocol (TCP) and Internet Protocol (IP, the protocol itself rather than the IP addresses we spoke of earlier) are the core network protocols that define the way our Internet currently works.

Remember when we used to have TVs with antennas? The local radio towers would broadcast radio signals and the TVs had long antennas capable of receiving this signal, which would be processed and displayed on the screen for our personal entertainment. TCP/IP is a similar concept, just using the internet. The websites we want to see must be transferred from somewhere to our devices.

Now that we have the IP address of the website we’re trying to access (due to DNS), we need to find a way to actually see the content. These websites have to come from somewhere, as it wouldn’t make any sense to store every single YouTube video on your personal laptop storage.

This is where the Web Server comes in! These servers store the content of the websites we want to access. In fact, the entire Internet is built around the client-server model. When you would like to view a website, you become a client that’s making a request, and the server responds with the requested content. The TCP/IP protocols are the way the content gets transferred from the server to the client and vice versa.
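Here is a rough sketch of that request and response exchange using plain TCP sockets from Python's standard library. Both ends run on the same machine for the sake of the example, and the names serve_once and fetch, as well as the reply text, are invented for illustration. A real web server does far more than this.

```python
import socket
import threading

def serve_once(server_sock: socket.socket) -> None:
    """Server side: accept one client, read its request, send back a response."""
    conn, _addr = server_sock.accept()
    with conn:
        request = conn.recv(1024)
        conn.sendall(b"Here is the content for: " + request)

def fetch(host: str, port: int, request: bytes) -> bytes:
    """Client side: connect, send a request, and read the reply until the server closes."""
    with socket.create_connection((host, port), timeout=5) as client:
        client.sendall(request)
        client.shutdown(socket.SHUT_WR)  # signal that the request is complete
        chunks = []
        while True:
            data = client.recv(1024)
            if not data:
                break
            chunks.append(data)
        return b"".join(chunks)

# Binding to port 0 lets the operating system pick any free port
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

worker = threading.Thread(target=serve_once, args=(server,))
worker.start()
reply = fetch("127.0.0.1", port, b"/index.html")
worker.join()
server.close()

print(reply.decode())  # Here is the content for: /index.html
```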

These protocols are built upon four layers: Application Layer, Transport Layer, Internet Layer, Network Access Layer.

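The Application Layer is where programs such as your browser produce and consume the content being sent. The Transport Layer divides that content into smaller pieces (called segments in TCP; packets is the everyday term) and determines how they are going to be sent: either with TCP (Transmission Control Protocol), which delivers data reliably and in order, or with UDP (User Datagram Protocol), which skips those guarantees in exchange for lower latency and is often used for live video and voice. These two protocols have different ways of sending packets, and are used for different purposes. The Internet Protocol (the Internet Layer) dictates the logistics of the packets, giving them a destination address and routing them there. The Network Access Layer moves the data over the physical link, such as Ethernet or Wi-Fi.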
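The splitting and reordering can be sketched in a few lines of Python. This is a toy model under my own assumptions, not how TCP is actually implemented: to_packets and reassemble are made-up names, and real TCP numbers bytes rather than whole packets and also handles loss and retransmission.

```python
import random

def to_packets(content: bytes, size: int) -> list[tuple[int, bytes]]:
    """Split content into (sequence_number, payload) pairs of at most `size` bytes."""
    return [
        (seq, content[start:start + size])
        for seq, start in enumerate(range(0, len(content), size))
    ]

def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Put packets back in sequence order and join their payloads."""
    return b"".join(payload for _seq, payload in sorted(packets))

message = b"Hello from the web server!"
packets = to_packets(message, size=8)

# Packets can take different routes and arrive out of order
random.shuffle(packets)

print(reassemble(packets) == message)  # True
```

The sequence numbers are what let the receiving side rebuild the original content no matter what order the pieces arrive in.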

HTTP/HTTPS

Hypertext Transfer Protocol (the S in HTTPS stands for secure) is an Application Layer protocol in the TCP/IP suite, and probably the one you use the most (there are a few others, such as FTP). This protocol is used between a web client (such as your browser) and a web server. It defines how messages are formatted and transmitted, and what actions servers and web clients should take in response to various commands. Have you ever tried accessing a page on a website, but received something that looked like “Error 404! This page doesn’t exist”? 404 is an HTTP response status code, meaning the page you requested does not exist on the server.
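To see status codes in action, the sketch below starts a tiny web server with Python's built-in http.server module and requests two paths from it with http.client. The PAGES dictionary, the Handler class, and get_status are all made up for this example; the server knows exactly one page, so anything else comes back as a 404.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGES = {"/": b"<h1>Welcome</h1>"}  # hypothetical site content: a single page

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGES.get(self.path)
        if body is None:
            self.send_error(404, "Not Found")  # the page does not exist
            return
        self.send_response(200)  # 200 means OK
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging to keep the output short

def get_status(port: int, path: str) -> int:
    """Send a GET request for `path` and return the HTTP status code."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", path)
    response = conn.getresponse()
    response.read()
    conn.close()
    return response.status

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

home_status = get_status(port, "/")
missing_status = get_status(port, "/no-such-page")
server.shutdown()
server.server_close()

print(home_status, missing_status)  # 200 404
```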

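HTTPS is the secure version of HTTP. All data is encrypted in transit using TLS (Transport Layer Security), the successor to SSL (Secure Sockets Layer); the certificates involved are still commonly called SSL certificates. Every secure website you access (usually shown with a lock symbol next to the URL) presents a certificate that lets your browser verify it is really talking to the site it asked for.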

Why is this even relevant? Well, say you’re trying to order something online and you have to enter your credit card information. HTTPS makes sure the data you are sending and receiving is practically unreadable to anyone who intercepts it. It uses encryption algorithms to scramble the data being transferred, so that only you and the server hold the keys needed to decipher it.
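On the client side, the secure setup looks roughly like the sketch below, which uses Python's standard ssl module (despite its name, it speaks TLS, the modern successor to SSL). The default context insists on a valid certificate and a matching hostname. The inspect_connection helper is my own invention and is only defined here, not called, since calling it needs a live Internet connection.

```python
import socket
import ssl

# The default context verifies the server's certificate against trusted authorities
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                    # True

def inspect_connection(hostname: str, port: int = 443) -> dict:
    """Open an encrypted connection and report what was negotiated.

    Hypothetical helper: needs network access, so it is not called here.
    """
    with socket.create_connection((hostname, port), timeout=5) as raw:
        # wrap_socket performs the handshake and certificate checks
        with context.wrap_socket(raw, server_hostname=hostname) as tls:
            return {
                "protocol": tls.version(),
                "subject": tls.getpeercert()["subject"],
            }
```

If the certificate is expired, untrusted, or issued for a different site, the handshake fails with an ssl.SSLError instead of quietly continuing.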

Speaking of security, we should go over the Firewall.

A firewall is a network security device that monitors incoming and outgoing network traffic and permits or blocks data packets based on a set of security rules.

This is pretty self-explanatory. The purpose of a firewall is to create a barrier between your private network and incoming traffic from external sources (such as the internet), to reduce the risk of malicious content or viruses getting in. A firewall can be either hardware (a physical device) or software. Many different firewall types exist, a few of them being:

Packet Filtering: The process of examining every incoming and outgoing packet and accepting or rejecting it based on rules, typically matching header fields such as the source and destination address, port, and protocol.
Proxy firewalls: Filtering network traffic at the application level. The proxy acts as an intermediary between two end systems. A client makes a request to the firewall (not the server itself), which decides whether the request is safe to pass on or should be blocked.
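A packet filter can be pictured as an ordered list of rules where the first match wins. The sketch below is a simplified model under my own assumptions: the Packet and Rule classes, the RULES list, and the decide function are invented for illustration, and the addresses come from ranges reserved for documentation. Real firewalls look at many more fields.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Packet:
    source_ip: str
    dest_port: int

@dataclass(frozen=True)
class Rule:
    action: str           # "allow" or "block"
    network: str          # source range in CIDR notation
    port: Optional[int]   # None matches any destination port

# Hypothetical rule set: block one range entirely, allow web traffic from anyone else
RULES = [
    Rule("block", "203.0.113.0/24", None),
    Rule("allow", "0.0.0.0/0", 443),
    Rule("allow", "0.0.0.0/0", 80),
]

def decide(packet: Packet, rules: list = RULES) -> str:
    """Return the action of the first rule that matches, or block by default."""
    source = ipaddress.ip_address(packet.source_ip)
    for rule in rules:
        in_network = source in ipaddress.ip_network(rule.network)
        port_matches = rule.port is None or rule.port == packet.dest_port
        if in_network and port_matches:
            return rule.action
    return "block"  # nothing matched: deny by default

print(decide(Packet("198.51.100.7", 443)))  # allow
print(decide(Packet("203.0.113.9", 443)))   # block
print(decide(Packet("198.51.100.7", 22)))   # block
```

Blocking by default when no rule matches is a common choice, because forgetting a rule then fails safe rather than open.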

Load Balancer

Everything explained so far sounds pretty straightforward, but what happens when traffic starts to grow, and I mean REALLY grow? When millions of users at once are making requests for websites like Google and Facebook? One web server won’t be able to handle all these requests, so such sites have to use two (or many more) servers. A new problem arises: when a user makes a request, will the content come from web server 1 or web server 2? For this exact reason these types of websites have a Load Balancer (which is actually also a server).

Think of a Load Balancer like a traffic cop — two streets that lead to the same destination, and the cop knows how to efficiently divide the incoming traffic, guiding with his hand which path to take.

A Load Balancer’s purpose is to distribute incoming traffic across multiple servers, which increases the efficiency, reliability, and availability of your site. If one web server crashes all of a sudden, the Load Balancer automatically redirects the traffic to the remaining web servers.

The Load Balancer has different algorithms for how it divides up the workload, such as:

Round Robin (most common): Requests are distributed across the group of servers sequentially. Request 1 is directed to server 1, request 2 to server 2, and so forth, wrapping back around to server 1 after the last server.
Least Connections: Before forwarding a request, the Load Balancer checks which server currently has the fewest active connections, and sends the request there.
IP Hash: The IP address of the client is run through a hash function to determine which server the request will be directed to, so a given client keeps reaching the same server. For example, every request from 100.100.100.100 might always be sent to server 3.
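The three strategies above can be sketched in a few lines each. This is a simplified illustration rather than how any particular load balancer implements them: the server names, the function names, and the choice of SHA-256 as the hash are all my own assumptions.

```python
import hashlib
import itertools

SERVERS = ["server-1", "server-2", "server-3"]  # hypothetical server pool

def round_robin(servers: list[str]):
    """Yield servers in order, wrapping around forever."""
    return itertools.cycle(servers)

def least_connections(active: dict[str, int]) -> str:
    """Pick the server with the fewest active connections."""
    return min(active, key=active.get)

def ip_hash(client_ip: str, servers: list[str]) -> str:
    """Map a client IP to a server; the same IP always gets the same server."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

rotation = round_robin(SERVERS)
print([next(rotation) for _ in range(4)])
# ['server-1', 'server-2', 'server-3', 'server-1']

print(least_connections({"server-1": 5, "server-2": 2, "server-3": 7}))
# server-2

# Repeated requests from one client land on one server
print(ip_hash("100.100.100.100", SERVERS) == ip_hash("100.100.100.100", SERVERS))
# True
```

One trade-off worth noting: with this simple modulo approach, adding or removing a server changes where most clients are sent.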

Application Server and Databases

If all we had in the world were Web Servers, our websites would be very, very simple. No users, no logins, no applications, no games — mostly just simple text and pictures.

As websites became more and more complex, different applications had to be involved in order to tie everything together. For example, when David logs on to Facebook and the first page he sees is “Hi David! Welcome back”, how is that possible? Instead of having 2.4 billion personal pages (the number of users on Facebook) stored in a web server, Facebook uses databases to store the users. With the use of application servers, we can read the content from the database and generate the HTML for each request, so that it can be displayed on our devices.
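Here is a small sketch of that idea: one page template, with the personal part filled in from a database at request time. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for a real database server such as MySQL. The table layout and the names create_db and render_home are assumptions made for this example.

```python
import sqlite3
from html import escape

def create_db() -> sqlite3.Connection:
    """Build a tiny in-memory users table (a stand-in for a real database server)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    db.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        [(1, "David"), (2, "Maya")],
    )
    return db

def render_home(db: sqlite3.Connection, user_id: int) -> str:
    """What an application server does: look up the user, then build the HTML."""
    row = db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    if row is None:
        return "<h1>Please log in</h1>"
    # escape() keeps a user-supplied name from being interpreted as HTML
    return f"<h1>Hi {escape(row[0])}! Welcome back</h1>"

db = create_db()
print(render_home(db, 1))   # <h1>Hi David! Welcome back</h1>
print(render_home(db, 99))  # <h1>Please log in</h1>
```

The same function serves every user, which is why nobody needs to store billions of separate pages.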

Application Servers let websites become more active and dynamic. Users are able to interact with a site by logging in, posting to a forum, and lots more. Application servers are essential for modern websites and sit behind most of the sites you interact with.

Here’s a diagram that illustrates the flow of processes happening from the client to the server. In this example, we use Nginx as the web server, and MySQL for the database. Also, we use a monitoring system that we haven’t spoken about — learn more here.

Conclusion

In this article, we barely scratched the surface of web infrastructure. It’s amazing to think that so many processes are occurring at once, in a matter of milliseconds, to achieve apparently “basic” functions that most people take for granted in their everyday lives.

If you have any questions or improvements, feel free to reach out on Twitter!