Flow overview of a web infrastructure

Roberto Ribeiro
Published in IT student life
7 min read · Apr 28, 2021

Being a Full-Stack Software Engineer

I know many people who believe that Google is responsible for showing you a website: they just type in where they want to go and follow the links that show up on the screen.

But the real magic is in the web infrastructure: a complex, worldwide set of standards and protocols.

Let’s see how it works.

https://imgflip.com/i/pqced

Internet Protocol address (IP address)
An Internet Protocol address (IP address) is a numerical label assigned to each device connected to a computer network that uses the Internet Protocol for communication. An IP address serves two main functions: host or network interface identification and location addressing.

Each website has an Internet address, also called a URL, which is tied to an IP address. The IP address tells us where the site is hosted.

Every time we enter an Internet address, for example, www.holbertonschool.com, the system will look up the IP address related to that holbertonschool.com domain. In other words, it needs to translate the address we know into an IP address.
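As a small illustration, the name that has to be translated can be pulled out of a full URL with Python's standard library. This is only a sketch: the function name `hostname_of` and the `/about` path in the test are made up for the example.

```python
from urllib.parse import urlsplit


def hostname_of(url: str) -> str:
    """Return the host part of a URL, i.e. the name that must be resolved to an IP."""
    host = urlsplit(url).hostname
    if host is None:
        # Happens when the scheme ("https://") is missing, so no host can be identified.
        raise ValueError(f"no hostname found in {url!r}")
    return host
```

Calling `hostname_of("https://www.holbertonschool.com/")` gives back just the `www.holbertonschool.com` part, which is what the lookup works with.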

hosts (file)
The computer file hosts is an operating system file that maps hostnames to IP addresses. It is a plain text file.

Finding out which IP corresponds to a web address is known as a DNS request. The first step is to check the Local Resolver records to see if we have that address registered locally.
The Local Resolver of each machine is commonly known as the “hosts” file. In the early days of the Internet, each computer kept a record in the hosts file of the web addresses and their corresponding IPs.
Today these files are nearly empty, usually holding little more than an entry for localhost. However, the system still checks them, typically before anything else, as part of its standard lookup order. Up to this point, every lookup has been performed locally on our machine. If the Local Resolver does not have the IP for that site, it is time to go out to the Internet to look for it.
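To make the format concrete, here is a minimal sketch of reading hosts-style text in Python. The sample entries (including `intranet.example` and its address) are invented for illustration, and keeping the first entry for a repeated name is a simplification chosen for this sketch, not a statement about how every operating system behaves.

```python
def parse_hosts(text: str) -> dict[str, str]:
    """Map each hostname found in hosts-file text to its IP address."""
    mapping: dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        ip, *names = line.split()  # first field is the IP, the rest are hostnames
        for name in names:
            mapping.setdefault(name.lower(), ip)  # keep the first entry seen
    return mapping


# Sample content, made up for this example.
SAMPLE = """
# local entries
127.0.0.1   localhost
::1         localhost ip6-localhost
192.0.2.10  intranet.example  # hypothetical internal site
"""
```

With this sample, `parse_hosts(SAMPLE)` has no entry for `www.holbertonschool.com`, which is exactly the case where the lookup has to leave the machine.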

Domain Name System
The Domain Name System (DNS) is a hierarchical and decentralized naming system for computers, services, or other resources connected to the Internet or a private network. It associates various information with domain names assigned to each of the participating entities. Most prominently, it translates more readily memorized domain names to the numerical IP addresses needed for locating and identifying computer services and devices with the underlying network protocols.

Our browser will use the Internet to query the DNS (Domain Name System) for the IP we want to access.

The DNS query depends on the domain name being requested.
In most cases, the DNS server we ask already knows the corresponding IP address and returns it. When it does not recognize the domain, the lookup has to work down the hierarchy: the root servers point to the name servers of the top-level domain (TLD), such as .com, which in turn point to the servers holding the records for the domains registered under it.
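The hierarchy is easier to see if we read a domain name from right to left. The sketch below only lists the names from the root down to the full name; it does not send any DNS queries, and in practice not every label is a separate delegation. The function name `resolution_path` is an assumption for this example.

```python
def resolution_path(domain: str) -> list[str]:
    """List the names from the DNS root down to the full domain name."""
    labels = domain.rstrip(".").lower().split(".")
    path = ["."]  # the root of the DNS tree
    # Walk the labels from right (TLD) to left (host).
    for i in range(len(labels) - 1, -1, -1):
        path.append(".".join(labels[i:]) + ".")
    return path
```

For `www.holbertonschool.com` the path goes through the root, then `com.`, then `holbertonschool.com.`, and finally the full name.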

Internet protocol suite
The Internet protocol suite is the conceptual model and set of communications protocols used in the Internet and similar computer networks. It is commonly known as TCP/IP because the foundational protocols in the suite are the Transmission Control Protocol (TCP) and the Internet Protocol (IP).

Communication between all these stages is made possible by a family of Internet protocols known as TCP/IP.
This series of standards and protocols has been under development since the late 1960s.
Comparing it with the OSI model in graphic form helps us visualize the complexity of this system.

https://en.wikipedia.org/wiki/Internet_protocol_suite

Now that the DNS request returned the IP corresponding to www.holbertonschool.com, we know exactly where the page is hosted and can request information from it.

Note: I queried the address using the network diagnostic tool ping, and now we know that the IP address returned by the DNS is 35.174.46.174.
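The same lookup can be done from code. The sketch below asks the operating system's resolver (hosts file first, then DNS, on most setups) through Python's `socket.getaddrinfo`. The helper name `resolve_ipv4` is an assumption, and the addresses returned for a real site can change over time, so no output is shown here.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Return the sorted, de-duplicated IPv4 addresses the system resolver gives for a name."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_ipv4("www.holbertonschool.com")` on a machine with Internet access returns whatever addresses the DNS currently publishes for that name.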

Before we continue, let’s see how a web infrastructure is designed.

A web server
A web server is software and hardware that uses HTTP (Hypertext Transfer Protocol) and other protocols to respond to client requests made over the World Wide Web. The main job of a web server is to display website content through storing, processing and delivering webpages to users. Besides HTTP, web servers also support SMTP (Simple Mail Transfer Protocol) and FTP (File Transfer Protocol), used for email, file transfer and storage.

The simplest case is that of a single server.

This server must have a web server, an application server, a database, and application files with a codebase.

We must understand this basic concept to understand how a web structure scales according to its needs.

Application server
An application server is a server that hosts applications.
Application server frameworks are software frameworks for building application servers. An application server framework provides both facilities to create web applications and a server environment to run them.

The web server always behaves as the doorman of our building: it is the one that receives the requests and returns the responses.
But by itself it does nothing; it needs at least a minimal codebase to serve a static site.
However, we can give it more functionality if we add an application server.
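One way to picture the split is Python's WSGI convention, where the web server hands each request to an application callable. This is a minimal sketch, not how any particular site is built: the response texts and the helper `call_app` are assumptions made for the example.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Tiny WSGI app: the dynamic part that sits behind the web server."""
    path = environ.get("PATH_INFO", "/")
    if path == "/":
        status, body = "200 OK", "Hello from the application server\n"
    else:
        status, body = "404 Not Found", f"Nothing at {path}\n"
    payload = body.encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ]
    start_response(status, headers)
    return [payload]


def call_app(path: str) -> tuple[str, bytes]:
    """Invoke the app in-process, the way a web server would for each request."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the rest of a minimal WSGI environ
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(application(environ, start_response))
    return captured["status"], body
```

To serve it over HTTP for a quick local test, the standard library's `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` plays the doorman role (port 8000 is an arbitrary choice).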

Database
A database is an organized collection of data, generally stored and accessed electronically from a computer system. Where databases are more complex they are often developed using formal design and modeling techniques.

The application server is responsible for analyzing the requests and interacting with the codebase and the database, depending on what is asked.
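As a rough sketch of that interaction, the snippet below uses SQLite from Python's standard library as a stand-in for the database. The `pages` table, its rows, and the function names are all invented for the example.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a small, made-up table of pages."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (slug TEXT PRIMARY KEY, title TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO pages (slug, title) VALUES (?, ?)",
        [("home", "Welcome"), ("about", "About us")],
    )
    conn.commit()
    return conn


def title_for(conn: sqlite3.Connection, slug: str):
    """Look up a page title; the placeholder keeps user input out of the SQL text."""
    row = conn.execute("SELECT title FROM pages WHERE slug = ?", (slug,)).fetchone()
    return row[0] if row else None
```

The application code decides which query to run for each request, and the database only stores and returns the data.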

This infrastructure is enough to have a user-oriented website; it allows the development of some features and interactivity.

The best-known case for this type of infrastructure is the LAMP stack. The acronym comes from Linux, Apache, MySQL, and PHP, a set of open-source tools that revolutionized web development.

Now that we have an idea of what a simple web infrastructure looks like, let’s move on to analyze our case study.

Load Balancer
Load balancing refers to the process of distributing a set of tasks over a set of resources (computing units), with the aim of making their overall processing more efficient. Load balancing can optimize the response time and avoid unevenly overloading some compute nodes while other compute nodes are left idle.

It is most likely that the IP we obtained points to the Holberton Load Balancer. When we have a site that handles large amounts of traffic it is not possible to have a single server, because it would be overwhelmed in responding to all queries. The solution is to have several servers with the same information.

To do this, it is necessary to distribute the web traffic among these servers, and that is the function of the Load Balancer. How it distributes the requests depends on which balancing algorithm is configured or available.

Following the previous example, the Load Balancer becomes the doorman and interacts with the different servers.
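One of the simplest balancing algorithms is round robin, where each new request goes to the next server in the list. The sketch below shows only that idea; the server names are hypothetical, and nothing here says which algorithm Holberton actually uses.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in a fixed rotation, one per request."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(servers)

    def pick(self) -> str:
        """Return the server that should handle the next request."""
        return next(self._rotation)
```

With three servers, the fourth request lands on the first server again. Real load balancers often add health checks or weights on top of this.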

Firewall
In computing, a firewall is a network security system that monitors and controls incoming and outgoing network traffic based on predetermined security rules. A firewall typically establishes a barrier between a trusted network and an untrusted network, such as the Internet.

To end this post, it is necessary to talk about security and data protection, two very important mechanisms to take into account now and in the future.

Let’s start with the Firewall, the monitoring system par excellence. With it, we can keep track of every request that comes in and also everything that goes out from our server. It also allows us to configure which ports on our server are left open and which are denied. This type of service is installed on the Load Balancer and on each server in the web infrastructure.
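The port filtering described above can be sketched as a list of rules checked in order, with everything else denied by default. This is a toy model, not a real firewall configuration: the `Rule` class, the `decide` function, and the chosen rule set are assumptions. The port numbers are the conventional ones for SSH (22), HTTP (80), HTTPS (443), and MySQL (3306).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    action: str  # "allow" or "deny"
    port: int


def decide(rules: list[Rule], port: int, default: str = "deny") -> str:
    """Return the action of the first rule matching the port, else the default."""
    for rule in rules:
        if rule.port == port:
            return rule.action
    return default


# Hypothetical rule set for a public web server.
RULES = [Rule("allow", 22), Rule("allow", 80), Rule("allow", 443)]
```

With these rules, web traffic on 80 and 443 gets through, while a direct connection to the database port 3306 falls through to the default and is denied.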

HTTPS / SSL
Hypertext Transfer Protocol Secure (HTTPS) is an extension of the Hypertext Transfer Protocol (HTTP). It is used for secure communication over a computer network, and is widely used on the Internet. In HTTPS, the communication protocol is encrypted using Transport Layer Security (TLS) or, formerly, Secure Sockets Layer (SSL). The protocol is therefore also referred to as HTTP over TLS, or HTTP over SSL.

Last but not least is the encryption of communications. We must not only take into account the protection of our data within our servers but also provide guarantees that the entire communication channel is protected against intruders.
HTTPS and SSL services are the most widespread security protocols in the world and can be found almost everywhere on the web.
The padlock that we see next to the Internet address in our browser means that the site presented a valid SSL/TLS certificate and the connection is encrypted.
Formerly these services were used only when requests involved personal or sensitive information. Nowadays all information is treated as sensitive and protected at all times.
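On the client side, the checks behind that padlock look roughly like the sketch below, which uses Python's standard `ssl` module. The function name `client_tls_context` is an assumption, and a browser performs more checks than this.

```python
import ssl


def client_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context with certificate and hostname checks enabled."""
    # create_default_context() loads the system's trusted certificate authorities
    # and turns on both certificate verification and hostname checking.
    return ssl.create_default_context()
```

Wrapping a TCP socket with `client_tls_context().wrap_socket(sock, server_hostname="www.holbertonschool.com")` performs the handshake and raises an error if the certificate is not valid for that name.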

Today we have only talked about two types of infrastructure but there are many more designs.
The word that best defines what type of infrastructure we need is “scalability”: how much your site needs to expand to meet demand without losing security and system integrity.

To be honest, it may seem incredible, but there is not much information on this subject on the Internet.
Hopefully, I can expand much more on this topic in a future post.

For questions or opinions, I’ll read you in the comments.
Bytes!

Resources:

https://runestone.academy/runestone/books/published/webfundamentals/Dynamic/dynintro.html

https://en.wikipedia.org/

https://www.digitalocean.com/community/tutorials
