What happens when you type
https://www.holbertonschool.com in your browser?
In an era of immediacy, where we are used to getting everything almost in real time, surfing the internet is no exception. We do it all the time, and many of us rely on online tools for almost everything we do each day. Because it is so common, we rarely question what happens behind the simple act of typing what we commonly call “the address of a web page”, “the link” or “the name” of a site, which is really the URL (Uniform Resource Locator), into the address bar and pressing Enter.
This and other concepts are explained in detail later in this article, so that we can understand some of the magic behind it.
How does the web stack work on top of the internet?
When we type the URL https://www.holbertonschool.com into the address bar, the browser sends a request to the servers that host this website and displays the content they return. Many of you will be wondering what a server is.
A server, as its name suggests, is software or a device that “serves” another device or program; in this case the browser, which is the client. This way of communicating is called the client-server model, and the service the server provides is the storage and management of the data that the client requests. A server can have many clients and a client can use many servers, and there are different types of servers (database, file, mail, web, game, etc.). Servers are also physical machines on which specific software runs, and these machines are gathered in sites called server farms or data centers.
Each of these servers is identified by an IP address. This address is of great importance because it identifies where the server is, which is what allows the browser to locate it and send it the client’s requests. The URL, on the other hand, contains a domain name: an alias that is easy for the user to read and remember. Your computer cannot route a request to a name, so the DNS (Domain Name System) comes into action and translates the domain name into the IP address of the server that holds the requested data. If the name cannot be resolved, the browser shows an error saying that the site could not be found. This should not be confused with the famous “404 Not Found” page, which is a standard HTTP response status that the server sends when it was reached but does not have the requested resource. Thanks to DNS we do not have to memorize the IP address of each website we want to visit; knowing the URL is enough.
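The translation from name to IP address can be sketched with Python’s standard library. This is only an illustration: the `resolve` helper and its `lookup` parameter are assumptions made for this example so that it can be tried without network access; by default it asks the system resolver through `socket.gethostbyname`.

```python
import socket


def resolve(hostname, lookup=socket.gethostbyname):
    """Return the IPv4 address for hostname, or None if it cannot be resolved.

    `lookup` defaults to the system resolver. It is a parameter only so the
    function can be exercised offline (an assumption of this sketch).
    """
    try:
        return lookup(hostname)
    except socket.gaierror:
        # Raised when the name does not exist or no DNS server answers.
        return None
```

With a working connection, `resolve("www.holbertonschool.com")` would return whatever address DNS currently publishes for that name, so no fixed output is shown here.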
The URL has a syntax, and each of its elements plays a specific role in delivering what we finally see as the content of a web page:
- Protocol (scheme)
- Domain name
- File path
- The protocol
Protocols are sets of rules that make it possible to access any website; without them you could not retrieve data over the internet. Responses such as the 404 page mentioned earlier are also defined by these protocols. For example, there is the Hypertext Transfer Protocol (HTTP) and its encrypted variant, HTTPS, by which the data of a web page is transferred from the server to our browser. HTTP is the basis of the World Wide Web (WWW).
Although these protocols are similar, they define two different procedures for sending and retrieving data from a server. HTTP sends and receives data in plain text, which is not safe because nothing prevents the data from being read or stolen in transit. HTTPS (Hypertext Transfer Protocol Secure) does provide security: our data is protected with encryption algorithms, and certificates verify that we are communicating with the correct server, which prevents an impostor from intercepting any kind of transaction (e.g. commercial transactions).
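To make “plain text” concrete, here is a minimal sketch of the bytes a client could send for a simple HTTP request. The helper name `build_get_request` is made up for this example, and a real browser sends many more headers. With HTTP, anyone on the network path can read these bytes as they are; with HTTPS the same bytes are encrypted before they leave the machine.

```python
def build_get_request(host, path="/"):
    """Return the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # method, path and protocol version
        f"Host: {host}",          # which website on that server we want
        "Connection: close",      # ask the server to close after replying
        "",                       # an empty line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


print(build_get_request("www.holbertonschool.com").decode("ascii"))
```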
- The domain name or Hostname
The hostname identifies the host where the resource is located, and DNS maps it to that host’s IP address. It is formed by combining the local name of the host with its parent domain name. In this case:
In www.holbertonschool.com, “www” is the name of the host
holbertonschool.com corresponds to the domain.
Domain names are used in various network contexts and for application-specific naming and addressing purposes. As we know, they are formed according to the rules of the Domain Name System (DNS), and since the Internet is based on IP addresses and not domain names, each domain needs DNS records, published by name servers, so that its names can be correctly translated to IP addresses. “Any name registered in the DNS is a domain name. Domain names are organized on subordinate levels (subdomains) of the DNS root domain, which has no name. The set of top-level domain names are top-level domains (TLDs), including generic top-level domains (gTLDs), such as the prominent domains .com, .info, .net, .edu and .org, and the country code top-level domains (ccTLDs). Below these top-level domains in the DNS hierarchy are the second- and third-level domain names that are generally open for reservation by end-users who wish to connect local area networks to the Internet, create other publicly accessible Internet resources, or run websites.”
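The levels of a name can be pulled apart with plain string operations. The function below is a deliberately naive sketch written for this article: it assumes the first label is the host and the last label is the TLD, which does not hold for suffixes made of several labels such as “.co.uk”.

```python
def split_hostname(hostname):
    """Split a name like 'www.example.com' into host label, domain and TLD."""
    host, _, domain = hostname.partition(".")   # text before the first dot
    tld = hostname.rpartition(".")[2]           # text after the last dot
    return {"host": host, "domain": domain, "tld": tld}


print(split_hostname("www.holbertonschool.com"))
# {'host': 'www', 'domain': 'holbertonschool.com', 'tld': 'com'}
```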
- Path of the file
The file path refers to the exact location of a resource within the website we are requesting.
It is often analogous to the underlying file structure of the website. The path comes after the hostname, separated by a slash “/”, and may end with a file extension (.pdf, .jpg, .gif, .png, etc.), although nowadays, to give URLs a more uniform and elegant look, websites are often configured so that these file names do not appear directly in the address of the page you are requesting.
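The three parts of a URL described above can be separated with `urllib.parse.urlsplit` from Python’s standard library. The path `/docs/guide.pdf` is invented for this example and is not a real page of the site.

```python
from urllib.parse import urlsplit

# The path below is a made-up example, not a real resource.
parts = urlsplit("https://www.holbertonschool.com/docs/guide.pdf")

print(parts.scheme)    # https
print(parts.hostname)  # www.holbertonschool.com
print(parts.path)      # /docs/guide.pdf
```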
Knowing all this, when we enter a URL in the address bar, the first step is to look up the IP address that corresponds to the website we want to see. The lookup leaves our device through the router and requires a DNS server to be configured, which can be done on the router or in the operating system, although the Internet provider we have at home or at work supplies a standard configuration for this task. A DNS lookup takes time, so the IP addresses of sites we have already visited are stored in the DNS cache of the operating system. This makes DNS resolution much faster, and we can quickly access pages we have visited before.
The router is the intermediary between the local network and the Internet: it requests the data from the Internet and distributes it to the devices on the local network (computers, cell phones, etc.). The router’s role is worth mentioning because, although each device on the network communicates with the devices around it through its local IP address, they all share the public IP address of the router. With IPv6, each device on the local network can have its own public IP address.
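The idea of a DNS cache can be sketched in a few lines. This is a toy model, not how an operating system implements it: the `DnsCache` class, the single fixed time-to-live and the injectable `lookup` and `clock` parameters are all assumptions made for the example.

```python
import time


class DnsCache:
    """Toy DNS cache: remembers answers until their time-to-live expires."""

    def __init__(self, lookup, ttl_seconds=300, clock=time.monotonic):
        self._lookup = lookup        # function: hostname -> IP address string
        self._ttl = ttl_seconds      # hypothetical fixed TTL for every entry
        self._clock = clock          # injectable so expiry can be simulated
        self._entries = {}           # hostname -> (ip, expiry time)

    def resolve(self, hostname):
        now = self._clock()
        cached = self._entries.get(hostname)
        if cached is not None and cached[1] > now:
            return cached[0]         # cache hit: no lookup needed
        ip = self._lookup(hostname)  # cache miss or expired entry
        self._entries[hostname] = (ip, now + self._ttl)
        return ip
```

The second request for the same name is answered from memory, which is why pages we have visited before tend to start loading faster.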
The router also takes part in the process of requesting the data that displays our website: once the IP address is known, it forwards the request, made through the HTTP protocol explained above, to the web server, and the request carries the router’s public IP address as its source so that the answer can find its way back. The web server evaluates the request and issues a positive or negative response; that is, it delivers the website, redirects the browser to another address, or returns the 404 page we have already discussed.
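The three outcomes mentioned above correspond to families of HTTP status codes. The sketch below uses the standard `http.HTTPStatus` enumeration; the function `describe_response` and its wording are invented for illustration, and it simplifies things by treating every 3xx code as a redirect.

```python
from http import HTTPStatus


def describe_response(status_code):
    """Say what the browser does with a given HTTP status code (simplified)."""
    status = HTTPStatus(status_code)   # raises ValueError for unknown codes
    if 200 <= status_code < 300:
        outcome = "display the page"
    elif 300 <= status_code < 400:
        outcome = "follow the redirect"
    elif status == HTTPStatus.NOT_FOUND:
        outcome = "show the 404 page"
    else:
        outcome = "show an error"
    return f"{status_code} {status.phrase}: {outcome}"


print(describe_response(200))  # 200 OK: display the page
print(describe_response(301))  # 301 Moved Permanently: follow the redirect
print(describe_response(404))  # 404 Not Found: show the 404 page
```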
After this process of requesting data through the router and the URL, we notice that the URL begins with HTTPS instead of HTTP. As previously mentioned, this means the connection uses a secure scheme to protect our data on the Internet. The technology behind it was originally known as SSL (Secure Sockets Layer). It keeps the connection safe by protecting the data transferred between client and server so that it cannot be stolen by computer criminals: encryption algorithms encode the data and make it impossible for third parties to read.
SSL is now obsolete and has been completely replaced by the TLS protocol (Transport Layer Security), an updated and much more secure version of SSL that serves the same purpose. You can identify it in the browser by the small padlock icon next to the URL; clicking it shows the site’s certificate. (Although many people are familiar with the term SSL, the correct term today is TLS.)
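In code, the same protection is usually obtained by creating a TLS context before opening the connection. The following is a small sketch with Python’s standard `ssl` module; choosing TLS 1.2 as the minimum version is an assumption of this example, not a requirement stated in this article.

```python
import ssl

# Default settings verify the server certificate and its hostname.
context = ssl.create_default_context()

# Refuse the obsolete SSL versions and early TLS versions (example choice).
context.minimum_version = ssl.TLSVersion.TLSv1_2

print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                    # True
```

A client would then wrap its TCP socket with `context.wrap_socket(sock, server_hostname=...)`, passing the hostname it intends to reach, before sending any HTTP data.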
Currently, these certificates can be purchased. They are the credentials of the owner of the website and allow a secure connection. They contain the following information:
- The name of the owner of the certificate
- The serial number of the certificate and the expiry date
- A copy of the certificate holder’s public key
- The digital signature of the authority issuing the certificate
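Some of these fields can be read programmatically. The sketch below assumes a dictionary in the layout that Python’s `ssl.SSLSocket.getpeercert()` returns; the `sample` data and the helper `summarize_certificate` are made up for illustration. That dictionary does not expose the public key or the signature, so only the owner, issuer, serial number and expiry date are shown.

```python
def summarize_certificate(cert):
    """Pick the owner, issuer, serial number and expiry date out of a cert dict."""

    def common_name(rdns):
        # subject/issuer are tuples of tuples of (key, value) pairs.
        for rdn in rdns:
            for key, value in rdn:
                if key == "commonName":
                    return value
        return None

    return {
        "owner": common_name(cert["subject"]),
        "issuer": common_name(cert["issuer"]),
        "serial_number": cert["serialNumber"],
        "expires": cert["notAfter"],
    }


# Made-up certificate data in the getpeercert() layout.
sample = {
    "subject": ((("commonName", "www.example.test"),),),
    "issuer": ((("organizationName", "Example CA"),), (("commonName", "Example Root"),)),
    "serialNumber": "0A1B2C",
    "notAfter": "Jan  1 00:00:00 2030 GMT",
}

print(summarize_certificate(sample))
```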
There are three cases where SSL/TLS becomes especially important: when you need to authenticate the identity of the server, to verify that it is who it says it is; when you need to give reliability to online transactions; and when you need to comply with industry standards, such as in the financial sector, which requires a certain degree of security for its transactions and asks for an SSL/TLS certificate as one of its requirements. These protocols are very useful and can be used on almost any device, which makes them a very versatile computer security option. In addition, using these certificates saves much of the time and investment that would otherwise be required to set up this security.
For all the magic behind typing a URL in our browser to work, as we have already seen, certain protocols are needed, and among them it is important to add the TCP/IP protocols.
The Internet Protocol Suite is the conceptual model and set of communications protocols used on the Internet and similar computer networks. It is commonly referred to as TCP/IP because the fundamental protocols of the suite are the Transmission Control Protocol (TCP) and the Internet Protocol (IP).
In short, TCP/IP (Transmission Control Protocol/Internet Protocol) is the set of protocols on which the Internet runs. It allows a connection to be established and data to be exchanged between client and server, provides reliable transport of data, and carries the data to other machines on the network. It defines the whole process from the time data is sent in packets until it is received. This process is organized in layers that work hierarchically and communicate with one another. There are four:
- Link or network access layer
- Network or Internet layer
- Transport layer
- Application layer
These layers and the TCP/IP protocols are very important because they allow the data sent to reach its destination without modifications that could alter it.
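One way to picture how the four layers cooperate is encapsulation: on the way out, each layer below the application wraps the data in its own header, and the receiver removes the headers in the opposite order. The toy sketch below uses text labels as stand-ins for headers; real headers are binary and carry ports, IP addresses and hardware addresses, and the function names are invented for this example.

```python
LAYERS = ["application", "transport", "internet", "link"]


def encapsulate(payload):
    """Wrap application data in one labelled header per lower layer."""
    frame = payload
    for layer in LAYERS[1:]:            # transport, then internet, then link
        frame = f"[{layer}]{frame}"
    return frame


def decapsulate(frame):
    """Strip the headers in the opposite order: link, internet, transport."""
    for layer in reversed(LAYERS[1:]):
        header = f"[{layer}]"
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame


wire = encapsulate("GET /")
print(wire)               # [link][internet][transport]GET /
print(decapsulate(wire))  # GET /
```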
These protocols have several uses, such as logging in remotely over the network, transferring files, sending emails and accessing files on a host server.
It should be noted that the main version of IP used on the Internet is version 4 (IPv4), but it has a limited number of possible addresses, so a newer version, IPv6, was developed with far more addresses available.
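The difference in size can be checked with Python’s standard `ipaddress` module: IPv4 addresses are 32 bits long and IPv6 addresses are 128 bits long. The address `2001:db8::1` comes from a range reserved for documentation and is used here only as an example.

```python
import ipaddress

ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

print(ipv4_total)                                   # 4294967296
print(ipv6_total == 2 ** 128)                       # True
print(ipaddress.ip_address("2001:db8::1").version)  # 6
```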
One of the most important components in this process is the web server: server software, or the hardware that runs it, that attends to the requests (store, process, deliver) of clients on the WWW (World Wide Web). A web server can host one or more websites, and it processes the requests that arrive through the protocols already mentioned. Web pages are HTML documents that include text, images, videos, style sheets and scripts, and they can be static or dynamic. Static pages are sent to the browser exactly as their files are stored on the server, so they are very plain. Dynamic pages build on static ones with additional elements such as an application server and a database; they are dynamic because their data is updated before it is sent to the browser, according to the interaction the user has had with the page. For example, to produce the final content, an HTML template is filled with content from the database, which makes such pages more interactive and their content easier to deliver and manage.
Some of the most popular web servers are Nginx and Apache.
Another component that complements the web server is the application server, which is the base of any dynamic website. It makes it possible to interact with the site, manage the information uploaded to it and create any component that mediates between the user and the web page. Basically, it runs the code behind the site that gives it functionality. To run these programs it needs suitable hardware, such as CPU and RAM (to run the applications in real time), and, on the software side, an operating system that is compatible with all the elements that make up our website and sized to their demands.
The application server can therefore process and analyze the data and then return the result to the web server, which generates what we see in the browser.
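In Python, one common interface between a web server and application code is WSGI. The sketch below is a minimal dynamic page written against that interface: it fills an HTML template with a value taken from the request. The template, the `name` query parameter and the greeting are all invented for this example.

```python
from html import escape
from string import Template
from urllib.parse import parse_qs

PAGE = Template("<html><body><h1>Hello, $name!</h1></body></html>")


def application(environ, start_response):
    """WSGI application: build the page from the request's query string."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["visitor"])[0]
    # escape() keeps user input from being interpreted as HTML.
    body = PAGE.substitute(name=escape(name)).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

For a local experiment this function could be served with `wsgiref.simple_server.make_server` from the standard library; in production a web server such as Nginx or Apache would typically sit in front of the application server.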
Another complement is the database, an organized and structured collection of data stored efficiently on a computer; in this case, all the data we store for use on our website, such as the information received from the user. With it, information can be accessed, updated, deleted and added easily and effectively.
A database is controlled by a DBMS (Database Management System). The data and the DBMS, together with the components associated with them, in this case the application server and the web server, make up the database system as a whole. Data is commonly organized in rows and columns for efficient processing and querying, and most databases use Structured Query Language (SQL) to interact with their information. Databases can be set up in a configuration called “master-slave”, in which the master is responsible for informing the slave servers about updates to its records so that they all stay synchronized.
Databases come in several types; the main ones are relational and non-relational.
- Relational databases can be seen as a series of organized tables that present information in rows and columns. MySQL is a very popular relational database.
- Non-relational databases, unlike relational ones, can store information without a fixed structured schema. They are known as NoSQL databases.
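The four basic operations on a relational database (add, update, delete and access) can be tried with SQL and Python’s built-in `sqlite3` module. SQLite is used here instead of MySQL only because it needs no installation; the `users` table and its contents are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # temporary database kept in RAM
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

# Add
conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", ("Ada", "ada@example.test"))
conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", ("Linus", "linus@example.test"))
# Update
conn.execute("UPDATE users SET email = ? WHERE name = ?", ("ada@new.example.test", "Ada"))
# Delete
conn.execute("DELETE FROM users WHERE name = ?", ("Linus",))
# Access
rows = conn.execute("SELECT id, name, email FROM users ORDER BY id").fetchall()
print(rows)  # [(1, 'Ada', 'ada@new.example.test')]
```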
For all these processes to be efficient, there is an essential element to take into account: the load balancer. It distributes the workload, in this case the incoming client requests, across a cluster of multiple servers, and it can be configured with algorithms that determine how this workload is balanced. This makes the handling of request traffic much more efficient and increases the speed of the website, although the speed the user perceives also depends on the bandwidth of the Internet connection and the hardware of the client computer. In addition, if one of the servers fails, the load balancer can direct requests to the other servers, so that communication between client and server is not affected and the failure is imperceptible to the client. No matter what configuration the load balancer has, it is important to always keep the servers up to date and in sync so that no data is lost in the process.
There are different algorithms a load balancer can use, each with its own benefits. They can be chosen according to the needs at hand and can also be combined. These are some of them:
- Round Robin: requests are distributed across the group of servers sequentially.
- Least Connections: a new request is sent to the server with the fewest current client connections. The relative capacity of each server can be taken into account when making this choice.
- IP Hash: the client’s IP address is used to determine which server receives the request.
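The three algorithms listed above can each be sketched in a few lines. This is a simplified illustration, not how any particular load balancer implements them: the function names are invented for this example, server capacity is ignored, and SHA-256 is just one possible hash for the IP-based choice.

```python
import hashlib
import itertools


def round_robin(servers):
    """Return an iterator that hands out servers in order, over and over."""
    return itertools.cycle(servers)


def least_connections(active):
    """Pick the server with the fewest active connections.

    `active` maps server name -> current connection count.
    """
    return min(active, key=active.get)


def ip_hash(client_ip, servers):
    """Map a client IP to a server; the same IP always gets the same server."""
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


servers = ["web-01", "web-02", "web-03"]   # hypothetical server names
rr = round_robin(servers)
print([next(rr) for _ in range(4)])  # ['web-01', 'web-02', 'web-03', 'web-01']
print(least_connections({"web-01": 5, "web-02": 2, "web-03": 7}))  # web-02
```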
As the last link in this web configuration, we will talk about the firewall. It is a software or hardware device designed to prevent unauthorized users from accessing private networks connected to the Internet. It sits between networks, and a set of rules is configured in it to allow, limit, encrypt or decrypt network traffic. It is also a tool for monitoring and filtering access to the data your server collects. A well-configured firewall adds protection to the network, although on its own it is never enough.
Its main function is to protect individual computers and those connected to a network against intruders who may steal information, and to avoid the loss of valuable data and even damage to the services of our network.
The firewall preserves our security and privacy: it protects our network, keeps the information stored on our network and servers safe, prevents computer criminals from entering our network and helps to prevent denial-of-service attacks.
“The firewall is located at the point of connection between the Internet and a computer or network of computers. Its operation is based on controlling all the information and traffic that, through the router, is transmitted from one network to another. If, when performing a quick analysis, the firewall considers that these data comply with security and protocol rules, they can enter the private network; but, if they do not, if they do not comply with the rules, the firewall is in charge of blocking the access of that user or unreliable information”
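The rule checking described above can be modelled as a list of rules evaluated in order. This is only a sketch under assumed rules: the networks, ports, the first-match-wins policy and the “deny by default” fallback are choices made for this example, and real firewalls inspect far more than a source address and a port.

```python
import ipaddress

# Each rule: (action, source network, destination port or None for any port).
RULES = [
    ("deny", ipaddress.ip_network("203.0.113.0/24"), None),   # blocked range
    ("allow", ipaddress.ip_network("0.0.0.0/0"), 443),        # HTTPS from anywhere
    ("allow", ipaddress.ip_network("192.168.0.0/16"), 22),    # SSH from the LAN only
]


def filter_packet(source_ip, dest_port, rules=RULES):
    """Return the action of the first matching rule, or 'deny' if none match."""
    source = ipaddress.ip_address(source_ip)
    for action, network, port in rules:
        if source in network and (port is None or port == dest_port):
            return action
    return "deny"


print(filter_packet("198.51.100.7", 443))  # allow
print(filter_packet("203.0.113.9", 443))   # deny
print(filter_packet("198.51.100.7", 22))   # deny
```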