How Does DNS Work?
What happens when you type https://www.holbertonschool.com (or other domain) in your browser and press Enter?
Every time we visit a web page in our browser, the first thing we do is search for it in a search engine like Google or type the full name of the page (ending in .com, for example) in the address bar at the top; if the server responds successfully, we see a nice page with (most of) its content.
First of all, let’s clarify what DNS is.
DNS (Domain Name System) is the technology that translates human-friendly, text-based domain names into machine-friendly, numeric IP addresses. When users type domain names into the URL bar in their browser, DNS servers are responsible for translating those domain names to numeric IP addresses, leading them to the correct website.
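To make the translation concrete, here is a minimal sketch that asks the operating system's resolver for the addresses behind a name, using Python's standard socket module. The function name resolve_host is our own choice for illustration, and the result depends on your network and on the DNS data at the time you run it.

```python
import socket


def resolve_host(hostname: str) -> list[str]:
    """Return the sorted, de-duplicated IP addresses the system resolver finds."""
    # getaddrinfo asks the operating system's resolver, which in turn
    # consults DNS (and local sources such as the hosts file).
    results = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({sockaddr[0] for *_, sockaddr in results})
```

On a connected machine, calling resolve_host("www.holbertonschool.com") should return one or more IPv4 or IPv6 addresses. Passing a string that is already an IP address simply returns that address without any lookup.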
Any Internet-connected computer can be reached through a public IP address, either an IPv4 address (e.g. 173.194.121.32) or an IPv6 address (e.g. 2027:0da8:8b73:0000:0000:8a2e:0370:1337). Computers can handle such addresses easily, but people have a hard time finding out who’s running the server or what service the website offers. IP addresses are hard to remember and might change over time.
The following image shows the basic operation of a DNS server.
A web page and all its content are hosted on a remote server that can be located anywhere in the world; the server has a public IP that identifies it on the internet. In the previous image, the DNS server is responsible for returning the IP that corresponds to the domain, and the web server returns the content requested by the user in an HTTP response.
We will cover several concepts to understand the complete process.
IP address and TCP/IP protocol
An IP address is what we call a network addressable location. Each IP address must be unique within its network. When we are talking about websites, this network is the entire internet.
IPv4 addresses, the most common form, are written as four numbers separated by dots, each number having up to three digits (a value between 0 and 255). For example, “111.222.111.222” could be a valid IPv4 address. With DNS, we map a name to that address so that you do not have to remember a complicated set of numbers for each place you wish to visit on a network.
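As a small illustration of these address formats, the sketch below uses Python's standard ipaddress module to tell IPv4 and IPv6 addresses apart. The helper name describe_address is hypothetical and exists only for this example.

```python
import ipaddress


def describe_address(text: str) -> str:
    """Classify text as an IPv4 address, an IPv6 address, or neither."""
    try:
        address = ipaddress.ip_address(text)
    except ValueError:
        # Raised when the text is not a well-formed IPv4 or IPv6 address.
        return "not an IP address"
    return f"IPv{address.version}"
```

With this helper, the example addresses used in this article are classified as IPv4 or IPv6, while a string such as "999.1.1.1" is rejected because 999 is outside the 0 to 255 range.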
TCP/IP: Transmission Control Protocol and Internet Protocol are communication protocols that define how data should travel across the internet. They are like the transport mechanism that lets you go to a shop, place an order, and buy your goods: a car or a bike (or however else you might get around).
TCP defines how applications can create channels of communication across a network. It also manages how a message is broken into smaller packets before they are transmitted over the internet and reassembled in the right order at the destination address.
IP defines how to address and route each packet to make sure it reaches the right destination. Each gateway computer on the network checks this IP address to determine where to forward the message.
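The splitting and reassembly that TCP performs can be sketched in a few lines. This is only a toy model under our own assumptions: real TCP also handles acknowledgements, retransmission and flow control, and the function names here are invented for the example.

```python
import random


def split_into_segments(message: bytes, size: int) -> list[tuple[int, bytes]]:
    """Cut message into (sequence_number, chunk) pairs of at most size bytes."""
    return [(offset, message[offset:offset + size])
            for offset in range(0, len(message), size)]


def reassemble(segments: list[tuple[int, bytes]]) -> bytes:
    """Sort segments by sequence number and join their payloads."""
    return b"".join(chunk for _, chunk in sorted(segments))


segments = split_into_segments(b"GET / HTTP/1.1", size=4)
random.shuffle(segments)  # packets may arrive out of order
restored = reassemble(segments)
```

Because every chunk carries its sequence number, the receiver can rebuild the original message no matter in which order the chunks arrive.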
Common application protocols that run on top of TCP/IP include the following:
- HTTP (Hyper Text Transfer Protocol) handles the communication between a web server and a web browser.
- HTTPS (Secure HTTP) handles secure communication between a web server and a web browser.
- FTP (File Transfer Protocol) handles transmission of files between computers.
Terminology involved with DNS
Top-Level Domain
A domain name has a simple structure made of several parts (it might be one part only, two, three…), separated by dots and read from right to left:
A top-level domain, or TLD, is the most general part of the domain. The top-level domain is the furthest portion to the right (as separated by a dot). Common top-level domains are “com”, “net”, “org”, “gov”, “edu”, and “io”.
Label (or component)
A domain name can have many labels (or components). It does not need to have exactly three labels. For instance, www.inf.ed.ac.uk is a valid domain name. For any domain you control (e.g. jvillegas.tech), you can create “subdomains” with different content located at each, like web-01.jvillegas.tech or lb-01.jvillegas.tech.
“The difference between a host name and a subdomain is that a host defines a computer or resource, while a subdomain extends the parent domain. It is a method of subdividing the domain itself.”
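To see the right-to-left structure in practice, here is a short sketch that splits a domain name into its labels, starting from the top-level domain. The function names are ours, chosen only for this illustration.

```python
def split_labels(domain: str) -> list[str]:
    """Return the labels of domain ordered from the TLD leftwards."""
    # A trailing dot (the DNS root) is allowed and ignored here.
    return list(reversed(domain.rstrip(".").split(".")))


def top_level_domain(domain: str) -> str:
    """Return the rightmost label of domain."""
    return split_labels(domain)[0]
```

For www.inf.ed.ac.uk this yields uk, ac, ed, inf and www, in that order, which is the order in which the name is read.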
Differences between the root domain and a subdomain
A root domain is the parent domain of a subdomain, and the name you register under the top-level domain cannot contain a dot.
When you register a domain on a domain provider's website, the provider's system will always ask you to choose a name without a dot in it. In other words, the address of the root domain may be mydomain.com, but it can never be my.domain.com. Domain providers will not let you create the root domain until you type a name without a dot. Why?
The dot in the domain name separates the subdomain name (the part before the dot, e.g. my) from the root domain name (the part after the dot, i.e. domain.com). This means that the address my.domain.com is a subdomain of the root domain, whose name is domain.com.
Name Server
A name server is a computer designated to translate domain names into IP addresses. These servers do most of the work in the DNS system. Since the total number of domain translations is too much for any one server, each server may redirect requests to other name servers or delegate responsibility for a subset of the subdomains it is responsible for.
Name servers can be “authoritative”, meaning that they give answers to queries about domains under their control. Otherwise, they may point to other servers, or serve cached copies of other name servers’ data.
Zone File
A zone file is a simple text file that contains the mappings between domain names and IP addresses. This is how the DNS system finally finds out which IP address should be contacted when a user requests a certain domain name.
Zone files reside in name servers and generally define the resources available under a specific domain, or the place that one can go to get that information.
Records
Within a zone file, records are kept. In its simplest form, a record is basically a single mapping between a resource and a name. These can map a domain name to an IP address, define the name servers for the domain, define the mail servers for the domain, etc.
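As a rough sketch of how records map names to resources, the example below parses a simplified, zone-like text into a lookup table. The "name type value" layout is our own simplification (real zone files also carry fields such as TTL and class), and example.com with the 192.0.2.x addresses are documentation placeholders, not real data.

```python
from collections import defaultdict

# Hypothetical, simplified zone data: one "name type value" record per line.
ZONE_TEXT = """
example.com.      NS  ns1.example.com.
example.com.      MX  mail.example.com.
example.com.      A   192.0.2.10
www.example.com.  A   192.0.2.10
ns1.example.com.  A   192.0.2.53
"""


def parse_zone(text: str) -> dict[tuple[str, str], list[str]]:
    """Group record values by (name, record type)."""
    records = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, record_type, value = line.split()
        records[(name, record_type)].append(value)
    return dict(records)


zone = parse_zone(ZONE_TEXT)
```

Here an A record maps a name to an IP address, an NS record names the name server for the domain, and an MX record names its mail server.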
How DNS Works
There are four types of servers that work together to deliver an IP address to the client: recursive resolvers, root nameservers, TLD nameservers, and authoritative nameservers.
The DNS recursor (also referred to as the DNS resolver) is a server that receives the query from the DNS client, and then interacts with other DNS servers to hunt down the correct IP. Once the resolver receives the request from the client, the resolver then actually behaves as a client itself, querying the other three types of DNS servers in search of the right IP.
Root Servers
DNS is, at its core, a hierarchical system. At the top of this system is what are known as “root servers”. These servers are controlled by various organizations and are delegated authority by ICANN (Internet Corporation for Assigned Names and Numbers).
There are currently 13 root servers in operation (named A through M). However, as there are an incredible number of names to resolve every minute, each of these servers is actually mirrored. The interesting thing about this setup is that all the mirrors of a single root server share the same IP address, a technique known as anycast. When a request is made to a certain root server, it is routed to the nearest mirror of that root server.
Root servers handle requests for information about Top-level domains. So if a request comes in for something a lower-level name server cannot resolve, a query is made to the root server for the domain.
The root servers won’t actually know where the domain is hosted. They will, however, be able to direct the requester to the name servers that handle the specifically requested top-level domain.
So if a request for “www.wikipedia.org” is made to the root server, it will check its zone files for a listing that matches “www.wikipedia.org” and will not find one.
It will instead find a record for the “org” TLD and give the requesting entity the address of the name server responsible for “org” addresses.
First the resolver queries the root nameserver. The root server is the first step in translating (resolving) human-readable domain names into IP addresses. The root server then responds to the resolver with the address of a Top Level Domain (TLD) DNS server (such as .com or .net) that stores the information for its domains.
Next the resolver queries the TLD server. The TLD server responds with the IP address of the domain’s authoritative nameserver. The recursor then queries the authoritative nameserver, which will respond with the IP address of the origin server.
The resolver will finally pass the origin server IP address back to the client. Using this IP address, the client can then initiate a query directly to the origin server, and the origin server will respond by sending website data that can be interpreted and displayed by the web browser.
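The chain of queries described above can be simulated with plain dictionaries. Everything in this sketch is hypothetical: the server names, the 192.0.2.x addresses, and the assumption that the registered domain is always the last two labels (which does not hold for names such as www.inf.ed.ac.uk).

```python
# Hypothetical data standing in for the three kinds of name servers.
ROOT_SERVER = {"org": "tld-org", "com": "tld-com"}
TLD_SERVERS = {
    "tld-org": {"wikipedia.org": "ns-wikipedia"},
    "tld-com": {"holbertonschool.com": "ns-holberton"},
}
AUTHORITATIVE_SERVERS = {
    "ns-wikipedia": {"www.wikipedia.org": "192.0.2.1"},
    "ns-holberton": {"www.holbertonschool.com": "192.0.2.2"},
}


def resolve(name: str) -> tuple[str, list[str]]:
    """Resolve name the way a recursive resolver would, recording each step."""
    labels = name.split(".")
    tld = labels[-1]
    registered_domain = ".".join(labels[-2:])  # simplifying assumption
    steps = []

    # 1. Ask the root server which server handles the TLD.
    tld_server = ROOT_SERVER[tld]
    steps.append(f"root -> {tld_server}")

    # 2. Ask the TLD server for the domain's authoritative name server.
    authoritative = TLD_SERVERS[tld_server][registered_domain]
    steps.append(f"{tld_server} -> {authoritative}")

    # 3. Ask the authoritative server for the IP address.
    ip = AUTHORITATIVE_SERVERS[authoritative][name]
    steps.append(f"{authoritative} -> {ip}")
    return ip, steps
```

Calling resolve("www.wikipedia.org") walks through the root, the "org" TLD server and the authoritative server in turn, and returns the final address together with the three steps taken.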
Now that we understand a bit about how DNS operates, it’s time to see what happens when the DNS server responds successfully and the browser sends a request to a web server running on a correct and secure infrastructure like the one shown next. We will explain the following diagram.
So, when the DNS server finds the assigned IP and responds to the browser, the browser makes the request to the server that is hosting the page.
In the above image, we see that there are other “redundant” servers that improve the reliability and availability of the service (the web page).
With this architecture, we can direct subdomains to different servers and the main domain to a server with a load balancer, which distributes each user request across the other servers in turn.
In your provider’s DNS management panel, you can configure A records to assign an IP to each subdomain, as in the following image.
Before continuing, let’s clarify the following concepts.
Web Server
A web server is software that uses HTTP (Hypertext Transfer Protocol) and other protocols to respond to client requests made over the World Wide Web (WWW); in other words, it delivers web pages. The term “server” on its own can also refer to the actual computer that runs this software.
Web server software controls how a user accesses hosted files. It is accessed through the domain names of websites and ensures the delivery of the site’s content to the requesting user, such as HTML documents, images and JavaScript files.
All computers that host websites must have web server software. Leading web servers include Apache, Microsoft’s Internet Information Services (IIS) and Nginx.
Application Server
Application servers are system software upon which web applications or desktop applications run. They consist of web server connectors, computer programming languages, runtime libraries, database connectors, and the administration code needed to deploy, configure, manage, and connect these components on a web host.
An application server runs behind a web server and, typically, in front of a SQL database. Web applications are computer code that runs on top of application servers; they are written in the language(s) the application server supports and call the runtime libraries and components the application server offers.
Application developers write programs according to the specification of the application server. Dependence on a particular vendor is the drawback of this approach. Application servers exist for many languages and platforms, such as Python, Java EE, PHP, Ruby, Go and .NET.
An application server exposes business logic to clients and generates dynamic content.
Back to our main infrastructure diagram!
Now, it’s time to explain the firewall, the load balancer software and SSL on the server.
“To streamline deployment on the other servers, you can use tools like Puppet or Ansible to give them all the same settings.”
So, what’s a Firewall?
A firewall is a network security system designed to prevent unauthorized access to or from a private network connected to the Internet, especially intranets. All messages entering or leaving the intranet pass through the firewall, which examines each message and blocks those that do not meet the specified security criteria.
Firewall Filtering Techniques
There are several types of firewall techniques that will prevent potentially harmful information from getting through:
- Packet Filter: Looks at each packet entering or leaving the network and accepts or rejects it based on user-defined rules. Packet filtering is fairly effective and transparent to users, but it is difficult to configure. In addition, it is susceptible to IP spoofing.
- Application Gateway: Applies security mechanisms to specific applications, such as FTP and Telnet servers. This is very effective, but can impose a performance degradation.
- Circuit-level Gateway: Applies security mechanisms when a TCP or UDP connection is established. Once the connection has been made, packets can flow between the hosts without further checking.
- Proxy Server: Intercepts all messages entering and leaving the network. The proxy server effectively hides the true network addresses.
In practice, many firewalls use two or more of these techniques in concert. A firewall is considered a first line of defense in protecting private information. For greater security, data can be encrypted.
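As an illustration of the packet filter technique, the sketch below checks each packet against a list of user-defined rules and applies the first one that matches. The rule set, the addresses (taken from documentation ranges) and the reject-by-default policy are assumptions made for this example, not a description of any particular firewall product.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    action: str  # "accept" or "reject"
    source: str  # source network in CIDR notation
    port: int    # destination port the rule applies to


# Hypothetical rule set, checked top to bottom; the first match wins.
RULES = [
    Rule("reject", "203.0.113.0/24", 22),   # block SSH from one network
    Rule("accept", "0.0.0.0/0", 443),       # allow HTTPS from anywhere
    Rule("accept", "198.51.100.0/24", 22),  # allow SSH from a trusted network
]


def filter_packet(source_ip: str, port: int, rules: list[Rule] = RULES) -> str:
    """Return the action of the first matching rule, rejecting by default."""
    for rule in rules:
        if port == rule.port and ip_address(source_ip) in ip_network(rule.source):
            return rule.action
    return "reject"
```

A filter like this only looks at the addresses and ports a packet claims to have, which is why packet filtering on its own is susceptible to IP spoofing.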
Load Balancer
How do Facebook, LinkedIn, Twitter and other web giants handle such huge amounts of traffic? They don’t have just one server, but tens of thousands of them. To make this work, web traffic needs to be distributed across these servers, and that is the role of a load balancer.
A load balancer distributes the workload of your system across multiple individual systems, or groups of systems, to reduce the load on any individual system, which in turn increases the reliability, efficiency and availability of your enterprise application or website.
Load balancers generally implement a combination of one or more scheduling algorithms. Round robin scheduling is one of them.
With round robin, requests are assigned to the servers sequentially, one after another. After sending a request to the last server, the load balancer starts again from the first server.
The diagram below depicts this approach: each request is assigned to the next server in turn, and the round goes on.
This algorithm is used when the servers have equal specifications and there are not many persistent connections.
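Round robin scheduling is simple enough to sketch with Python's itertools.cycle. The class name and the backend names below are hypothetical and only mirror the naming style used earlier in this article.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backend servers in a fixed, repeating order."""

    def __init__(self, servers: list[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(servers)

    def next_server(self) -> str:
        """Return the server that should receive the next request."""
        return next(self._servers)


# Hypothetical backend names.
balancer = RoundRobinBalancer(["web-01", "web-02", "web-03"])
assignments = [balancer.next_server() for _ in range(5)]
```

The fourth request goes back to the first server, which is the wrap-around behaviour described above. Note that this sketch ignores server health and current load, which production load balancers usually take into account.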
And finally, the most common software load balancers are:
- HAProxy: A TCP and HTTP load balancer.
- NGINX: An HTTP load balancer with SSL termination support.
- mod_athena: An Apache-based HTTP load balancer.
- Varnish: A reverse-proxy-based load balancer.
- Balance: An open source TCP load balancer.
- LVS: Linux Virtual Server, offering layer 4 load balancing.
HTTPS/SSL
In the main infrastructure diagram, we see that the load balancer performs SSL termination. This reduces the load on the main servers by offloading the cryptographic processing to another machine. With SSL termination, encrypted HTTPS traffic arriving over an SSL (Secure Sockets Layer) connection is decrypted and verified on the load balancer instead of on the application server.
HTTPS pages use a secure protocol to encrypt communications: SSL (Secure Sockets Layer) or its successor, TLS (Transport Layer Security), which has replaced SSL in modern deployments. Both the TLS and SSL protocols use what is known as an ‘asymmetric’ Public Key Infrastructure (PKI) system. An asymmetric system uses two ‘keys’ to encrypt communications, a ‘public’ key and a ‘private’ key. Anything encrypted with the public key can only be decrypted by the private key and vice versa.
As the names suggest, the ‘private’ key should be kept strictly protected and should only be accessible to the owner of the private key. In the case of a website, the private key remains securely ensconced on the web server. Conversely, the public key is intended to be distributed to anybody and everybody that needs to be able to decrypt information that was encrypted with the private key.
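The public and private key relationship can be demonstrated with textbook RSA and deliberately tiny numbers. This is only a toy under stated assumptions: the primes 61 and 53 and the exponent 17 are chosen for readability, there is no padding, and real TLS relies on vetted cryptographic libraries and far larger keys.

```python
# Toy key generation with tiny primes; real keys use very large primes.
p, q = 61, 53
n = p * q                # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                   # public exponent, coprime with phi
d = pow(e, -1, phi)      # private exponent (modular inverse, Python 3.8+)

public_key = (e, n)
private_key = (d, n)


def apply_key(value: int, key: tuple[int, int]) -> int:
    """Raise value to the key's exponent modulo n (textbook RSA, no padding)."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


message = 65
ciphertext = apply_key(message, public_key)     # anyone can encrypt
decrypted = apply_key(ciphertext, private_key)  # only the key owner can decrypt

signature = apply_key(message, private_key)     # the reverse direction
verified = apply_key(signature, public_key)     # anyone can check it
```

Both directions recover the original number, which is the "and vice versa" property mentioned above: what one key transforms, only the other key can undo.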
And what about the database?
Of course, it is one of the most important elements of a web application: it is the backend resource that our business logic relies on. The server application queries the database, which lets us deliver content to the client dynamically.
A database is a collection of information that is organized so that it can be easily accessed, managed and updated. Computer databases typically contain aggregations of data records or files, containing information about sales transactions or interactions with specific customers.
In a relational database, digital information about a specific customer is organized into rows, columns and tables, which are indexed to make it easier to find relevant information through SQL queries; NoSQL databases organize and query data differently, for example as documents.
There are different database engines, such as MySQL for relational databases and MongoDB for NoSQL.
MySQL is a popular open source database management system commonly used in web applications due to its speed, flexibility and reliability. MySQL employs SQL, or Structured Query Language, for accessing and processing data contained in databases.
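To show what rows, columns, tables, indexes and SQL queries look like in practice, here is a minimal sketch. It uses SQLite through Python's built-in sqlite3 module instead of MySQL, purely so the example runs without installing a database server; the table and the customer data are invented for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
connection.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ada", "London"), ("Linus", "Helsinki"), ("Grace", "London")],
)
# An index makes lookups by city faster on large tables.
connection.execute("CREATE INDEX idx_customers_city ON customers (city)")

london_customers = [
    name
    for (name,) in connection.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY name", ("London",)
    )
]
connection.close()
```

The same CREATE TABLE, INSERT and SELECT statements would look very similar in MySQL, although data types and connection handling differ between engines.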
MongoDB is a document database with the scalability and flexibility that you want with the querying and indexing that you need. It stores data in flexible, JSON-like documents, meaning fields can vary from document to document and data structure can be changed over time.
The document model maps to the objects in your application code, making data easy to work with. Ad hoc queries, indexing, and real-time aggregation provide powerful ways to access and analyze your data.
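The idea of flexible, JSON-like documents can be sketched with plain Python dictionaries. This does not use MongoDB or its driver: the find helper below is a hypothetical stand-in for a document query, and the customer data is invented.

```python
import json

# Hypothetical documents in one collection; note that their fields differ.
customers = [
    {"name": "Ada", "city": "London", "orders": [{"item": "book", "total": 12.5}]},
    {"name": "Linus", "email": "linus@example.com"},
]


def find(collection: list[dict], **criteria) -> list[dict]:
    """Return the documents whose fields equal every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


matches = find(customers, city="London")
as_json = json.dumps(matches[0])  # documents serialize naturally to JSON
```

The second document has no city field at all, yet it lives in the same collection; the query simply does not match it.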
MongoDB is a distributed database at its core, so high availability, horizontal scaling, and geographic distribution are built in and easy to use.
References
https://www.cloudflare.com/learning/dns/what-is-a-dns-server/