How does the internet work?

Olga
15 min read · Jan 13, 2022


Photo by Nastya Dulhiier on Unsplash

How does the internet work? And what is the internet anyway?

The internet can be defined as a very large network consisting of numerous smaller networks.

The internet consists of Technical Infrastructure and the Web.

The Web is all the resources or information available to us “on the internet”. The Web can be described as a service that sits on top of the technical infrastructure.

The technical infrastructure was engineered by thousands of people over many years. It includes physical hardware as well as various kinds of software.

Conceptually the Internet can be described as a system composed of multiple layers.

Each layer is responsible for performing a specific function and has to follow specific rules.

Each layer is independent of the other layers. At the same time, the layers must work together to make the whole system work.

But first, what is a network?

Network

If two or more devices are connected to each other and can send and receive data to and from each other, we have a network.

Basic Network

In a real-world setting, such as a library, this would be called a Local Area Network, or LAN.

A LAN typically has many devices connected to each other via a network bridging device and network cables.

The network bridging device that connects the computers/devices is usually a hub or a switch.

The main difference between the two is that a switch is what’s called an intelligent device and a hub isn’t.

A switch forwards data only to the device the data is addressed to. A hub broadcasts the data to all the devices connected to it.
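The difference between a hub and a switch can be sketched in a few lines of Python. The class names, port names and the dictionary-based "frame" below are hypothetical simplifications, and the addresses "A" and "B" stand in for real MAC addresses. One detail the paragraph above glosses over is that a real switch learns which device sits behind which port by watching the source address of incoming frames. Until it has learned a destination, it floods the frame just like a hub.

```python
class Hub:
    """Repeats every incoming frame out of every other port."""

    def __init__(self, ports):
        self.ports = list(ports)

    def forward(self, in_port, frame):
        # No addressing logic at all: every port except the sender's gets a copy.
        return [p for p in self.ports if p != in_port]


class Switch:
    """Learns which address lives behind which port, then forwards selectively."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # device address -> port it was last seen on

    def forward(self, in_port, frame):
        # Learn: the sender is reachable through the port the frame arrived on.
        self.mac_table[frame["src"]] = in_port
        out_port = self.mac_table.get(frame["dst"])
        if out_port is None:
            # Destination not learned yet: flood, exactly like a hub would.
            return [p for p in self.ports if p != in_port]
        return [out_port]


hub = Hub(["p1", "p2", "p3"])
switch = Switch(["p1", "p2", "p3"])
print(hub.forward("p1", {"src": "A", "dst": "B"}))     # ['p2', 'p3']
print(switch.forward("p1", {"src": "A", "dst": "B"}))  # ['p2', 'p3'] (B unknown, flood)
print(switch.forward("p2", {"src": "B", "dst": "A"}))  # ['p1'] (A already learned)
print(switch.forward("p1", {"src": "A", "dst": "B"}))  # ['p2'] (B now learned)
```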

What if a library computer wants to send some data to another library computer that is located in a different city? In this case we need a way to establish a connection with an outside network. We need a router.

Router

A router is a device that sends (or routes) data between networks. The internet can also be described as a large number of networks connected via many routers.

Routers enable inter-network communication

Routers are essential devices. A router has at least two connections (ports), and most have more than two. Each port is connected to a different network.

Essentially this is what the router does:

  • Gets some data (packet)
  • Reads metadata of the packet (destination IP)
  • Decides where to forward it next
  • Forwards it

The next stop for that data may be another router. Or, if the router is connected to the destination device’s network, the next stop is the destination device itself.

Data typically needs to go through many routers in order to reach its destination

The most important thing to remember is that routers are responsible for directing internet traffic.

How does data end up being sent somewhere in the first place? Let’s talk about the request-response cycle for a minute.

Request-Response cycle

Let’s say you would like to visit a web page, for example, www.duckduckgo.com.

You first type the URL into the browser’s address bar.

URL stands for Uniform Resource Locator. It is basically a name (or address) that identifies some resource on the Web.
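A URL has several parts, and Python's standard `urllib.parse.urlsplit` shows them. The full URL below is a hypothetical example. It assumes the browser fills in an `https://` scheme when you type only the host name, which modern browsers typically do.

```python
from urllib.parse import urlsplit

# Hypothetical full URL for the example page, including a search query.
url = urlsplit("https://www.duckduckgo.com/?q=internet")

print(url.scheme)    # https               -> protocol to use
print(url.hostname)  # www.duckduckgo.com  -> host name to look up via DNS
print(url.path)      # /                   -> resource requested on that host
print(url.query)     # q=internet          -> extra parameters for the server
```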

In this scenario, the browser is referred to as the client.

So you typed in www.duckduckgo.com and hit “Enter”.

The browser does some work, packages the data and sends the data on the way.

By data, I mean information about the request to view the DuckDuckGo page.

So when you typed in the URL and pressed enter, you essentially said

“Hey server, can I please view DuckDuckGo in my browser?”

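Before anything else, the browser needs the server’s IP address. It looks up the host name from the URL (www.duckduckgo.com) and translates it into an IP address. This is called DNS resolution.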
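A DNS lookup is itself a small network message. The sketch below builds the bytes of a DNS query for the host name in our example, following the message format defined in RFC 1035. It does not send anything. In practice, the browser usually hands this job to the operating system's resolver, and such queries are typically sent over UDP to port 53. The function name and the query ID are hypothetical.

```python
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a DNS query message asking for the IPv4 (A) record of `hostname`."""
    # Header: ID, flags (0x0100 = "recursion desired"), 1 question,
    # 0 answer, 0 authority and 0 additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each dot-separated label is prefixed by its length,
    # and a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in hostname.split(".")
    ) + b"\x00"
    # QTYPE 1 = A (IPv4 address), QCLASS 1 = IN (Internet).
    question = qname + struct.pack("!HH", 1, 1)
    return header + question


query = build_dns_query("www.duckduckgo.com")
print(len(query))  # 36
```

The resolver's reply would reuse the same ID and carry the IP address in its answer section.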

Then the request to see the page (the data) is packaged according to the rules of the HTTP protocol and passed down to the next networking layer.

Ultimately the data goes through several layers. Each layer is responsible for some function that contributes to delivering that data to its destination.

Each layer encapsulates the original data and adds some metadata that would be used by the layer below it (to help it accomplish its function).

On the receiving end there is a server.

A server is software, usually running on a different machine, that is programmed to handle client requests. The machines it runs on are often called servers too.

The server receives the request. Then, based on the contents of the request, it performs some actions. These basically boil down to figuring out what to do next and then doing it.

Then the server sends a response to the client. Before data is sent back, it is packaged according to specific rules and passed through several layers.

When data travels from the server, it goes through the same layers as the original request did, but in reverse order.

The client receives the response message and processes its contents. In our example, the browser would render the DuckDuckGo page in its window.

To build a page like this, the server will most likely send many messages (packets) to the client, one after another.

There are many models that describe how the internet works.

The two models that come up most often are the OSI model and the TCP/IP model.

Both models represent the internet or network communication as a set of layers.

There is some difference in how the models describe those layers, but there is also a lot of overlap.

Layers that comprise OSI and TCP/IP Models

For the rest of the article I will be mostly referring to the layers of the TCP/IP model.

Protocols

There are many different types of internet-enabled devices out there. It is not just computers. Security cameras, thermostats, gaming consoles, etc. can also connect to the internet, and it is possible to view a security camera feed on a smartphone in real time.

How are the devices able to communicate and understand each other?

This is possible because there are rules in place. They are called protocols or network protocols.

Protocols dictate how the data is transferred to and from the devices. There are many different protocols.

I am going to cover the protocols that are most relevant to this discussion:

  • IP
  • TCP
  • HTTP
  • Ethernet
  • DNS
  • UDP
  • TLS

When data travels through the network, it goes through each of the layers until it reaches the destination. Once the response is ready, it goes through the same layers again, but in reverse order.

You can think of protocols as a set of rules that are in effect during a specific part of the data’s journey or within a specific layer.

Some protocols operate in entirely different layers, for example IP and TCP. IP belongs to the Network layer. TCP is a Transport layer protocol.

Other protocols operate within the same layer. For example, DNS and HTTP are both part of the Application layer.

Protocol Data Unit

When the data travels across the internet, through all of the layers, it is encapsulated as a Protocol Data Unit or PDU.

PDU

Protocol Data Unit is a unit of information (or data) that is sent over a network.

Different protocol layers have different names for the PDUs.

  • Link/Data Link layer, a PDU is a frame
  • Internet/ Network layer, it is a packet
  • Transport layer, it is a segment (TCP) or datagram (UDP)

The PDU from the previous layer becomes the data payload of the next layer

The PDU at each layer contains some protocol-specific information about the data. As the PDU goes from layer to layer, additional data is added to it.

Generally, a PDU includes:

  • Header
  • Data payload
  • Trailer, also called a footer (not always present)

The PDU at one protocol layer becomes the data payload of the layer below it. This mechanism is called data encapsulation.
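Encapsulation can be sketched as repeated wrapping on the way down and unwrapping on the way up. The short byte strings such as `<TCP>` and `<IP>` are hypothetical placeholders for what are really binary header structures. Only the order of wrapping is meant to be accurate. `<FCS>` stands for Ethernet's frame check sequence, which is the trailer at the Link layer.

```python
# (layer name, header, trailer) from the top of the stack down.
# The byte strings are placeholders, not real header formats.
LAYERS = [
    ("Transport", b"<TCP>", b""),
    ("Internet", b"<IP>", b""),
    ("Link", b"<ETH>", b"<FCS>"),
]


def encapsulate(payload: bytes) -> bytes:
    pdu = payload
    for _name, header, trailer in LAYERS:
        # The PDU from the layer above becomes this layer's data payload.
        pdu = header + pdu + trailer
    return pdu


def decapsulate(frame: bytes) -> bytes:
    pdu = frame
    # The receiver peels the layers off in reverse order.
    for name, header, trailer in reversed(LAYERS):
        if not (pdu.startswith(header) and pdu.endswith(trailer)):
            raise ValueError(f"malformed {name} PDU")
        pdu = pdu[len(header):len(pdu) - len(trailer)]
    return pdu


frame = encapsulate(b"GET / HTTP/1.1")
print(frame)               # b'<ETH><IP><TCP>GET / HTTP/1.1<FCS>'
print(decapsulate(frame))  # b'GET / HTTP/1.1'
```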

Because of encapsulation, the protocol at any particular layer does not need to know anything about the other layers to perform its task or to work together with them.

Layers

Let’s say you are traveling from city A to city B. City B is located in a different country. To get there, you might first need to take a bus to an airport, then hop on a plane, then take another bus or maybe even a train, and then possibly catch a ride from the train station to reach your final destination.

What a nightmare…

Physical Layer

Physical Layer is the first layer in the OSI model. This layer covers any physical infrastructure that is involved in the transfer of binary data.

Binary data, or bits, are transferred in the form of electrical, optic, or electromagnetic signals.

There are some important terms that are used to describe the performance of the physical network.

Each time you switch from a bus to a plane, then to another bus, you make a hop, just like a packet does when it travels to its destination.

Network hop — a single leap a packet makes from one router to the next on its way from source to destination. Because many routers connect different networks, a packet’s trip usually consists of many hops.

The total time it takes for you to get from city A to city B, is latency.

Latency — amount of time it takes for data to get from point A to point B, sometimes called a delay. The following are the types of delay that contribute to total latency.

  • Propagation delay — amount of time it takes for a signal to travel across the link from the sender to the receiver. It is calculated as the distance divided by the signal’s propagation speed in the medium.
  • Transmission delay — amount of time it takes to put all of the packet’s data bits onto the ‘link’. A ‘link’ is the physical connection (for example, a cable) between two network devices such as switches or routers. It is calculated as the packet size in bits divided by the link’s data rate.
  • Processing delay — the time it takes for the router to process the packet (data).
  • Queueing delay — Network devices can only process a limited amount of data at once. If too much data arrives too quickly, and the router isn’t able to process it, it buffers or queues the data. The time that it takes for the data to wait in the queue is the queueing delay.

All of that adds to total latency.
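For a single hop, the four delays can be added up directly. Propagation delay is distance divided by signal speed, and transmission delay is packet size in bits divided by the link's data rate. All the numbers below are hypothetical:

  • a 1,000 km fiber link, where signals travel at roughly 2 × 10^8 m/s (about two-thirds of the speed of light)
  • a 1,500-byte packet on a 100 Mbps link
  • made-up processing and queueing times

A real path repeats this for every hop.

```python
def one_hop_latency(distance_m, signal_speed_mps, packet_bits, link_rate_bps,
                    processing_s, queueing_s):
    """Return each delay component for one hop (in seconds), plus their sum."""
    delays = {
        "propagation": distance_m / signal_speed_mps,  # distance / speed
        "transmission": packet_bits / link_rate_bps,   # bits / (bits per second)
        "processing": processing_s,                    # router inspects the packet
        "queueing": queueing_s,                        # packet waits in a buffer
    }
    delays["total"] = sum(delays.values())
    return delays


d = one_hop_latency(
    distance_m=1_000_000,      # 1,000 km
    signal_speed_mps=2e8,      # approximate speed of light in fiber
    packet_bits=1500 * 8,      # 1,500-byte packet
    link_rate_bps=100e6,       # 100 Mbps link
    processing_s=20e-6,        # hypothetical
    queueing_s=1e-3,           # hypothetical
)
print(f"{d['total'] * 1000:.2f} ms")  # 6.14 ms
```

With these numbers, propagation (5 ms) dominates, which is why physical distance matters so much for latency.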

Let’s go back to the trip analogy.

While you are sitting at the airport, you decide to calculate how many people travel through the airport in an hour. You estimate that it is around 1000 people per hour.

That is called bandwidth.

Bandwidth — the maximum amount of data that can be sent within a specific unit of time, typically measured in megabits per second (Mbps).

Link/Data Link Layer

For the data to be able to get to its destination, we need to identify the specific device (within the network) the data needs to get to.

This layer is responsible for the identification of those devices and for transporting the data over the physical network between the devices.

The Ethernet protocol is the main protocol used within the Link/Data Link layer.

Ethernet adds logical structure to the unstructured stream of bits that is transferred over the physical layer.

The data is still represented by 1s and 0s. Ethernet just makes it possible to make sense of them.

Two main features of this protocol are framing and addressing.

The data at this layer is encapsulated into a PDU called a frame or Ethernet Frame. The frame consists of various fields including Data Payload and Source and Destination MAC addresses.

A MAC address is assigned to every network-enabled device (strictly, to each network interface), usually by the manufacturer, and it normally stays the same

MAC addresses are used by Ethernet protocol to identify devices on the local network.
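The framing and addressing described above can be sketched with Python's `struct` module. The sketch uses the common Ethernet II layout: destination MAC (6 bytes), source MAC (6 bytes), a 2-byte EtherType (0x0800 means the payload is an IPv4 packet), and a payload padded to at least 46 bytes. The 4-byte FCS trailer is normally computed by the network card, so it is left out. The MAC addresses and function names are hypothetical.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
MIN_PAYLOAD = 46  # shorter payloads are padded up to this size


def mac_to_bytes(mac: str) -> bytes:
    # "aa:bb:cc:dd:ee:ff" -> 6 raw bytes
    return bytes(int(part, 16) for part in mac.split(":"))


def bytes_to_mac(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)


def build_frame(dst_mac: str, src_mac: str, ethertype: int, payload: bytes) -> bytes:
    # Header: destination MAC, source MAC, EtherType (network byte order).
    header = mac_to_bytes(dst_mac) + mac_to_bytes(src_mac) + struct.pack("!H", ethertype)
    # Pad with zero bytes up to the minimum payload size (FCS omitted).
    return header + payload.ljust(MIN_PAYLOAD, b"\x00")


def parse_frame(frame: bytes):
    (ethertype,) = struct.unpack("!H", frame[12:14])
    return bytes_to_mac(frame[0:6]), bytes_to_mac(frame[6:12]), ethertype, frame[14:]


frame = build_frame("ff:ff:ff:ff:ff:ff", "02:00:00:00:00:01", ETHERTYPE_IPV4, b"hello")
print(len(frame))  # 60
dst, src, ethertype, payload = parse_frame(frame)
print(dst, src, hex(ethertype))  # ff:ff:ff:ff:ff:ff 02:00:00:00:00:01 0x800
```

`ff:ff:ff:ff:ff:ff` is the Ethernet broadcast address, so every device on the LAN would accept this frame.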

The Internet/Network Layer

This layer is responsible for the communication between devices located on different networks.

The main protocol used at this layer is the Internet Protocol (IP).

Internet Protocol governs how data is routed to its destination, specifically it defines how the data is structured (data packet), addressed (IP addressing), and finally, routed to its recipient.

I mentioned routers earlier. Routers forward incoming packets to their destination.

A router first inspects the incoming packet and reads its destination IP address. Then it looks up that address in its routing table to determine where the packet should be sent next. Finally, it sends the packet on its way.
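A routing-table lookup can be sketched with Python's standard `ipaddress` module. The table entries and next-hop names below are hypothetical. The selection rule is longest-prefix match, which is how IP routers choose between overlapping entries: the most specific matching network wins, and `0.0.0.0/0` acts as the default route. Real routers use specialized data structures rather than a linear scan.

```python
import ipaddress

# Hypothetical routing table: (destination network, where to send matching packets).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "upstream-isp"),   # default route
    (ipaddress.ip_network("10.0.0.0/8"), "router-b"),
    (ipaddress.ip_network("10.1.0.0/16"), "router-c"),
    (ipaddress.ip_network("192.168.1.0/24"), "lan-port"),  # directly connected LAN
]


def next_hop(destination: str) -> str:
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    # Longest prefix match: prefer the network with the longest prefix (most specific).
    _net, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop


print(next_hop("10.1.2.3"))      # router-c (matches /8 and /16; /16 is more specific)
print(next_hop("10.9.9.9"))      # router-b
print(next_hop("192.168.1.20"))  # lan-port
print(next_hop("8.8.8.8"))       # upstream-isp (only the default route matches)
```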

Internet Protocol makes it possible to send data from one device to another device, whether they are located on the same LAN or entirely different LANs.

IP addressing gives devices the ability to communicate between different LANs, even if they are located thousands of miles apart.

Once the data gets to its destination device, it next needs to be delivered to the specific application.

There could be many different applications running on the client, all at the same time. For example, Google Docs, Slack, and Postman could all be running at once.

How do we make sure that the data is received by the correct application?

Transport Layer

This is achieved through multiplexing (with demultiplexing doing the reverse).

Multiplexing is a method that allows transmitting more than one signal over a shared channel or connection.

Multiplexing is not unique to computer networking. It is also used in data communications, telephony, and audio and video broadcasting.

We are only concerned with multiplexing that takes place when data travels through the Transport Layer.

Making sure that the intended application receives the data is achieved through the use of network ports.

Ports are used to identify the process or application that should receive a particular message.

Ports are managed by the device’s operating system.

Port numbers range between 0 and 65535. Specific ranges of port numbers are set aside for specific protocols.

  • 0–1023 — well-known ports

Used by protocols like HTTP (port 80), DNS (port 53), FTP (ports 20 and 21), etc.

  • 1024–49151 — registered ports

Can be registered (with IANA) for use by specific companies or applications.

  • 49152–65535 — dynamic, or private, ports. These can also be used to assign ephemeral ports.

These ports cannot be assigned or registered, and any application can use them.

When we type a URL into the browser’s address bar to access some web app or page on the web, the server on which that app is running will most likely be listening on port 80 (or on port 443, for HTTPS). However, the browser we are accessing the page from will have one of the ephemeral ports assigned to it.

How does data get forwarded to the correct port?

The Transport layer PDU carries the source and destination port numbers in its header. This tells the receiving device which port, and therefore which application, the message should be forwarded to.

The combination of IP address and the port number is a communication end-point, which is called a socket.
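Sockets and ephemeral ports can be observed with Python's standard `socket` module, entirely on one machine. The function name is hypothetical. The server binds to port 0, which asks the operating system for any free port. This is a stand-in for a well-known port like 80, which often needs administrator rights. Each side's end-point is an (IP address, port) pair, and the client's port is picked by the operating system.

```python
import socket


def open_local_connection():
    """Connect a client to a server on this machine and return the socket end-points."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        # Port 0 = "OS, pick a free port for me" (stand-in for a well-known port).
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        server_end = server.getsockname()  # (IP address, port) of the server
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            # connect() runs the TCP handshake; the OS assigns an ephemeral client port.
            client.connect(server_end)
            conn, client_end_seen_by_server = server.accept()
            with conn:
                return server_end, client.getsockname(), client_end_seen_by_server


server_end, client_end, seen_by_server = open_local_connection()
print("server socket:", server_end)
print("client socket:", client_end)
```

The exact port numbers will differ on every run. The pair of sockets together identifies this one connection.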

The two most commonly used protocols at the Transport layer are TCP and UDP.

TCP

TCP is a connection-oriented protocol. Before data can be sent between the client and the server, a connection must be established. This is done via a Three-way Handshake.

When you type a URL into the browser’s address bar and hit enter, the Three-way Handshake must happen first. This is how it goes:

  • Client sends a SYN Segment to the server. SYN means synchronize.
  • Once the server gets the segment (and if it gets the segment), it responds with a SYN ACK. SYN ACK means synchronize, acknowledge.
  • The client then receives the SYN ACK and responds to the server with an ACK.
  • Once the client has sent the ACK, the connection is established and data can flow in both directions. With HTTP, the client sends its request first, and the server then responds with the page.
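The steps above leave out sequence numbers, which is how each side proves it received the other's segment. In TCP, each side picks an initial sequence number (ISN). A SYN consumes one sequence number, so each side acknowledges the other's ISN + 1. The sketch below only builds the three segments as tuples and sends nothing. The ISN values and function name are hypothetical, and real operating systems pick unpredictable ISNs.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around


def three_way_handshake(client_isn: int, server_isn: int):
    """Return (sender, flags, seq, ack) for the three handshake segments."""
    syn = ("client", "SYN", client_isn, None)
    # The server acknowledges client ISN + 1 and announces its own ISN.
    syn_ack = ("server", "SYN-ACK", server_isn, (client_isn + 1) % SEQ_SPACE)
    # The client acknowledges server ISN + 1; the connection is now established.
    ack = ("client", "ACK", (client_isn + 1) % SEQ_SPACE, (server_isn + 1) % SEQ_SPACE)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
# ('client', 'SYN', 1000, None)
# ('server', 'SYN-ACK', 5000, 1001)
# ('client', 'ACK', 1001, 5001)
```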

When the server is done sending the web page, it can close the connection by sending another segment to the client: FIN. (Either side can start closing the connection.)

The client then sends an ACK to confirm it received the FIN, and also sends its own FIN to the server. Upon receiving the FIN from the client, the server sends an ACK. At this point, both the client and the server have closed the connection.

If this seems like a lot of work, it’s because it is!

TCP protocol provides a reliable connection.

This is achieved through message acknowledgement, in-order delivery and retransmission. It also employs flow control and congestion avoidance.
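Acknowledgement and in-order delivery can be sketched with a receive buffer keyed by sequence number. As in TCP, sequence numbers here count bytes. This is a deliberate simplification: there is no wraparound, receive window, or retransmission timer, and the function name and segments are hypothetical. Note the repeated ACK for byte 3 when a later segment arrives early. Duplicate ACKs like this are one of the signals a real TCP sender uses to decide that a segment was lost and must be retransmitted.

```python
def reassemble(segments, initial_seq=0):
    """Deliver bytes in order from (seq, data) segments that may arrive out of order.

    Returns the delivered bytes and the cumulative ACK sent after each arrival.
    """
    buffered = {}           # out-of-order segments waiting for the gap to fill
    expected = initial_seq  # next byte the application is waiting for
    delivered = bytearray()
    acks = []
    for seq, data in segments:
        if seq >= expected:  # ignore duplicates of data already delivered
            buffered[seq] = data
        # Hand over every contiguous chunk starting at the next expected byte.
        while expected in buffered:
            chunk = buffered.pop(expected)
            delivered += chunk
            expected += len(chunk)
        # Cumulative ACK: "I have received everything before `expected`".
        acks.append(expected)
    return bytes(delivered), acks


arrivals = [(0, b"Hel"), (6, b"wor"), (3, b"lo "), (9, b"ld"), (6, b"wor")]
print(reassemble(arrivals))  # (b'Hello world', [3, 3, 9, 11, 11])
```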

TCP protocol, although very reliable, has some disadvantages resulting from the hard work I described earlier (the three-way-handshake).

Three-way-handshake adds latency overhead.

Another disadvantage is the possibility of Head-of-line blocking. In short, because TCP delivers data strictly in order, one lost segment holds up all the segments behind it until it is retransmitted. I’m only mentioning it because it is important to be aware of it. Feel free to research more on your own. The Wikipedia article on Head-of-line blocking is a good place to start.

Another transport layer protocol is UDP.

UDP

UDP is not a reliable protocol.

It does not guarantee that a message will be delivered, or that messages will be delivered in order. It does not provide flow control or congestion avoidance mechanisms. But it is fast!

There is no need to establish a connection when using UDP, this means there is less latency overhead. The application can just start sending data.

It is also flexible. If a developer wishes, they can implement some or all of the features that TCP provides.
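The "no connection, just send" nature of UDP shows up clearly with Python's `socket` module. The sketch below sends one datagram to a receiver on the same machine. The function name is hypothetical. There is no handshake: the very first thing on the wire is the data. On localhost the datagram practically always arrives. On a real network it could be lost, which is why the receiver has a timeout (it raises an exception instead of waiting forever).

```python
import socket


def udp_roundtrip(message: bytes) -> bytes:
    """Send one datagram to a local receiver and return what it received."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # OS picks a free port
        receiver.settimeout(2.0)         # UDP gives no delivery guarantee
        # No connection setup: the datagram itself is the first packet sent.
        sender.sendto(message, receiver.getsockname())
        data, _sender_address = receiver.recvfrom(2048)
        return data


print(udp_roundtrip(b"ping"))  # b'ping'
```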

Application Layer

Application layer is, at its core, responsible for how messages that applications exchange are structured. There are many different protocols that are used at this layer. HTTP protocol is the main protocol used by applications that communicate across the Web.

When applications communicate via HTTP protocol a single message exchange is represented by the Request-Response cycle. I discussed earlier how the Request-Response cycle works.

How those messages are structured is determined by the rules of the HTTP protocol.

Each request has a request line and some headers. Some requests, such as POST requests, also have a body.

Each response has a status line that includes a status code. A response may also contain headers and/or a body.
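Because HTTP/1.1 messages are text, their structure is easy to show directly. The sketch below assembles a minimal request and parses the status line of a response. The helper names and the sample response are hypothetical, and real clients send more headers than this. Lines end with `\r\n`, and an empty line separates the headers from the body.

```python
def build_request(method: str, host: str, path: str, body: bytes = b"") -> bytes:
    """Assemble a minimal HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}", "Connection: close"]
    if body:
        # A body needs a length so the server knows where the message ends.
        lines.append(f"Content-Length: {len(body)}")
    head = "\r\n".join(lines) + "\r\n\r\n"  # an empty line ends the headers
    return head.encode("ascii") + body


def parse_status_line(response: bytes) -> tuple:
    """Split 'HTTP/1.1 200 OK' into version, numeric status code, and reason."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


print(build_request("GET", "www.duckduckgo.com", "/").decode())
sample = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>...</html>"
print(parse_status_line(sample))  # ('HTTP/1.1', 200, 'OK')
```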

HTTP is a stateless protocol. Each Request-Response cycle knows nothing about the Request/Response that took place before or will take place after.

On its own, HTTP is also a very insecure protocol.

The messages are sent as plain, unencrypted text, so anyone who can intercept the traffic can read them.

TLS protocol is what makes HTTP secure.

HTTP + TLS is how we get HTTPS, a Secure HTTP.

TLS

TLS stands for Transport Layer Security.

With HTTPS every message exchanged is encrypted.

Right after the TCP handshake the TLS Handshake takes place.

During the TLS Handshake the following takes place:

  • They agree on the TLS version they will use
  • They agree on the cipher suite (a set of algorithms they will use during the message exchange)
  • They use asymmetric (public-key) cryptography to securely establish the symmetric keys they will use for message encryption, without sending those keys in the clear
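Python's standard `ssl` module shows the client side of this negotiation. The helper names are hypothetical. The context controls which TLS versions the client will accept and which cipher suites it offers. The actual agreement on version and cipher suite, and the key establishment, happen inside `wrap_socket`, right after the TCP connection is opened. The second function needs network access, so it is only defined here, not run.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    # Defaults: verify the server's certificate and check it matches the host name.
    ctx = ssl.create_default_context()
    # Refuse to negotiate anything older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def negotiated_parameters(host: str, port: int = 443):
    """Run TCP + TLS handshakes and report what was agreed (needs network access)."""
    ctx = make_client_context()
    with socket.create_connection((host, port), timeout=5) as tcp:   # TCP handshake
        with ctx.wrap_socket(tcp, server_hostname=host) as tls:      # TLS handshake
            return tls.version(), tls.cipher()


ctx = make_client_context()
print(len(ctx.get_ciphers()), "cipher suites offered")
# With network access: print(negotiated_parameters("www.duckduckgo.com"))
```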

TLS has three main functions:

  • Encryption

Messages are encrypted so that only authorized parties can read them. In this case, the authorized party is the intended recipient.

  • Authentication

Identities of the parties in the message exchange are verified by the exchange of Digital Certificates.

  • Integrity

Message integrity is ensured through the use of a Message Authentication Code, or MAC (not to be confused with the MAC addresses used by Ethernet). It is a field that TLS adds to the data. Its purpose is to make sure that the data has not been tampered with in transit.
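The idea behind a message authentication code can be shown with HMAC-SHA256 from Python's standard `hmac` module. This is one MAC construction that some TLS 1.2 cipher suites use. TLS 1.3 instead builds integrity into its AEAD ciphers (such as AES-GCM), so treat this as an illustration of the concept, not of TLS's exact mechanism. The key is a hypothetical random value; in TLS, the keys come out of the handshake.

```python
import hashlib
import hmac
import secrets


def make_tag(key: bytes, message: bytes) -> bytes:
    # HMAC-SHA256 over the message, keyed with a secret only the two parties share.
    return hmac.new(key, message, hashlib.sha256).digest()


def is_untampered(key: bytes, message: bytes, tag: bytes) -> bool:
    # Recompute the tag and compare in constant time to avoid timing leaks.
    return hmac.compare_digest(make_tag(key, message), tag)


key = secrets.token_bytes(32)  # hypothetical shared key
tag = make_tag(key, b"GET / HTTP/1.1")
print(is_untampered(key, b"GET / HTTP/1.1", tag))       # True
print(is_untampered(key, b"GET /admin HTTP/1.1", tag))  # False (message was altered)
```

An attacker who changes the message cannot produce a matching tag without knowing the key.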

TLS has one big downside. It further adds to latency overhead (on top of TCP Handshake).

It impacts application performance due to multiple round-trips of message exchange that are needed when establishing a secure connection.
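"Multiple round-trips" can be put into numbers. Before the client can send its first HTTP request, the TCP handshake costs one round trip (RTT). A full TLS 1.2 handshake adds two more round trips, and TLS 1.3 reduces that to one. The sketch ignores session resumption, TLS 1.3 "0-RTT" and TCP Fast Open, and the 50 ms RTT is hypothetical.

```python
# Round trips needed before the first HTTP request can be sent (full handshakes only).
HANDSHAKE_RTTS = {
    "http": 1,              # TCP handshake only
    "https-tls1.2": 1 + 2,  # TCP + full TLS 1.2 handshake
    "https-tls1.3": 1 + 1,  # TCP + TLS 1.3 handshake
}


def setup_delay_ms(rtt_ms: float, mode: str) -> float:
    return HANDSHAKE_RTTS[mode] * rtt_ms


for mode in HANDSHAKE_RTTS:
    print(mode, setup_delay_ms(50, mode), "ms")
# http 50 ms
# https-tls1.2 150 ms
# https-tls1.3 100 ms
```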

TLS is used for TCP connections. There is a different protocol for UDP, it is called Datagram Transport Layer Security (DTLS).

Summary

The Internet is a large network that consists of many smaller networks.

The Internet can be described as network infrastructure plus the Web, which is all the internet resources that are available to us.

Conceptually the internet is a system of layers. Each layer is responsible for a specific function or a set of functions.

The rules that govern how each layer operates are called protocols. Protocols that were covered in the article are: HTTP, IP, Ethernet, TCP, UDP, TLS.

There are two main models that describe how the internet works: TCP/IP and OSI.

According to the TCP/IP model there are four layers: Application, Transport, Internet and Data Link.

Latency and Bandwidth measure the performance of the network. Latency measures how much time it takes for data to travel from point A to point B. Bandwidth measures how much data can be sent from point A to point B within a specified unit of time.

The data that is sent over a network is encapsulated into a Protocol Data Unit, or a PDU. PDU at one layer is passed to the next layer where it becomes a data payload as more metadata is added to it. That metadata helps the next layer to do its job.

HTTP governs how messages sent between client and server are structured. It is not a secure protocol, because all messages are transmitted as unencrypted plain text. TLS protocol is what makes HTTP secure. HTTP combined with TLS is HTTPS. With HTTPS, all messages exchanged are encrypted. The main drawback of TLS is that it adds to latency overhead.

Thank you for reading!

Photo by Gabriel Crismariu on Unsplash
