A Martian’s guide to the Internet (1)

“While writing a blog post on how to set up a web server on a Raspberry Pi, I realized that there were still quite a lot of things in the setup process that I didn’t understand well. The steps I followed got the setup working, but I hardly had any idea what was going on behind the scenes. Having no academic background in networking, I just trusted Google and Stack Overflow to guide me correctly. Turns out that this isn’t a good strategy when you want to explain the same stuff to someone else. You surely won’t go about telling someone how to prepare fried chicken just because you have tasted it (FYI, I can prepare fried chicken and I’m a vegetarian, so I haven’t tasted it).
So, I rummaged through different resources over the past few days to brief myself on how the current networking situation came to be. This post tries to sum up some of the things I learned. I might have got some of the dates wrong, or contorted some facts while trying to connect various sources from the internet, but I assure you the general flow of events is pretty much the same. There will be a lot of terms that you might not know (or probably know a lot about), but they eventually reveal themselves by the end. And obviously, there’s always a lot more to learn every day.
As for the title of this post, eh, I couldn’t think of anything catchy and “Brief History of the Internet” was already taken by the internet’s founding fathers. Just assume that the Martians would be advanced enough to access the internet and comprehend English with technical jargon, but stupid enough, like me, to not know how the internet works… yet.”

Not many people care about how the internet works; actually, not many need to know. But for those like me who spent a few too many days Googling about it, I have come up with a gist of what I learnt.


Who started the Internet?

1960s :

The United States Department of Defense funded a project called ARPANET (Advanced Research Projects Agency Network), a communications network that would enable a user to access different computers from a single terminal. This was not easily achievable in those days.

early 1960s :

Paul Baran joined the RAND (Research and Development) Corporation. This organization helped the US stay informed about issues like the space race, the US-Soviet nuclear arms race, health care, the digital revolution and a lot more. Paul was working on a communication network system that could survive even if one of the nodes (computers) in the network was dead. It was a system of distributed adaptive message block switching.

second half of 1960s:

Similar independent research was conducted by Donald Davies at the UK’s National Physical Laboratory. He coined the term packet switching.

late 1960s:

ARPANET followed the concepts of packet switching as suggested by Donald Davies.

1970s:

Robert Kahn joined Vinton Cerf, who was already working on ARPANET at this time. They came up with a protocol that would let any network following it become a part of ARPANET.

This was the beginning of the internet…

ARPANET logical map, March 1977 (Wikipedia)

In the words of Vinton Cerf:

“The Internet is a design philosophy and architecture expressed in a set of protocols which makes it easier for it to adopt and absorb new communication technologies.”


The advent of Packets and Packet-Switching

You might already know this, but just to brush up: a server is an entity that serves some information or data. A client is an entity that asks for or receives that information or data. In our case, this entity is a computer.

A network of computers
Resend the message as original path is broken.
Resend the message as even the new path is broken.

It’s not that computers were unable to communicate with each other before the 1970s; it was just that the system wasn’t robust and ‘connected’ enough. In simplified, crude terms, you can imagine it as an older centralized network vs a newer distributed network. The centralized network had all computers connected directly to the computers that served the needed information. There were limited ways to get from one computer to another, and the most severe drawback was that all the data followed a single path from a given client to a given server. If this path was damaged, the data and the communication link would be lost and the message had to be sent again via a different path. This was the circuit-switching mechanism.

The newer packet-switching model had many nodes connected to each other, or used the existing connections along with the then-new packet-switching mechanism. In this mechanism, the data to be transmitted was broken up into chunks called packets and sent simultaneously across many possible paths from that node, instead of along just one path to the destination. The allowed size of each packet has a range that is determined by the network protocols. These packets would then be recombined at the destination to form the complete message. Since the packets took different paths, even if one of the paths was blocked, the packet on that path would take a different path and would ultimately reach the destination. Having various path options for the message to reach the destination provided resilience to a few dead nodes or even some broken paths. This mechanism also ensured that even if a segment of the network went down, the other parts kept functioning.

Take a look at this example. Assume a message P is to be sent from the green computer on the left to the one on the right. In the packet-switching model, P is broken into parts p1, p2 and p3, which are sent along different paths with information about the final destination attached to each packet. If any of the paths is broken, the computer just at the start of the broken path sends the packet along a different path based on the packet’s destination information. The selection of this new path is done by a routing algorithm.
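To make the "pick another path" step a little more concrete, here is a minimal sketch. It assumes a made-up five-node network (the node names A to E are purely illustrative) and uses a plain breadth-first search as a stand-in for the routing algorithm. Real routers use far more sophisticated protocols, so treat this as a toy model only.

```python
from collections import deque

def find_path(links, source, destination, broken=frozenset()):
    """Breadth-first search for a path that avoids links marked as broken."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == destination:
            return path
        for neighbour in links.get(node, []):
            link = frozenset((node, neighbour))
            if neighbour not in visited and link not in broken:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no working path is left

# Hypothetical network: each node lists the nodes it is directly wired to.
links = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

print(find_path(links, "A", "E"))
# ['A', 'B', 'D', 'E']
print(find_path(links, "A", "E", broken={frozenset(("A", "B"))}))
# ['A', 'C', 'D', 'E']
```

With the A to B link marked as broken, the search simply comes back with the route through C, which is the behaviour the example above describes.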

The packets are supplied with information such as the sender’s address and the destination address, just like a postcard. But in this case, the postcard is split up into different pieces, each carrying the addresses, which travel along various paths and are later joined back together at the destination. The information supplied along with the data is called the packet header and is used to rebuild the original message from the packets. Hence packets can be regarded as the basic units of information over the internet.
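Here is a rough sketch of the split-and-rejoin idea. The header fields (src, dst, seq, total), the addresses and the chunk size of 8 characters are all assumptions I made up for illustration; real IP and TCP headers are laid out differently and carry many more fields.

```python
import random

def to_packets(message, source, destination, size=8):
    """Split a message into chunks and attach a small made-up header to each."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [
        {"src": source, "dst": destination, "seq": n,
         "total": len(chunks), "payload": chunk}
        for n, chunk in enumerate(chunks)
    ]

def reassemble(packets):
    """Rebuild the message from packets that may arrive in any order."""
    if not packets:
        return ""
    ordered = sorted(packets, key=lambda p: p["seq"])
    if len(ordered) != ordered[0]["total"]:
        raise ValueError("some packets are missing")
    return "".join(p["payload"] for p in ordered)

message = "Hello from Mars, how does the internet work?"
packets = to_packets(message, "10.0.0.1", "10.0.0.2")  # hypothetical addresses
random.shuffle(packets)  # simulate packets arriving out of order
print(reassemble(packets) == message)
# True
```

The sequence number in each header is what lets the destination put the pieces back in order, no matter which path each piece took.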

Packets are supplied with lots of “metadata”

So why is packet switching better?

To decide on the superiority (or inferiority) of packet-switching, we need to compare it against its predecessor mechanism: circuit-switching. This type of model was used in earlier telephone networks.

There are also some minor drawbacks of the packet-switching mechanism. Packaging and routing packets may take up some extra time. This is mitigated to a large extent by efficient algorithms.

If some packets do not follow the protocols, they could prove to be a security risk for other packets travelling in the same channel, but that belongs to the altogether separate topic of network security.


TCP/IP: The language of packets

Now that we have established how the “Internet” was a different model of network from its predecessors, let’s look a bit more into what governs the movement of packets inside this vast network. Considering that we have called TCP/IP the language of the packets (and of the internet), there need to be some rules and protocols, as in any other language. A protocol can be regarded as a set of agreed-upon rules that just makes life easier for everyone involved.

Recalling Vinton Cerf’s words: “The Internet is a design philosophy and architecture expressed in a set of protocols which makes it easier for it to adopt and absorb new communication technologies.” Even though ARPANET was leaps and bounds better than the previous model, its main concept, “the internet was not designed for just one application, but as a general infrastructure on which new applications could be conceived”, could be achieved only when TCP/IP replaced its predecessor protocol on the ARPANET in 1983.

TCP/IP (Transmission Control Protocol/Internet Protocol) was one such pair of protocols, developed by Robert Kahn, Vinton Cerf and their team. The TCP/IP model is in itself a suite of protocols. Its functionality can be divided into four layers, and each of these layers has various protocols. This is one good link that succinctly discusses the TCP/IP model: Microsoft TechNet. Discussing all of these here would make this already long post even more boring.

This image taken from ElectronicDesign blog gives a nice visualization of how the data is turned into packets by adding information at each layer of the OSI model (another model similar to the TCP/IP model). Do read their entire post for a better overview of TCP/IP and OSI models.
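If you can’t see the image, the idea of "adding information at each layer" can be sketched in a few lines. This is only a toy model: the three layer names below follow the lower layers of the four-layer TCP/IP model, and the bracketed "headers" are placeholder strings I invented, not real header formats.

```python
# Layers below the application, in the order a sender applies them.
LOWER_LAYERS = ["transport", "internet", "link"]

def encapsulate(app_data):
    """Wrap application data in one placeholder header per lower layer."""
    unit = app_data
    for layer in LOWER_LAYERS:
        unit = f"[{layer} header]{unit}"
    return unit

def decapsulate(unit):
    """Strip the headers in the opposite order, as a receiver would."""
    for layer in reversed(LOWER_LAYERS):
        header = f"[{layer} header]"
        if not unit.startswith(header):
            raise ValueError(f"missing {layer} header")
        unit = unit[len(header):]
    return unit

frame = encapsulate("GET /index.html")
print(frame)
# [link header][internet header][transport header]GET /index.html
print(decapsulate(frame))
# GET /index.html
```

Each layer only looks at its own header and hands the rest upwards, which is why the layers can evolve fairly independently of each other.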

IP: What’s the address?

Rewinding… The internet is a network of networks. Data is sent in packets. Packets carry the addresses of where the data is sent from and where it is going. So how, why and what are these addresses?

The IP (Internet Protocol) part of TCP/IP deals with addresses as unique identifiers for computers in a network. At the dawn of the internet era, the Internet Protocol was IPv4. This protocol uses a 32-bit number to assign an address to each computer.

The IPv4 address consists of a 32-bit number, usually represented by 4 “octets”, each an integer from 0–255. The different octets represent different parts of the address, mainly the Network Number and the Host Number. The Network Number is assigned by the InterNIC (Network Information Center), an organization responsible for DNS (Domain Name System) domain name allocations. Later this task was taken up by ICANN (Internet Corporation for Assigned Names and Numbers). The Host Number (sometimes called a local or machine address) is assigned by the local network administrator. The division between the Network and Host parts of the address is determined by the different classes of the IPv4 address protocol (A, B, C). The answers to the question here explain a lot more about the IP address and how to go about dissecting it.
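As a small illustration of dissecting an address, the sketch below uses Python’s standard ipaddress module to get the 32-bit number and the four octets, then splits them by the classful A/B/C rules. The dissect function and the sample address are my own; also keep in mind that the classful scheme is the historical one, and networks today use CIDR prefixes instead.

```python
import ipaddress

def dissect(address):
    """Split an IPv4 address into network and host octets by its class."""
    ip = ipaddress.IPv4Address(address)
    octets = list(ip.packed)  # the four octets as integers 0-255
    first = octets[0]
    if first < 128:
        ip_class, network_octets = "A", 1
    elif first < 192:
        ip_class, network_octets = "B", 2
    elif first < 224:
        ip_class, network_octets = "C", 3
    else:
        # Classes D and E (multicast, experimental) have no network/host split.
        return {"as_integer": int(ip), "class": "D/E",
                "network": None, "host": None}
    return {
        "as_integer": int(ip),
        "class": ip_class,
        "network": octets[:network_octets],
        "host": octets[network_octets:],
    }

info = dissect("192.168.1.10")
print(info["as_integer"])  # 3232235786
print(info["class"])       # C
print(info["network"])     # [192, 168, 1]
print(info["host"])        # [10]
```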

If you do the math: an IPv4 address consists of a 32-bit number, so there can theoretically be 2³² (4,294,967,296) different IPv4 addresses. But that does not represent the total number of IP addresses that can be assigned to devices. This is because certain IP addresses are set aside for special purposes. A list of reserved IP addresses can be found here.
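You can poke at some of these special-purpose ranges yourself with Python’s standard ipaddress module. The special_purpose helper and the sample addresses are just my picks for illustration, and the labels cover only a handful of the reserved categories.

```python
import ipaddress

print(2 ** 32)
# 4294967296

def special_purpose(address):
    """Return labels for the special-purpose ranges an address falls into."""
    ip = ipaddress.ip_address(address)
    checks = {
        "private": ip.is_private,
        "loopback": ip.is_loopback,
        "multicast": ip.is_multicast,
        "link-local": ip.is_link_local,
        "reserved": ip.is_reserved,
    }
    return [label for label, matched in checks.items() if matched]

for address in ["10.1.2.3", "127.0.0.1", "224.0.0.1", "8.8.8.8"]:
    print(address, special_purpose(address))

# One private block alone takes a sizeable bite out of the total:
print(ipaddress.ip_network("10.0.0.0/8").num_addresses)
# 16777216
```

An empty list means the address is not in any of the ranges checked here, which is what you would expect for a public address such as 8.8.8.8.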

The number of available IPv4 addresses per person is less than 1 if we go by the current population. Obviously, the number of networking devices will be far greater than the number of people. So how do we deal with the limited number of available addresses? Enter IPv6. This new address format uses 128 bits (represented in hexadecimal). That’s 2¹²⁸ addresses, which is far more than a billion addresses for every person on earth! The transition from IPv4 to IPv6 started on June 8, 2011 (World IPv6 Day); it is gradually under way and will take a long, long time.
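The per-person arithmetic is easy to check. The population figure below is an assumption (a round 8 billion), so the exact results shift a little with whichever estimate you prefer.

```python
WORLD_POPULATION = 8_000_000_000  # assumption: a rough, round figure

ipv4_total = 2 ** 32
ipv6_total = 2 ** 128

ipv4_per_person = ipv4_total / WORLD_POPULATION
ipv6_per_person = ipv6_total // WORLD_POPULATION

print(round(ipv4_per_person, 2))
# 0.54
print(f"{ipv6_per_person:.2e}")
# 4.25e+28
```

So IPv4 leaves roughly half an address per person, while IPv6 leaves tens of billions of billions of billions.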


Connecting to the network of networks

As mentioned earlier, the internet is called a network of networks. But since all the computers are connected, isn’t it just one large network? The answer is yes, and a bit of no.

Again refreshing some terminology before moving ahead:

Bandwidth can be regarded as a metric that determines how much data or how many data packets can move from one location to another at the same time. A higher bandwidth means more data packets travelling at the same time, which results in faster downloads from the internet.
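As a back-of-the-envelope sketch of why higher bandwidth means faster downloads: bandwidth is usually quoted in megabits per second while file sizes are in megabytes, so there is a factor of 8 between them. The download_seconds function is my own idealised helper; it ignores latency, protocol overhead and congestion, so real downloads will be slower.

```python
def download_seconds(file_size_megabytes, bandwidth_mbps):
    """Idealised transfer time for a file, ignoring latency and overhead."""
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth must be positive")
    megabits = file_size_megabytes * 8  # 1 byte = 8 bits
    return megabits / bandwidth_mbps

print(download_seconds(100, 50))
# 16.0
print(download_seconds(100, 100))
# 8.0
```

Doubling the bandwidth halves the ideal download time for the same file.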

Routers are computers in a network that connect multiple computers (or networks) together. There can be various types of routers depending on the network. We’ll discuss more about these a bit later.

Modems, in simple terms can be thought of as devices that help computers to send and receive data over a communication channel. Modern routers can come built in with modems.

LAN (Local Area Network)

If you are (or know someone who is) into gaming, you might have heard of people playing Counter Strike or *insert-any-multiplayer-game-you-like* on LAN for better latency. A LAN is a local area network that interconnects a group of computers within a small area like a building. These connections can be wired or over a Wireless Local Area Network (WLAN). People often misuse the term “hotspot” in place of WLAN. Hotspot just refers to the physical region around a wireless router in which you have connectivity. A LAN in itself does not have access to computers in the outside world.

ISP (Internet Service Provider)

When the router is coupled with a modem, it can establish a connection to your Internet Service Provider (ISP). The modems have to be in accordance with the ISP’s infrastructure. To reduce the hassle (and to make some extra money), the ISPs nowadays provide their own routers that have built in modems.

But how are ISPs connected to the internet?

ISPs are divided into different categories or Tiers.

Tier 1 ISPs are internet providers who exchange internet traffic (the always-moving data packets) among themselves. Tier 1 service providers are the ones who enable us to have intra-continental as well as inter-continental connectivity. Cogent Communications is an example of a Tier 1 ISP. They provide connectivity on a scale that ranges from countries to continents. Since they have to deal with large volumes of traffic that literally support the entire internet, they are sometimes also referred to as backbone internet providers. The traffic is exchanged between them under peering agreements (agreements between two large internet providers that need to exchange traffic without paying exorbitant fees to do so).

Tier 2 ISPs connect Tier 1 and Tier 3 ISPs. These companies are smaller than Tier 1 providers and find it easier to purchase internet transit from Tier 1 ISPs than to deal with the large hardware setups and peering agreements. Vodafone is one such example. They might sometimes also enter into peering agreements with Tier 1 ISPs.

Tier 3 ISPs are those who only purchase internet transit. These are the ISPs that provide internet services to households and businesses. Comcast is a Tier 3 ISP. Since these ISPs are the last connection to your device, they are also called Last Mile internet providers. What we as customers pay these ISPs for is bandwidth: higher bandwidth equals higher download speeds (equals higher bills).

Since the ISPs keep on getting bigger and bigger, there is no defining line between the tiers.

A packet will not always travel through all these tiers, since the destination will not necessarily be on some other continent.

ISPs can also be classified on the basis of the task they do: Access Provider ISPs (provide customers with internet access, like Comcast), Hosting ISPs (these can also host your web servers, emails, or online storage), Transit ISPs (the different tiers of ISPs are a classification of transit ISPs), etc.

This was an overview of the underlying infrastructure of the internet. I hope this article helped you learn something new or at least refreshed your knowledge on the topic. I have continued more on how we as users connect to this vast network in my next post: A Martian’s guide to the Internet (2).