How HTTP Data Travels Through the Internet

Himanish Munjal
Published in CodeX · 10 min read · Jun 17, 2022

This year, after working for 4+ years as a software engineer, I decided it’s time to get back to basics, which, honestly speaking, I didn’t understand well in the first place.

One aspect I didn’t understand well enough is how the Internet actually works. I know we make API calls (probably over HTTP) and the data is transferred through a wire, but apart from the high-level details we learn in college, what happens in between has mostly been a black box for me.

In a quest to get a better grasp of the fundamentals, I did a deep dive into how HTTP packets are routed from a source machine to a destination, starting from layer 7 of the OSI model, and I will try my best to share that information with you in this article :)

Terminology

Before I get into the actual details, I would like to go through a few terms.

  1. MAC Address :- A MAC address is a hardware identification number that uniquely identifies each network interface on a network. The MAC address is burned into every network interface card (NIC) by the manufacturer; it is meant to be permanent, although most operating systems let you override it in software. The MAC address is used as the layer 2 address in the OSI model, e.g. 00:0d:83:b1:c0:8e.
  2. Address Resolution Protocol :- In layman’s terms, ARP is used to find the MAC address that belongs to an IP address on the local network. Once a host or router has decided which IP address the next hop is (a router decides this using its routing table), it uses ARP to learn that next hop’s MAC address so it can address the frame. We will discuss later in detail how ARP is used to send packets to the destination, both within a local network and over the Internet.
  3. Network Address Translation :- Again, in layman’s terms, NAT is used to translate a private address to a public address. Due to the limited number of IPv4 addresses, not every device is given a public address. At your home, devices like laptops and mobiles are assigned a locally unique private IP address by your Wi-Fi router. When a device sends data packets, the source IP address is the device’s private IP and the packet is sent to the router/gateway. The router then replaces this private IP address with its own public IP address (and usually the source port as well) and sends the data over the Internet after making an entry in its NAT table. This NAT entry then enables the router to send the response back to your device.
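    A NAT entry maps the private source address and port to the remote destination address and port, for example 192.168.1.23:8091 → 7.12.9.14:443 (keep in mind this is a gross oversimplification). So when the router sees a response from 7.12.9.14:443, it knows that the data needs to be returned to 192.168.1.23:8091.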
  4. Proxy Server :- A proxy server is a dedicated computer, or a software system running on a computer, that acts as an intermediary between an endpoint device (the client) and the server it is requesting data from. If you have a forward HTTP proxy set up, all your requests are sent to that proxy, which then makes the HTTP request to the destination on your behalf and sends you back the response.
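To make the NAT entry above a bit more concrete, here is a minimal Python sketch of a NAT table. It assumes a router whose public IP is 7.7.7.7 (the address used later in this article) and a hypothetical `NatTable` class that hands out a fresh public port (starting at an assumed 40000) for each outgoing connection, so several private devices can share one public IP. Real NAT implementations also track protocol, timeouts and connection state; this only shows the mapping in both directions.

```python
from __future__ import annotations

from dataclasses import dataclass, field

Endpoint = tuple[str, int]  # (IP address, port)


@dataclass
class NatTable:
    """Hypothetical, heavily simplified NAT table (port-translating NAT)."""
    public_ip: str
    next_port: int = 40000  # assumed start of the router's public port range
    # (public port, remote endpoint) -> private endpoint
    entries: dict[tuple[int, Endpoint], Endpoint] = field(default_factory=dict)

    def outbound(self, private: Endpoint, remote: Endpoint) -> Endpoint:
        """Rewrite the source of an outgoing packet, recording the mapping."""
        for (port, dst), src in self.entries.items():
            if src == private and dst == remote:
                return (self.public_ip, port)  # connection already known
        port = self.next_port
        self.next_port += 1
        self.entries[(port, remote)] = private
        return (self.public_ip, port)

    def inbound(self, public_port: int, remote: Endpoint) -> Endpoint | None:
        """Find the private device a response belongs to (None means drop it)."""
        return self.entries.get((public_port, remote))


nat = NatTable(public_ip="7.7.7.7")
print(nat.outbound(("192.168.1.23", 8091), ("7.12.9.14", 443)))  # ('7.7.7.7', 40000)
print(nat.inbound(40000, ("7.12.9.14", 443)))                    # ('192.168.1.23', 8091)
```

Keying the table on the public port as well as the remote endpoint is what lets the router tell apart two laptops that talk to the same server at the same time.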

Packet routing

Now to the interesting part. I will discuss the following three use cases in this article.

HTTP request with both source and destination in the same local network

In this scenario, packets do not have to travel via the Internet. The high-level steps involved are below.

Source Machine

  1. The source machine with IP 192.168.1.200 makes an HTTP GET request to the destination machine with IP 192.168.1.129.
  2. The data is passed down to layer 4 (transport) of the source machine, where the source and destination ports are added.
  3. The packets are passed to layer 3 (network), where the source and destination IP addresses are added.
  4. The packets are then passed to layer 2 (data link). This is where ARP comes into the picture.
    4.1. The source machine works out whether the destination machine is in its network or not by applying its subnet mask.
    4.2. In this case, as the destination is in the LAN itself, the source broadcasts an ARP request asking for the MAC address of the destination IP.
    4.3. The destination machine responds with its MAC address.
    4.4. Finally, the source sets that MAC address as the destination MAC address and its own MAC address as the source MAC address.
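The subnet check in step 4.1 boils down to a bitwise AND: the host ANDs its own IP and the destination IP with its subnet mask and compares the results. The sketch below uses Python’s standard `ipaddress` module; the mask 255.255.255.0 (/24) is an assumption, since the article does not state one, though it is typical for home networks.

```python
import ipaddress


def same_subnet(src_ip: str, dst_ip: str, netmask: str) -> bool:
    """Return True if dst_ip is on the same subnet as src_ip.

    Both addresses are ANDed with the subnet mask; if the resulting network
    parts are equal, the destination is local and can be ARPed for directly.
    """
    mask = int(ipaddress.IPv4Address(netmask))
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src & mask) == (dst & mask)


# Assumes a /24 home network mask.
print(same_subnet("192.168.1.200", "192.168.1.129", "255.255.255.0"))  # True
print(same_subnet("192.168.1.200", "1.2.3.4", "255.255.255.0"))        # False
```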

Router

The packet is then sent to the router. The router also does a few things, as mentioned below.

  1. It checks the destination MAC address to know whether the packet is addressed to it or not. In our case, the MAC belongs to another machine in the network.
  2. Using its MAC address table, it forwards the frame out of the port where the destination machine is connected. Here, the router is only working as a switch (most home routers have a switch built in), so the routing table is not involved.
  3. As the packet only travels within the local network, where the private IP addresses of both source and destination are valid, NAT is not required.
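Since the router is only acting as a switch here, the sketch below shows roughly how a learning switch decides where to send a frame. It is a hypothetical `LearningSwitch` class with made-up port numbers and MAC addresses: the switch learns which port each source MAC lives on, floods broadcasts (like the ARP request from step 4.2) and frames for unknown MACs, and otherwise forwards only to the learned port.

```python
from __future__ import annotations

from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class LearningSwitch:
    """Hypothetical learning switch: maps MAC addresses to physical ports."""
    num_ports: int
    mac_table: dict[str, int] = field(default_factory=dict)

    def handle_frame(self, in_port: int, src_mac: str, dst_mac: str) -> list[int]:
        """Return the list of ports the frame should be sent out of."""
        # Learn: the sender is reachable through the port the frame arrived on.
        self.mac_table[src_mac] = in_port
        out = self.mac_table.get(dst_mac)
        if dst_mac == BROADCAST or out is None:
            # Broadcast (e.g. an ARP request) or unknown destination: flood.
            return [p for p in range(self.num_ports) if p != in_port]
        if out == in_port:
            return []  # destination sits behind the same port; nothing to do
        return [out]


sw = LearningSwitch(num_ports=4)
# ARP request broadcast by the source on port 0 is flooded to ports 1-3.
print(sw.handle_frame(0, "aa:aa:aa:aa:aa:01", BROADCAST))             # [1, 2, 3]
# ARP reply from the destination on port 2 goes only to port 0.
print(sw.handle_frame(2, "aa:aa:aa:aa:aa:02", "aa:aa:aa:aa:aa:01"))   # [0]
```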

Destination machine

  1. When the packet reaches the destination machine, it first checks the destination MAC address and then the destination IP address. As both match the machine, it passes the data up the stack, and the GET request is processed at layer 7.
  2. Based on the request, the destination server then generates the response.
  3. Now the destination machine becomes the source machine and vice versa, and the process repeats for the response to reach the original source machine.
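The two checks in step 1 and the role swap in step 3 can be sketched as follows. `Packet`, `accept` and `make_reply` are hypothetical names, and the MAC addresses and the client port 51000 are made up for illustration. Simply swapping the layer 2 addresses only works in this same-LAN case; across networks the MAC addresses are rewritten at every hop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Simplified view of one frame: layer 2, 3 and 4 addressing plus payload."""
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    payload: bytes


def accept(pkt: Packet, my_mac: str, my_ip: str) -> bool:
    """A host keeps a frame only if both the MAC and the IP are its own."""
    return pkt.dst_mac == my_mac and pkt.dst_ip == my_ip


def make_reply(request: Packet, body: bytes) -> Packet:
    """Build the response by swapping source and destination at every layer."""
    return Packet(
        src_mac=request.dst_mac, dst_mac=request.src_mac,
        src_ip=request.dst_ip, dst_ip=request.src_ip,
        src_port=request.dst_port, dst_port=request.src_port,
        payload=body,
    )


req = Packet("aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02",
             "192.168.1.200", "192.168.1.129", 51000, 80,
             b"GET / HTTP/1.1\r\nHost: 192.168.1.129\r\n\r\n")
if accept(req, my_mac="aa:aa:aa:aa:aa:02", my_ip="192.168.1.129"):
    resp = make_reply(req, b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
    print(resp.dst_ip, resp.dst_port)  # 192.168.1.200 51000
```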

HTTP request with source and destination in different networks

Here, the traversal becomes a little more complicated as the packet needs to travel through the Internet. One process that gets added in this traversal is NAT. As a private IP address only makes sense within a local network, the router, when it receives the packet, replaces the packet’s source IP address with its own public IP address and makes an entry in its NAT table. As the response from the destination machine will be sent to the router’s IP address, this NAT entry tells the router which machine in the private network should actually receive the response.

Source Machine

  1. The source machine with IP 192.168.1.200 makes an HTTP GET request to the destination machine with IP 1.2.3.4.
  2. The data is passed down to layer 4 of the source machine, where the source and destination ports are added.
  3. The packets are passed to layer 3, where the source and destination IP addresses are added.
  4. The packets are then passed to layer 2. Here, the process is a little different from use case 1.
    4.1. The source machine works out whether the destination machine is in its network or not by applying its subnet mask.
    4.2. In this case, the destination is not part of the LAN.
    4.3. As the destination is not in the local network, the source decides to send the packet to its default gateway, which in most cases is the router.
    4.4. The source makes an ARP request for the gateway IP, which is 192.168.1.1 in our case.
    4.5. The gateway (router) responds with its MAC address. In our example we call it FF; this is just a placeholder for the router’s MAC address, not the all-FF broadcast address.
    4.6. Finally, the source sets that MAC address as the destination MAC address and its own MAC address as the source MAC address. The destination IP address stays 1.2.3.4.
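Steps 4.1 to 4.6 amount to one decision: whose MAC address goes into the frame. The sketch below assumes a 192.168.1.0/24 network (only the gateway address 192.168.1.1 comes from the article) and a hypothetical, pre-filled ARP cache with made-up MAC addresses. If the destination is local, the host uses the destination’s own MAC; otherwise it uses the gateway’s MAC while the IP header still says 1.2.3.4.

```python
import ipaddress

# Assumed host configuration; only the gateway IP comes from the article.
MY_NETWORK = ipaddress.ip_network("192.168.1.0/24")
DEFAULT_GATEWAY = "192.168.1.1"

# Hypothetical ARP cache (IP -> MAC), filled in by earlier ARP replies.
arp_cache = {
    "192.168.1.1": "aa:bb:cc:dd:ee:ff",    # the router (the article's "FF")
    "192.168.1.129": "aa:aa:aa:aa:aa:02",  # a machine on the same LAN
}


def next_hop_ip(dst_ip: str) -> str:
    """Deliver directly if the destination is on our subnet, else via the gateway."""
    if ipaddress.ip_address(dst_ip) in MY_NETWORK:
        return dst_ip
    return DEFAULT_GATEWAY


def dst_mac_for(dst_ip: str) -> str:
    """Destination MAC for the frame: the next hop's MAC, not necessarily dst_ip's."""
    hop = next_hop_ip(dst_ip)
    mac = arp_cache.get(hop)
    if mac is None:
        # A real host would broadcast "who has <hop>?" and wait for the reply.
        raise LookupError(f"no ARP entry for {hop}; send an ARP request first")
    return mac


print(dst_mac_for("192.168.1.129"))  # aa:aa:aa:aa:aa:02 (use case 1)
print(dst_mac_for("1.2.3.4"))        # aa:bb:cc:dd:ee:ff (router, use case 2)
```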

Router

The packet is then sent to the router. The router does a lot of heavy lifting in this use-case.

  1. The router checks the destination MAC address to know whether the packet is addressed to it or not. In our case, the MAC is the router’s own.
  2. It then checks the destination IP address and realises that the packet is not actually meant for it but for a host on the public network, as the destination IP address in the packet is 1.2.3.4.
  3. The router then replaces the source IP address (192.168.1.200) with its own public IP address, which is 7.7.7.7.
  4. It then makes an entry in its NAT table with the source as 192.168.1.200:port and the destination as 1.2.3.4:port.
  5. Here is the interesting part. Now that the router knows the packet needs to be sent to the Internet, it checks its routing table to find the next hop for the packet.
  6. If the router has a specific entry for the address in its routing table, it sends the packet along that path; otherwise, it sends the packet along its default route, which usually points to its ISP.
  7. In both cases, it makes an ARP request for the next hop’s IP address to find its MAC address.
  8. Now the source MAC address of the packet is the router’s own and the destination MAC address is that of the next hop.
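Steps 5 and 6 are a routing table lookup. Routers pick the most specific matching entry (longest-prefix match), and the default route 0.0.0.0/0 matches everything, which is why it acts as the fallback. The table below is hypothetical: the 203.0.113.x next hops are made-up documentation addresses standing in for the ISP, and `next_hop` is an illustrative helper, not any router’s real API.

```python
import ipaddress

# Hypothetical routing table: (destination prefix, next-hop IP or None).
ROUTES = [
    (ipaddress.ip_network("192.168.1.0/24"), None),       # directly connected LAN
    (ipaddress.ip_network("1.2.0.0/16"), "203.0.113.2"),  # a more specific route
    (ipaddress.ip_network("0.0.0.0/0"), "203.0.113.1"),   # default route (ISP)
]


def next_hop(dst_ip: str) -> str:
    """Pick the next hop with longest-prefix match.

    None in the table means the network is directly connected, so the
    router ARPs for the destination itself instead of another router.
    """
    dst = ipaddress.ip_address(dst_ip)
    matches = [(net, hop) for net, hop in ROUTES if dst in net]
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return dst_ip if hop is None else hop


print(next_hop("1.2.3.4"))        # 203.0.113.2 (the /16 beats the default /0)
print(next_hop("8.8.8.8"))        # 203.0.113.1 (falls back to the default route)
print(next_hop("192.168.1.129"))  # 192.168.1.129 (directly connected)
```

The returned IP is what step 7 then resolves to a MAC address with ARP.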

In the above example, in steps 1 to 4 the router works as a NAT gateway, as it replaces the private IP with its public IP. In steps 5 to 8, the router acts as a router.

Intermediate nodes/routers

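Every intermediate router between the source and the destination follows steps 5–8 above until the packet reaches the destination. At each hop the MAC addresses are rewritten, while the source and destination IP addresses normally stay the same.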

Destination machine

  1. The machine first checks the destination MAC address and then the destination IP address, and realises the packet is addressed to itself.
  2. The destination then generates the response and sends it back to the source address of the request. In our use case, this is 7.7.7.7, the public address of the router, which then uses its NAT entry to forward the response to 192.168.1.200.

HTTP request with source and destination in different networks with a forward proxy at the source

In this scenario, all the packets from the source machine are sent to the proxy server. The proxy then creates a request to the destination server, gets the response, packages it for the source and sends it back.
A proxy provides various benefits, such as hiding the source’s IP address, which adds an extra layer of privacy and security. A proxy can also cache responses.

Source Machine

  1. The source machine with IP 192.168.1.200 makes an HTTP GET request to the destination machine with IP 1.2.3.4. This is the use case where including the destination inside the HTTP request itself becomes important, as we will see below.
    Note :- The “Host” header was introduced in HTTP/1.1. It names the destination host, either a domain name or an IP address, optionally with a port (1.2.3.4 in our use case).
  2. The data is passed down to layer 4 of the source machine, where the source port and the proxy’s port are added.
  3. The packets are passed to layer 3, where the source IP address and the proxy’s IP address are added.
    Note :- As mentioned above, with a proxy, all the requests are made to the proxy only. That is why the destination IP and port are those of the proxy.
  4. The packets are then passed to layer 2. Here, the process is the same as in use case 2.
    4.1. The source machine works out whether the destination machine (here, the proxy) is in its network or not by applying its subnet mask.
    4.2. In this case, the proxy is not part of the LAN.
    4.3. As the destination is not in the local network, the source decides to send the packet to its default gateway, which in most cases is the router.
    4.4. The source makes an ARP request for the gateway IP, which is 192.168.1.1 in our case.
    4.5. The gateway (router) responds with its MAC address. In our example we call it FF; this is just a placeholder for the router’s MAC address, not the all-FF broadcast address.
    4.6. Finally, the source sets that MAC address as the destination MAC address and its own MAC address as the source MAC address.
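Here is a rough sketch of what the HTTP request from step 1 looks like on the wire when it is sent to a forward proxy. For plain HTTP through a proxy, HTTP/1.1 clients put the full URL in the request line (the so-called absolute form) in addition to the Host header. `build_proxy_request` is a hypothetical helper; the TCP connection that carries these bytes goes to the proxy’s IP and port, not to 1.2.3.4.

```python
def build_proxy_request(host: str, path: str = "/") -> bytes:
    """Build a plain-HTTP GET the way a client sends it to a forward proxy.

    The request line carries the full URL (absolute form) and the Host
    header names the real destination, so the proxy knows where to go.
    """
    lines = [
        f"GET http://{host}{path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


print(build_proxy_request("1.2.3.4").decode())
```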

Local network router and intermediate routers

The process here is the same as in use case 2. The only difference is that in use case 2 the destination is the actual destination machine, whereas in this use case the destination is the proxy server; the routers are completely agnostic of this.

Proxy server

Now, as the packet reaches the proxy server, it’s the proxy’s responsibility to send the request on to the actual destination. This is where the “Host” header comes into the picture.

  1. The proxy receives the HTTP request from the source machine.
  2. As this is an HTTP proxy, the proxy reads the HTTP data, including the Host header, to learn the destination host.
  3. From here, the proxy server makes its own HTTP request to the destination machine (1.2.3.4 in our use case). This process is the same as in use case 2, with the proxy as the source.
  4. The proxy then gets the HTTP response from the destination machine, creates a response for the source machine’s original request from step 1, and sends it back.
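Step 2 can be sketched as a small parser. `parse_destination` is a hypothetical helper, not part of any proxy software. One nuance worth knowing: HTTP/1.1 tells a proxy to prefer the host in an absolute URL in the request line when one is present, so the sketch checks that first and falls back to the Host header, which is the mechanism described above. It ignores edge cases such as bracketed IPv6 hosts and HTTPS (which goes through CONNECT).

```python
from __future__ import annotations

from urllib.parse import urlsplit


def parse_destination(raw_request: bytes) -> tuple[str, int]:
    """Work out where a forward proxy should connect for a plain-HTTP request."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    _method, target, _version = request_line.split(" ", 2)

    # Absolute form, e.g. "GET http://1.2.3.4/ HTTP/1.1".
    if target.startswith("http://"):
        url = urlsplit(target)
        return url.hostname, url.port or 80

    # Fall back to the Host header, e.g. "Host: 1.2.3.4:8080".
    for line in header_lines:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host, _, port = value.strip().partition(":")
            return host, int(port) if port else 80
    raise ValueError("no destination found in request")


print(parse_destination(b"GET http://1.2.3.4/ HTTP/1.1\r\nHost: 1.2.3.4\r\n\r\n"))  # ('1.2.3.4', 80)
print(parse_destination(b"GET / HTTP/1.1\r\nHost: 1.2.3.4:8080\r\n\r\n"))          # ('1.2.3.4', 8080)
```

From there, step 3 is the proxy opening its own TCP connection to that address (for example with `socket.create_connection`) and forwarding the request.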

Destination server

The destination server here is not at all aware of the source machine: all the packets it gets have the proxy server’s address as their source address. The destination server processes the request and sends the response back to the proxy server.
The proxy server then sends this response back to the actual source machine (192.168.1.200 in our use case) via the source’s router (7.7.7.7), which uses its NAT entry to deliver it.

That’s it for this article. I hope you learned something new. If you liked the article, please like and follow. You can also check out my other articles if you want to understand network concepts like HTTP, HTTP/2 and HTTP/3, TCP, UDP, QUIC and more.

Also, follow me on LinkedIn :)

Hi, I’m working as an SDE 3 at Amazon. I write about tech with low-level details. Please reach out with any recommendations and suggestions.