In this article I shall show the two main types of load balancer: TCP (Layer 4) load balancers (L4-LBs) and HTTP(S) (Layer 7) load balancers (L7-LBs). If you have forgotten the OSI networking model and its layers, see Figure 1 below.
You may be asking: why have both an L7-LB and an L4-LB? Why not a single dedicated LB? The answer depends on your infrastructure and application requirements.
HTTP(S) (Layer 7) Load Balancing
An L7-LB acts as a typical proxy that often sits between an external/global router and your backend services. Figure 2 below shows the flow through a typical L7-LB. Packet A comes in from a router and is assembled, processed and manipulated by the LB. L7-LBs are typically implemented in software so that they remain flexible to implementation changes and can scale to meet demand (see Google’s Maglev load balancer). This matters because organisations may want to customise the way packets are consumed and what information they can obtain from an incoming packet. The LB then outputs anywhere from 1 to N packets, depending on the manipulations performed inside it: extra packets may be created to supplement information for the backend server, or packets may be broken down into smaller ones with enriched metadata. L7-LBs can perform a variety of manipulations, such as rewriting HTTP headers.
Once processed, the L7-LB establishes a TCP connection with the backend and transfers the data accordingly.
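As a concrete illustration of the header manipulation mentioned above, here is a hedged sketch of what an L7-LB might do to request headers before forwarding them to a backend. The function name and header choices are illustrative, not any particular LB's implementation:

```python
def rewrite_headers(headers: dict, client_ip: str) -> dict:
    """Illustrative L7-LB header manipulation before forwarding to a backend."""
    out = dict(headers)
    # Record the original client address so the backend can log it;
    # otherwise the backend only ever sees the LB's own IP.
    existing = out.get("X-Forwarded-For")
    out["X-Forwarded-For"] = f"{existing}, {client_ip}" if existing else client_ip
    # Strip hop-by-hop headers that apply only to the client-LB connection
    # and must not be forwarded onward.
    for hop in ("Connection", "Keep-Alive", "Transfer-Encoding"):
        out.pop(hop, None)
    return out
```

Real L7-LBs perform many more rewrites than this, but the shape is the same: inspect, modify, forward.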
Within the L7-LB the packet is inspected. Although inspection adds latency, it enables additional features such as content-based load balancing and URL mapping, which are extremely useful when you want to optimise where traffic is sent based on content. For example, suppose your company has one pool of backends fitted with high-power GPUs optimised for video processing, and another pool of low-power CPUs optimised for serving simple text pages. The L7-LB can use the URL path, e.g. mycompany.com/upload/videos/, to identify the most appropriate backends for incoming HTTP traffic, i.e. the ones with GPUs, whereas requests to a different URL such as mycompany.com/blog can be transferred to the low-power instances, all thanks to the L7-LB's ability to intelligently split traffic. Google Cloud Platform (GCP) load balancers offer this capability.
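The URL mapping described above can be sketched as a simple longest-prefix match over the request path. Pool names and addresses here are hypothetical:

```python
# Hypothetical backend pools keyed by URL path prefix.
POOLS = {
    "/upload/videos/": ["gpu-1:8080", "gpu-2:8080"],  # GPU-fitted pool
    "/blog":           ["cpu-1:8080", "cpu-2:8080"],  # low-power CPU pool
}
DEFAULT_POOL = ["web-1:8080"]

def choose_pool(path: str) -> list:
    """Pick the pool whose prefix is the longest match for the request path."""
    best = ""
    for prefix in POOLS:
        if path.startswith(prefix) and len(prefix) > len(best):
            best = prefix
    return POOLS.get(best, DEFAULT_POOL)
```

Production LBs express the same idea declaratively (e.g. GCP URL maps, NGINX `location` blocks), but the routing decision underneath is this kind of prefix match.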
One important note about L7-LBs is their ability to terminate SSL/TLS traffic. This is a pitfall for most L4-LBs: because they do not inspect payloads, they cannot terminate SSL traffic themselves. An L7-LB, by contrast, can hold the TLS certificates that verify the authenticity of your service. This is beneficial as it relieves the backends from having to store and manage multiple certificate files. The processing strain of decrypting incoming requests and re-encrypting them for transmission to the backend server is pushed onto the L7-LB. This is often expensive (in terms of latency and compute), and can at times be problematic.
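A minimal sketch, using Python's standard ssl module, of the two TLS contexts a terminating L7-LB juggles: a server-side context facing clients, and a client-side context for re-encrypting traffic toward backends. Certificate loading is shown only as a comment, since the paths would be deployment-specific:

```python
import ssl

# Server-side context: the LB terminates client TLS here.
# In production it would load the service's certificate and private key:
#   client_facing.load_cert_chain("/path/to/cert.pem", "/path/to/key.pem")
client_facing = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
client_facing.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy protocols

# Client-side context: the LB re-encrypts traffic to the backend
# and verifies the backend's certificate like any TLS client would.
backend_facing = ssl.create_default_context()
```

The decrypt-then-re-encrypt round trip is exactly where the latency and compute cost mentioned above comes from.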
Session Affinity/Packet Stickiness
A typical characteristic of L7-LBs is session affinity, also called connection stickiness. Session affinity is the tendency for connections from a given source to continue being served by the same backend. So if your IP is 18.104.22.168 and you connect to YouTube's servers, which are proxied by L7-LBs, there is a high chance your tutorial on how to get bigger biceps is being served by the exact same server even if you switch to another video. In most cases this is beneficial: you receive an uninterrupted, consistent connection, which improves quality of service. However, if a backend server malfunctions and the L7-LB does not pick this up, stickiness will keep routing that client to the dead backend.
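One simple way to implement this stickiness is to hash the client's source address into a backend index, so the same source always maps to the same backend. Real LBs often use cookies or consistent hashing instead; the backend names below are hypothetical:

```python
import hashlib

BACKENDS = ["backend-a", "backend-b", "backend-c"]  # hypothetical pool

def pick_backend(client_ip: str) -> str:
    """Hash the source IP so a given client always lands on the same backend."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(BACKENDS)
    return BACKENDS[index]
```

Note the failure mode described above falls out directly: the hash keeps returning the same backend whether or not that backend is healthy, so affinity must be paired with health checking.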
Logging & Monitoring
Because of all the information made available to L7-LBs through packet inspection, they are the best place to put logging, monitoring and tracing of network traffic. L7-LBs can output useful logs that help assess the character of the network connections flowing through your infrastructure. All of this, however, comes at the price of some extra processing.
HTTP(S) load balancing can be achieved using network technologies such as HAProxy or NGINX.
TCP (Layer 4) Load Balancing
Layer 4 or TCP/UDP load balancing (abbr. L4-LB) balances traffic at the packet/segment level and allows for an almost 1-to-1 relationship between packets going into the LB and packets coming out of it. There are two kinds of L4-LB:
1. L4 pass-through LB: given a TCP connection from a source, e.g. your computer, a pass-through L4-LB does not terminate the TCP connection at the LB but instead allows the connection to persist through to the backend using connection tracking (a concept I shall elaborate on in a different article). This spares the LB from storing large amounts of state and makes it performant at handling many connections. Matt Klein notes that L4 pass-through LBs are not themselves subject to TCP congestion control, which throttles a connection's bandwidth in response to congestion from other live connections in the network.
2. L4 termination LB: in a typical TCP handshake, when a SYN is sent from a source (your computer) to the L4 termination LB, the LB terminates the connection itself by replying with a SYN-ACK, which the source then acknowledges. The L4-LB then establishes a new TCP connection to the appropriate backend server and forwards the data to it. These kinds of LB are simple to implement, and they show significant performance gains when placed close to the end user, especially users on lossy connections, as the LB can act as a point of presence. See Matt Klein's article on load balancing.
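The termination behaviour just described can be sketched in a few lines of Python: the LB accepts (and thereby completes) the client's handshake itself, then opens a fresh TCP connection to the backend and shuttles bytes between the two. This is a toy, single-backend sketch for illustration, not production code:

```python
import socket
import threading

def serve_l4_termination(listen_port: int, backend_addr: tuple) -> socket.socket:
    """Toy terminating L4 proxy: one TCP connection client<->LB,
    a second, separate TCP connection LB<->backend."""
    lsock = socket.socket()
    lsock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    lsock.bind(("127.0.0.1", listen_port))
    lsock.listen()

    def pump(src, dst):
        # Copy bytes one way until the source closes, then close the sink.
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
        dst.close()

    def accept_loop():
        while True:
            client, _ = lsock.accept()  # the handshake terminates HERE, at the LB
            backend = socket.create_connection(backend_addr)  # brand new connection
            threading.Thread(target=pump, args=(client, backend), daemon=True).start()
            threading.Thread(target=pump, args=(backend, client), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return lsock
```

Contrast this with the pass-through variant, where no second connection is created and the client's original TCP state travels through to the backend.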
- When a packet crosses an L4-LB, little to no manipulation occurs within the LB, so the output is almost identical to the input (hence the 1-to-1 relationship; see Figure 3). This allows packets that go through an L4-LB to be easily traced on both sides using packet sniffers.
- Another advantage of not manipulating packets as they pass through the L4-LB is that the LB can be implemented in hardware, mostly application-specific integrated circuits (ASICs). This allows throughput to be extremely fast, approaching line rate, similar to a network router performing equal-cost multi-path (ECMP) routing.
- One important point to note about L4-LBs is their inability to decode packet information containing Session (Layer 5), Presentation (Layer 6) or Application (Layer 7) metadata. As a result, an L4-LB cannot distinguish between, for instance, HTTP and HTTPS, or HTTP and FTP, or tell what format the content is in, e.g. ASCII or MPEG, as these are Layer 6/Layer 7 concerns. To achieve this, an L7-LB is required.
Most implementations of L4-LBs are stateless (though they can be stateful in certain contexts).
Health Checks
Load balancing has evolved over the years into an intelligent way of splitting traffic across backends. One of the ways this is achieved is through the use of health checks.
A health check is a means of probing a backend service to ensure that it is working as expected. That last part is quite broad, and as a result health checks are very diverse in nature. Health checks are performed by both L4 and L7 LBs, and they can probe at different depths of the stack.
Cyril Bonte from HAProxy gives an example: if an L7-LB health-checks its backends with a simple ping, that confirms the server responds on the network, but it has no bearing on whether the application running on the server has crashed; the host may answer pings yet be unable to serve data back. A health check in the form of a connection check to the application's port would catch this. Health checking goes even further, probing whether connected backends are still able to reach other required services such as databases or external APIs. The design of health checks should be simple but effective, and they should not add significant load to already busy backends.
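The distinction Bonte draws, between a host-level probe and an application-level probe, can be sketched as two checks of increasing depth. The /healthz path is a common convention rather than a standard:

```python
import socket

def tcp_health_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """Connection-level check: is anything accepting connections on the port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def http_health_check(host: str, port: int, path: str = "/healthz",
                      timeout: float = 1.0) -> bool:
    """Application-level check: does the service actually answer HTTP with 200?"""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode())
            status_line = s.recv(64).split(b"\r\n")[0]
            return b" 200" in status_line
    except OSError:
        return False
```

A process can pass the TCP check while failing the HTTP one, which is exactly the "listening on its port but cannot serve data" failure described above.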
Scaling Load Balancers
In situations where traffic to an organisation is very high, a single load balancing tier may become overwhelmed. One common architecture is to lay out LBs in a tree/tiered structure, with traffic routed from L4-LBs to L7-LBs and then to your backends, as shown in the figure below.
This strategy allows organisations to use L4-LBs as proxies that distribute incoming traffic to L7-LBs sitting downstream. Matt Klein, who wrote a more in-depth article on load balancing, suggests that:
- Since L7-LBs tend to do more intricate computation on packets, i.e. assembly, analysis, logging etc., they benefit from having traffic split among them by an upstream L4-LB. In addition, since L7-LBs tend to be upgraded/updated often and are therefore regularly taken down for maintenance, it would not be appropriate to place them right after the global router. This architecture enables graceful failover for the L7-LBs.
- Placing L4-LBs at the forefront also allows them to absorb DDoS attacks and other forms of packet flooding better than L7-LBs. They also see less downtime, as their simple hardware implementations do not need to be upgraded often.
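The tiered architecture can be sketched as two dispatch functions: the L4 tier hashes the connection's source to pick an L7 proxy, and that proxy then routes by URL path. All names here are hypothetical:

```python
import hashlib

L7_TIER = ["l7-proxy-1", "l7-proxy-2", "l7-proxy-3"]  # hypothetical L7 fleet
URL_MAP = {"/upload/videos/": "gpu-pool", "/blog": "cpu-pool"}

def l4_dispatch(src_ip: str, src_port: int) -> str:
    """First tier: spread TCP connections across L7 proxies by flow hash."""
    key = f"{src_ip}:{src_port}".encode()
    index = int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % len(L7_TIER)
    return L7_TIER[index]

def l7_dispatch(path: str) -> str:
    """Second tier: the chosen L7 proxy routes the request by URL path."""
    for prefix, pool in URL_MAP.items():
        if path.startswith(prefix):
            return pool
    return "default-pool"
```

The division of labour is visible in the code: the L4 function only ever sees connection-level facts (addresses and ports), while the L7 function sees application-level facts (the URL).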
Load balancing can be done using several different methods, primarily at Layer 4, the transport layer (TCP, UDP), and Layer 7, the application layer (HTTP, FTP etc.). Both types of load balancer have their own use cases, as shown above. L4-LBs act almost as transport-layer-aware routers that do no packet manipulation and are faster than L7-LBs, which perform an array of packet manipulations and offer session affinity, ensuring connections from the same user are always served by the same backend. L7-LBs are more common and are almost always software, whereas L4-LBs are less common and tend to be implemented in dedicated hardware.
One final note on performance: although L7-LBs do more resource-intensive work, on modern servers their throughput is still considerably fast. See this blog by NGINX.
In the next article we look at some Load Balancing Strategies to see how Load Balancers decide to send packets to the appropriate backend.
Once again, Thanks for reading!