Writing a HTTP Load Balancer in Python using TDD: Routing

Published in Load Balancer Series, May 29, 2020

This is the second part of an 8-part tutorial series where we will explore a critical part of network infrastructure. In this tutorial we are going to implement a Load Balancer in Python using TDD (Test Driven Development). This tutorial is aimed at beginners and will go into detail on networking concepts and their business use cases.

Update December 2020

This whole Load Balancer series is now available at testdriven.io! Please follow the link here: https://testdriven.io/courses/http-load-balancer/

What to expect in Part 2 “Routing”

Part 2 will cover routing, one of the basic concepts of a Load Balancer. We will look at two types of routing that Load Balancers can use: host based and path based. There are more ways of routing, but we will focus only on these two and implement them in our tutorial.

Vocabulary

This section defines a few key words that are common throughout this tutorial. The idea is to reference this section as we discuss the various networking concepts.

Routing — A way to send requests to different collections of servers.

CDN — Content Distribution Network used to serve HTML, CSS, images, JavaScript, etc. closer to the clients.

FQDN — Fully Qualified Domain Name.

Backend Server/s — Server or Servers behind a load balancer.

Docker — Docker is a tool designed to make it easier to create, deploy, and run applications by using containers.

Docker-compose — Compose is a tool for defining and running multi-container Docker applications.

requests — Python library to send HTTP requests, HTTP for humans

Why do we need routing?

Routing is a general concept in networking which refers to the path a packet takes to reach a destination. In our case when we implement routing in a load balancer we are referring to how HTTP requests reach other collections of servers (or different backend servers). This is essential to figure out because multiple routing strategies can be used to achieve different outcomes based on your scenario.

What is Host based routing?

Host based routing observes the “Host” header in the HTTP request and directs the request based on that header. If the header does not match any host the load balancer knows about, the load balancer returns a 404 to the client.

For example, suppose one client sends its Host header as “www.mango.com”: the Load Balancer directs it to the mango backend servers. A second client sends “www.notmango.com” and gets a “404 Not Found” back because our Load Balancer only recognises “www.mango.com” and “www.apple.com”.
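The decision described here can be sketched in plain Python before any web framework is involved. This is only an illustration, not the implementation used later in this tutorial: the ROUTES table and the route_by_host function are hypothetical names, and the pool names are placeholders.

```python
# Hypothetical routing table: Host header value -> name of a backend pool.
ROUTES = {
    "www.mango.com": "mango backends",
    "www.apple.com": "apple backends",
}


def route_by_host(host_header):
    """Return the pool that should serve this Host header, or None for a 404."""
    return ROUTES.get(host_header)


for host in ("www.mango.com", "www.notmango.com"):
    pool = route_by_host(host)
    if pool is None:
        print(host, "-> 404 Not Found")
    else:
        print(host, "->", pool)
```

The first host is sent to the mango pool, while the second is unknown to the table and falls through to the 404 branch.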

So how does one set the Host header? There are two ways, explicit and implicit. The explicit way is to simply pass it in your HTTP client, for example:

curl -H 'Host: www.google.com' 172.217.169.68

The implicit way is:

curl www.google.com

In the implicit case curl sets the Host header for you, using the hostname in the URL. In the explicit case we have to pass the header ourselves, otherwise we get a 301 redirect. Google is a slightly more complicated case in that it has a CDN (Content Distribution Network) in front, so we are not hitting the Load Balancer directly.
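To see why both curl commands look the same to the server, it can help to write out the request text by hand. The sketch below is a simplification under stated assumptions: build_get_request is a hypothetical helper, and a real client such as curl sends more headers than the single Host header shown here.

```python
from urllib.parse import urlsplit


def build_get_request(url, host_override=None):
    """Build the text of a bare-bones HTTP/1.1 GET request."""
    parts = urlsplit(url)
    # Implicit case: the Host header is taken from the URL itself.
    # Explicit case: the caller supplies the value, like curl -H 'Host: ...'.
    host = host_override or parts.netloc
    path = parts.path or "/"
    return "GET {} HTTP/1.1\r\nHost: {}\r\n\r\n".format(path, host)


implicit = build_get_request("http://www.google.com")
explicit = build_get_request("http://172.217.169.68", host_override="www.google.com")
print(implicit == explicit)  # True: both requests carry the same Host header
```

The only difference between the two cases is where the connection is opened to; the request that arrives carries the same Host value either way.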

What is path based routing?

Path based routing relies on the URI to send the request to the backend servers. Let’s look at an example:

https://www.mywebsite.com/apples
https://www.mywebsite.com/mangoes

Let’s break down the above, https is the protocol, www.mywebsite.com is the FQDN (Fully Qualified Domain Name) and /apples is the path. If someone requests the /apples path we will direct the request to the apples backend servers and if someone hits the /mangoes path we will direct them to the mangoes backend server.
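The same breakdown can be reproduced with Python's standard library. In this short sketch the PATH_ROUTES table is a hypothetical stand-in for the apples and mangoes backend servers; only urlsplit is a real library call.

```python
from urllib.parse import urlsplit

# Hypothetical table mapping a URL path to the name of a backend pool.
PATH_ROUTES = {
    "/apples": "apples backends",
    "/mangoes": "mangoes backends",
}

for url in ("https://www.mywebsite.com/apples", "https://www.mywebsite.com/mangoes"):
    parts = urlsplit(url)
    # scheme is the protocol, netloc is the FQDN and path is the routing key.
    print(parts.scheme, parts.netloc, parts.path, "->", PATH_ROUTES.get(parts.path))
```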

Path based routing is a very common pattern for micro services, where a large web application is broken into smaller ones. As the above example shows, host based and path based routing complement each other: the host selects the website and the path then selects the backend within it, so path based routing essentially relies on host based routing.

Writing our tests for Host Based Routing

Before we start setting the infrastructure up, we first need to write our tests.

# test_loadbalancer.py
from loadbalancer import loadbalancer
import pytest

@pytest.fixture
def client():
    with loadbalancer.test_client() as client:
        yield client

def test_host_routing_mango(client):
    result = client.get('/', headers={"Host":"www.mango.com"})
    assert b'This is the mango application.' in result.data

def test_host_routing_apple(client):
    result = client.get('/', headers={"Host":"www.apple.com"})
    assert b'This is the apple application.' in result.data

def test_host_routing_notfound(client):
    result = client.get('/', headers={"Host":"www.notmango.com"})
    assert b'Not Found' in result.data
    assert 404 == result.status_code

I have removed the test_hello test and put in 3 new ones. Notice the last test also looks at the HTTP status code to confirm that we are returning the correct status code to the client. Running the above will result in all 3 tests failing which is what we want.

Setting up multiple backends

In order to test our Load Balancer routing we need to set up multiple servers for each of our hosts. We are going to use docker-compose to spin up multiple docker containers that will be our backends for each host in our Load Balancer. Since these containers are a prerequisite for our tests to succeed, we are going to create a testing task using make.

Let’s start off by creating a very basic Python web server with Flask.

# app.py
from flask import Flask
import os

app = Flask(__name__)

@app.route('/')
def sample():
    return "This is the {} application.".format(os.environ["APP"])

if __name__ == '__main__':
    app.run(host="0.0.0.0")

The above application takes an environment variable APP and simply returns it at the root, so we can give the app any name we want. Let’s define a Dockerfile.

# Dockerfile
FROM python:3
RUN pip install flask
COPY ./app.py /app/app.py
CMD ["python", "/app/app.py"]

Let’s build the docker image.

$ docker build -t server .
Sending build context to Docker daemon  4.096kB
Step 1/5 : FROM python:3
 ---> f88b2f81f83a
Step 2/5 : RUN pip install flask
 ---> Using cache
 ---> 2ffda376ccc2
Step 3/5 : COPY ./app.py /app/app.py
 ---> Using cache
 ---> 1c00c3d76acf
Step 4/5 : CMD ["python", "/app/app.py"]
 ---> Running in e8d61c3bbdf8
Removing intermediate container e8d61c3bbdf8
 ---> 13ccb5a1b21a
Successfully built 13ccb5a1b21a
Successfully tagged server:latest

We use the Python 3 base image, install the dependencies, copy our script into the container, then run it with python. Next, we define four containers, two for each application, in a docker-compose file.

# docker-compose.yaml
version: '3'
services:
  mango1:
    image: server
    environment:
      - APP=mango
    ports:
      - "8081:5000"
  mango2:
    image: server
    environment:
      - APP=mango
    ports:
      - "8082:5000"
  apple1:
    image: server
    environment:
      - APP=apple
    ports:
      - "9081:5000"
  apple2:
    image: server
    environment:
      - APP=apple
    ports:
      - "9082:5000"

Now we can run:

$ docker-compose up -d
Creating network "part2_default" with the default driver
Creating part2_apple2_1 ... done
Creating part2_mango2_1 ... done
Creating part2_apple1_1 ... done
Creating part2_mango1_1 ... done

If you get errors in the above step you might have something else running on the 8081, 8082, 9081 and 9082 ports. In this case just change the port numbers to the ones that are free on your Operating System.
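If you are unsure which of these ports is already taken, a quick check with Python's socket module can help. This is a rough sketch rather than part of the tutorial's code: is_port_free is a hypothetical helper that simply tries to bind to the port on all interfaces and reports whether that worked.

```python
import socket


def is_port_free(port, host="0.0.0.0"):
    """Return True if we can bind to host:port, False if it is already in use."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


# The four host ports used in docker-compose.yaml.
for port in (8081, 8082, 9081, 9082):
    print(port, "free" if is_port_free(port) else "in use")
```

Any port reported as in use can be swapped for a free one on the left-hand side of the ports mapping, for example "8091:5000".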

We can test our applications if we hit the relevant port:

$ curl localhost:8081
This is the mango application.
$ curl localhost:8082
This is the mango application.
$ curl localhost:9081
This is the apple application.
$ curl localhost:9082
This is the apple application.

Now we are going to create a make task to help us automate spinning these containers up using docker-compose, then tear them down after we are done. To tear down the containers run docker-compose down. This should be done before we run the next step.

# Makefile
test:
        docker-compose up -d
        pytest --disable-warnings || true
        docker-compose down

A note on the above: Makefiles require tabs for indentation, so I recommend grabbing the Makefile from the source code shared at the end of this tutorial. Also note the || true after pytest --disable-warnings: it tells make to ignore whatever exit code pytest returns and carry on to tear down our containers. We want our tests to leave nothing behind so that we can run them again without any manual intervention.
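One side effect of || true is that make test always reports success, even when tests fail, which matters if something else (a CI job, for instance) checks the exit code. As an alternative sketch, the same up, test, down sequence can be written in Python so that teardown always runs but the test exit code is preserved. The run_with_teardown helper is hypothetical, and the commands below are harmless stand-ins so the example runs anywhere.

```python
import subprocess
import sys


def run_with_teardown(setup_cmd, test_cmd, teardown_cmd):
    """Run setup, then the tests, always run teardown, and return the test exit code."""
    subprocess.run(setup_cmd, check=True)
    try:
        return subprocess.run(test_cmd).returncode
    finally:
        # Runs whether the tests passed, failed, or could not be started.
        subprocess.run(teardown_cmd)


# Stand-in commands. In this tutorial's setup they would be
# ["docker-compose", "up", "-d"], ["pytest", "--disable-warnings"]
# and ["docker-compose", "down"].
setup = [sys.executable, "-c", "print('containers up')"]
failing_tests = [sys.executable, "-c", "import sys; sys.exit(1)"]
teardown = [sys.executable, "-c", "print('containers down')"]

exit_code = run_with_teardown(setup, failing_tests, teardown)
print("tests exited with", exit_code)  # tests exited with 1
```

The Makefile approach is simpler and is what this tutorial uses; the sketch only shows one way to keep the failure visible if you need it.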

We can now run make test from where the Makefile is and it should give a similar output to this:

$ make test
Creating network "part2_default" with the default driver
Creating part2_apple2_1 ... done
Creating part2_mango2_1 ... done
Creating part2_apple1_1 ... done
Creating part2_mango1_1 ... done
pytest || true
========================================================================= test session starts =========================================================================
platform darwin -- Python 3.7.3, pytest-5.4.1, py-1.8.1, pluggy-0.13.1
rootdir: /Users/neeran/Code/loadbalancer-series/python-loadbalancer/part2
plugins: requests-mock-1.7.0
collected 3 items

test_loadbalancer.py FFF
....
<truncated>

docker-compose down
Stopping part2_apple1_1 ... done
Stopping part2_mango2_1 ... done
Stopping part2_apple2_1 ... done
Stopping part2_mango1_1 ... done
Removing part2_apple1_1 ... done
Removing part2_mango2_1 ... done
Removing part2_apple2_1 ... done
Removing part2_mango1_1 ... done
Removing network part2_default

Our tests are still failing but at least we are now tearing down after they fail. Now it is time to implement host based routing.

Implementing Host Based Routing

Time to make our tests pass. We will be using the requests library to send the HTTP request to the backend server. We need to install it first.

pip install requests

Now we can update our load balancer application.

# loadbalancer.py
from flask import Flask, request
import requests, random

loadbalancer = Flask(__name__)

MANGO_BACKENDS = ["localhost:8081", "localhost:8082"]
APPLE_BACKENDS = ["localhost:9081", "localhost:9082"]

@loadbalancer.route('/')
def router():
    host_header = request.headers["Host"]
    if host_header == "www.mango.com":
        response = requests.get("http://{}".format(random.choice(MANGO_BACKENDS)))
        return response.content, response.status_code
    elif host_header == "www.apple.com":
        response = requests.get("http://{}".format(random.choice(APPLE_BACKENDS)))
        return response.content, response.status_code
    else:
        return "Not Found", 404

We first define the backend servers for our hosts, based on the docker-compose file we created earlier. Then we fetch the Host header and use an if condition to check whether it matches one of our hosts. If we get a match, we choose a backend server at random, send it an HTTP request, and return the HTTP response body and status code back to the client. If we don’t get a match, we return “Not Found” and the 404 status code. Let’s run our tests.
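The backend selection step can also be looked at in isolation, without Flask or any network calls. The sketch below is one possible variation, not the tutorial's implementation: the BACKENDS table and choose_backend function are hypothetical, and stripping the port and lower-casing the header are assumptions added here because a Host header may arrive as "www.mango.com:5000" or in mixed case. It does not handle IPv6 literals.

```python
import random

# Same backend addresses as in loadbalancer.py, keyed by host name.
BACKENDS = {
    "www.mango.com": ["localhost:8081", "localhost:8082"],
    "www.apple.com": ["localhost:9081", "localhost:9082"],
}


def choose_backend(host_header, rng=random):
    """Pick a backend for the given Host header, or None if the host is unknown."""
    # Drop an optional ":port" suffix and ignore case before the lookup.
    hostname = host_header.split(":")[0].lower()
    pool = BACKENDS.get(hostname)
    if pool is None:
        return None
    return rng.choice(pool)


print(choose_backend("www.mango.com"))       # one of the two mango backends
print(choose_backend("WWW.APPLE.COM:5000"))  # one of the two apple backends
print(choose_backend("www.notmango.com"))    # None
```

Passing a seeded random.Random instance as rng makes the choice repeatable, which can be handy when testing the selection logic on its own.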

$ make test
docker-compose up -d
Creating network "part2_default" with the default driver
Creating part2_apple2_1 ... done
Creating part2_apple1_1 ... done
Creating part2_mango2_1 ... done
Creating part2_mango1_1 ... done
pytest || true
========================================================================= test session starts =========================================================================
platform darwin -- Python 3.7.3, pytest-5.4.1, py-1.8.1, pluggy-0.13.1
rootdir: /Users/neeran/Code/loadbalancer-series/python-loadbalancer/part2
plugins: requests-mock-1.7.0
collected 3 items

test_loadbalancer.py ...                                                                                                                                        [100%]
==================================================================== 3 passed, 2 warnings in 0.14s ====================================================================
docker-compose down
Stopping part2_mango2_1 ... done
Stopping part2_apple2_1 ... done
Stopping part2_mango1_1 ... done
Stopping part2_apple1_1 ... done
Removing part2_mango2_1 ... done
Removing part2_apple2_1 ... done
Removing part2_mango1_1 ... done
Removing part2_apple1_1 ... done
Removing network part2_default

We have successfully implemented Host based routing. Now let’s do path based routing.

Writing our tests for Path Based Routing

# test_loadbalancer.py
...

def test_path_routing_mango(client):
    result = client.get('/mango')
    assert b'This is the mango application.' in result.data

def test_path_routing_apple(client):
    result = client.get('/apple')
    assert b'This is the apple application.' in result.data

def test_path_routing_notfound(client):
    result = client.get('/notmango')
    assert b'Not Found' in result.data
    assert 404 == result.status_code

We don’t care about the host header in this case.

Implementing Path Based Routing

# loadbalancer.py
...

@loadbalancer.route('/mango')
def mango_path():
    response = requests.get("http://{}".format(random.choice(MANGO_BACKENDS)))
    return response.content, response.status_code

@loadbalancer.route('/apple')
def apple_path():
    response = requests.get("http://{}".format(random.choice(APPLE_BACKENDS)))
    return response.content, response.status_code

With the above our make test command will pass all 6 tests!

Let’s wrap up here for Part 2. We have gone through what host and path based routing are, how to write tests for both cases and how to implement them. Whilst setting up the tests we used docker-compose to spin up multiple backend servers for the hosts supported by our Load Balancer.

The full source code for this part can be found here: https://github.com/paktek123/python-loadbalancer/tree/master/part2

See you in Part 3 at https://testdriven.io/courses/http-load-balancer/.