Writing an HTTP Load Balancer in Python using TDD: Configuration

Published in Load Balancer Series on Jun 5, 2020 (4 min read)

This is the third of an 8 part tutorial series where we will explore a critical part of network infrastructure. In this tutorial we are going to implement a Load Balancer in Python using TDD (Test Driven Development). This tutorial is aimed at beginners and will go into detail on networking concepts and their business use cases.

Update December 2020

This whole Load Balancer series is now available at testdriven.io! Please follow the link here: https://testdriven.io/courses/http-load-balancer/

What to expect in Part 3 “Configuration”

Part 3 will cover the implementation of a configuration file for our Load Balancer. We will explore why there is a need for configuration files and what problems they help to solve. Towards the end we will also see why our TDD approach helps us move faster.

Vocabulary

This section defines a few key words that are common throughout this tutorial. The idea is to reference this section as we discuss the various networking concepts.

NGINX — A highly performant and versatile load balancer.

YAML — YAML Ain’t Markup Language.

URL — Uniform Resource Locator.

Why do we need configuration files?

Configuration files are very common for all sorts of applications, whether they are exposed via GUIs or modified by hand. They give us the ability to tweak applications for different needs and allow us to change the behaviour for different environments. We can also keep our configuration out of our code, which results in fewer code changes.
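As a rough illustration of changing behaviour per environment, the sketch below picks the configuration file path from an environment variable and falls back to a default. The variable name LOADBALANCER_CONFIG and the config_path helper are assumptions made for this example; they are not part of the series code.

```python
import os

# Hypothetical environment variable name, chosen only for this illustration.
CONFIG_ENV_VAR = "LOADBALANCER_CONFIG"


def config_path(environ=os.environ, default="loadbalancer.yaml"):
    """Return the configuration file path for the current environment.

    Falls back to the default file when the variable is not set.
    """
    return environ.get(CONFIG_ENV_VAR, default)
```

With something like this, a staging machine could set LOADBALANCER_CONFIG to a different file while the application code stays the same.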

Introducing change

Let’s review the situation: our Load Balancer is working as expected and we are about to introduce a “major” change. This is where our tests play a big part, because we can verify that they still pass after we implement the change. If they do, we can be reasonably confident that we have not broken anything or changed the behaviour of our application before we deploy to production.

Implementing a configuration file for host based routing

Until now the backend servers for each host have been hardcoded in our application:

MANGO_BACKENDS = ["localhost:8081", "localhost:8082"]
APPLE_BACKENDS = ["localhost:9081", "localhost:9082"]

Instead, we will define them in our configuration file:

# loadbalancer.yaml
hosts:
  - host: www.mango.com
    servers:
      - localhost:8081
      - localhost:8082
  - host: www.apple.com
    servers:
      - localhost:9081
      - localhost:9082

Let’s implement the above in our Load Balancer code. First we need to install pyyaml.

pip install pyyaml

Now to modify our Load Balancer application.

# loadbalancer.py
from flask import Flask, request
import requests, random
import yaml

loadbalancer = Flask(__name__)

def load_configuration(path):
    with open(path) as config_file:
        config = yaml.load(config_file, Loader=yaml.FullLoader)
    return config

config = load_configuration('loadbalancer.yaml')

@loadbalancer.route('/')
def router():
    host_header = request.headers["Host"]
    for entry in config["hosts"]:
        if host_header == entry["host"]:
            response = requests.get("http://{}".format(random.choice(entry["servers"])))
            return response.content, response.status_code

    return "Not Found", 404
...

We import the yaml module and define our load_configuration function, which loads loadbalancer.yaml into a Python dictionary. The router function iterates through the hosts list defined in our configuration file and checks whether the host key matches the Host header. If there is a match we send a request to one of the backend servers listed under servers, chosen at random; otherwise we return a 404.
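To make the lookup easier to see in isolation, here is a minimal sketch of the host matching step without Flask or any network calls. The dictionary is written out by hand to mirror what loading the loadbalancer.yaml above should produce, and pick_backend_for_host is a hypothetical helper name used only for this illustration.

```python
import random

# Hand-written stand-in for the dictionary loaded from loadbalancer.yaml,
# so this sketch does not depend on the file or on pyyaml.
config = {
    "hosts": [
        {"host": "www.mango.com", "servers": ["localhost:8081", "localhost:8082"]},
        {"host": "www.apple.com", "servers": ["localhost:9081", "localhost:9082"]},
    ]
}


def pick_backend_for_host(config, host_header):
    """Return a randomly chosen backend for the host, or None if no entry matches."""
    for entry in config["hosts"]:
        if host_header == entry["host"]:
            return random.choice(entry["servers"])
    return None
```

Calling pick_backend_for_host(config, "www.mango.com") returns one of the two mango servers, while an unknown host returns None, which corresponds to the 404 branch in router.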

We can now run our tests and we will find that 4 out of 6 pass. The other 2 fail on the path based routing. Let’s implement this in the next section.

Implementing a configuration file for path based routing

Our YAML file can look as follows:

# loadbalancer.yaml
hosts:
  - host: www.mango.com
    servers:
      - localhost:8081
      - localhost:8082
  - host: www.apple.com
    servers:
      - localhost:9081
      - localhost:9082
paths:
  - path: /mango
    servers:
      - localhost:8081
      - localhost:8082
  - path: /apple
    servers:
      - localhost:9081
      - localhost:9082

Now let’s implement path based configuration in our Load Balancer application.

# loadbalancer.py
...

@loadbalancer.route("/<path>")
def path_router(path):
    for entry in config["paths"]:
        if ("/" + path) == entry["path"]:
            response = requests.get("http://{}".format(random.choice(entry["servers"])))
            return response.content, response.status_code

    return "Not Found", 404
...

We iterate through the entries under the paths key and compare each entry’s path with the path given in the URL. On a match we send a request to one of that entry’s backend servers, chosen at random; if nothing matches we return a 404. We can now run make test and all our tests will pass.
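The path comparison can be sketched on its own in the same way. The dictionary below is a hand-written stand-in for the paths section of the configuration, and pick_backend_for_path is a hypothetical helper name, not a function from the series code.

```python
import random

# Hand-written stand-in for the "paths" section of loadbalancer.yaml.
paths_config = {
    "paths": [
        {"path": "/mango", "servers": ["localhost:8081", "localhost:8082"]},
        {"path": "/apple", "servers": ["localhost:9081", "localhost:9082"]},
    ]
}


def pick_backend_for_path(config, path):
    """Return a randomly chosen backend for the path, or None if no entry matches.

    `path` is expected without its leading slash, which is how Flask passes
    the value captured by a "/<path>" rule.
    """
    for entry in config["paths"]:
        if "/" + path == entry["path"]:
            return random.choice(entry["servers"])
    return None
```

Note that the value arrives without the leading slash, which is why the comparison adds "/" back before matching. Flask's default converter for a rule such as "/<path>" also only captures a single segment with no further slashes, so a URL like /mango/extra would not reach this lookup at all.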

Let’s wrap up here for Part 3. We implemented a configuration file for our load balancer application and covered the reasons why a configuration file makes us more flexible. Taking a TDD approach paid off because we were able to implement this change knowing that the behaviour of our application did not change.

The full source code for this part can be found here: https://github.com/paktek123/python-loadbalancer/tree/master/part3

See you in Part 4 at https://testdriven.io/courses/http-load-balancer/.