Writing an HTTP Load Balancer in Python using TDD: Theoretical Concepts

Neeran Gul
Load Balancer Series
May 22, 2020
Image by Alexandra_Koch from Pixabay

This is the first of an 8-part tutorial series in which we will explore a critical part of network infrastructure. In this tutorial we are going to implement a Load Balancer in Python using TDD (Test Driven Development). The tutorial is aimed at beginners and goes into detail on networking concepts and their business use cases.

Navigation

  1. Theoretical Concepts
  2. Routing
  3. Configuration
  4. Health checks
  5. Manipulating the HTTP request
  6. Load Balancing Algorithms
  7. Intelligent Firewall
  8. Open Source Load Balancers

Update December 2020

This whole Load Balancer series is now available at testdriven.io! Please follow the link here: https://testdriven.io/courses/http-load-balancer/

What to expect in Part 1 “Theoretical Concepts”

Part 1 mainly covers the theoretical concepts behind a Load Balancer: why we need them and what their use cases are. By the end we will have an understanding of what a Load Balancer is and a skeleton HTTP server with tests. We will also discuss what TDD is, the exact problem we are trying to solve, and the tools we will use to solve it.

Vocabulary

This section defines a few key words that are used throughout this tutorial. The idea is to refer back to this section as we discuss the various networking concepts.

HTTP: Stands for Hypertext Transfer Protocol, an application-layer protocol for fetching resources, such as HTML documents, from the Internet.

HTTPS: Stands for Hypertext Transfer Protocol Secure, an extension of HTTP that adds secure communication via TLS.

Server: A computer running in your own possession, in someone else’s, or in a data center (DC).

Proxy: An application that carries out an operation or task on behalf of another server.

Latency: Network latency is the delay before data is transferred between the server and the client.

Flask: A Python micro web framework.

Pytest: A Python testing framework.

Virtualenv: A tool for creating isolated Python environments.

Make: A tool for generating executables and other non-source files.

What is a Load Balancer?

A Load Balancer is a networking component that distributes load across multiple servers and is used to horizontally scale web-based applications. Popular examples include NGINX, HAProxy and Traefik. We will touch on Open Source Load Balancers in Part 8 of this series. Why do Load Balancers play a big part in networking infrastructure? Because they allow engineers to scale web applications and improve their reliability. Let’s go over an example.

As users visit our imaginary website www.mango.com during peak time, a single server can struggle to keep latency low and stay available. How soon this happens largely depends on the application itself. Let’s say it is a website that people purchase mangoes from, running on a server with a finite amount of memory and CPU. As traffic increases, that single server starts to struggle. In order to cope with the additional traffic we can add two more servers and front them with a Load Balancer.

With this architecture the load is distributed across the servers, and we can keep scaling by adding more servers or even start splitting functionality by adding microservices into the mix. In practice the Load Balancer acts like a proxy fronting the underlying servers (or backends). As we continue working on the tutorial we will go through more concepts related to Load Balancers and their usage.
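To make the idea of distributing load concrete, here is a minimal sketch of how a Load Balancer might choose a backend for each incoming request. It assumes a simple round-robin strategy, and the `RoundRobinBalancer` class and the backend addresses are hypothetical names made up for illustration. Load Balancing algorithms are covered properly in Part 6 of the series.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backends one after another, wrapping around at the end."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = cycle(backends)

    def next_backend(self):
        return next(self._backends)


# Hypothetical backend addresses for the imaginary www.mango.com.
balancer = RoundRobinBalancer(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])

# Four incoming requests are spread across the three servers in turn.
chosen = [balancer.next_backend() for _ in range(4)]
print(chosen)
```

Each server receives roughly the same number of requests, which is the simplest way to keep any single machine from being overwhelmed.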

What is TDD?

TDD stands for Test Driven Development. In short, this means that you write your tests first and then write the code. The benefit of this approach is that tests are not an afterthought but part of the thought process. I will not go into full detail on the benefits of this approach; if interested, please have a read here and here (for a more holistic look at different test approaches). Why use TDD here? Mainly to encourage good code quality. I personally really like the fact that the tests can be used as documentation and describe the behaviour of the application in great detail.
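As a small illustration of the test-first rhythm, the sketch below shows a test written before the code it exercises. The `parse_backend` helper and its `host:port` format are hypothetical and are not part of this tutorial's code; they only serve to show the order of work.

```python
# Step 1 (red): write the test first. At this point parse_backend does not
# exist yet, so running the test would fail with a NameError.
def test_parse_backend():
    assert parse_backend("10.0.0.1:8080") == ("10.0.0.1", 8080)


# Step 2 (green): write just enough code to make the test pass.
def parse_backend(address):
    """Split a hypothetical 'host:port' string into (host, port)."""
    host, port = address.rsplit(":", 1)
    return host, int(port)


# Step 3: run the test again; it now passes without raising an error.
test_parse_backend()
```

The test doubles as documentation: reading it tells you what input the function accepts and what it returns.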

Approaching the problem

An HTTP Load Balancer is an HTTP proxy server: a proxy handles HTTP requests on behalf of other servers. In this tutorial we will use Flask, a micro web framework, and pytest to develop and test our Load Balancer functionality. We will use make to define basic tasks and operations. In order to test our Load Balancer end to end we will also spin up Docker containers via Docker Compose.
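To show what "handling requests on behalf of other servers" can look like, here is a rough sketch of a proxy that forwards GET requests to a single backend. It deliberately uses only the Python standard library rather than Flask, so it is not the implementation built in this series. The `BackendHandler`, `ForwardingHandler` and `start` names are made up for illustration, and error handling is left out.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Ignore any system-wide proxy settings so that local requests stay local.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class BackendHandler(BaseHTTPRequestHandler):
    """Stands in for a real application server."""

    def do_GET(self):
        body = b"hello from backend"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the console output quiet
        pass


class ForwardingHandler(BaseHTTPRequestHandler):
    """Forwards every GET request to the backend and relays the answer."""

    backend_url = None  # filled in below once the backend port is known

    def do_GET(self):
        with opener.open(self.backend_url + self.path) as upstream:
            body = upstream.read()
            status = upstream.status
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass


def start(handler):
    """Start a server on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


backend = start(BackendHandler)
ForwardingHandler.backend_url = f"http://127.0.0.1:{backend.server_port}"
proxy = start(ForwardingHandler)

# The client only ever talks to the proxy, never to the backend directly.
with opener.open(f"http://127.0.0.1:{proxy.server_port}/") as response:
    reply = response.read()
print(reply)

proxy.shutdown()
backend.shutdown()
```

A Load Balancer extends this idea by choosing between several backends for each request instead of always forwarding to the same one.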

Setting up the environment

First we need to install our dependencies:

  • python3 and pip (or pip3)
  • virtualenv
  • pytest
  • flask

For dependency management I will use virtualenv (a great way to manage Python dependencies). If you are confident, go for Pipenv by following this tutorial: https://docs.python-guide.org/dev/virtualenvs/

Open up a bash shell and enter the following commands:

# install python3, which usually comes with pip; get pip here: https://pip.pypa.io/en/stable/installing
$ pip install virtualenv # install virtualenv
$ virtualenv loadbalancer # create the virtualenv
$ source loadbalancer/bin/activate # activate virtual environment
$ pip install flask pytest
$ mkdir loadbalancer_tutorial
$ cd loadbalancer_tutorial

With the above commands, our dependencies are installed. Let’s write our first test. Create a file called test_loadbalancer.py:

from loadbalancer import loadbalancer

import pytest

@pytest.fixture
def client():
    with loadbalancer.test_client() as client:
        yield client

def test_hello(client):
    result = client.get('/')
    assert b'hello' in result.data

In the above code we import the loadbalancer application from our loadbalancer module (which doesn’t exist yet). We then import the pytest module. The fixture function defines a test client for us to use: we set it up once and can then use it in the test functions that follow. In the pytest framework every function (and module/file) whose name starts with test_ is identified as a test. We write a test_hello function that hits the root of our web application and expects the text hello to be returned.

To run this, simply go into the loadbalancer_tutorial directory and run the following:

$ pytest
========================================================================= test session starts =========================================================================
platform darwin -- Python 3.7.3, pytest-5.4.1, py-1.8.1, pluggy-0.13.1
rootdir: /Users/neeran/Code/loadbalancer-series/python-loadbalancer/part1
plugins: requests-mock-1.7.0
collected 0 items / 1 error
=============================================================================== ERRORS ================================================================================
________________________________________________________________ ERROR collecting test_loadbalancer.py ________________________________________________________________
ImportError while importing test module '/Users/neeran/Code/loadbalancer-series/python-loadbalancer/part1/test_loadbalancer.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
test_loadbalancer.py:1: in <module>
from loadbalancer import loadbalancer
E ModuleNotFoundError: No module named 'loadbalancer'
======================================================================= short test summary info =======================================================================
ERROR test_loadbalancer.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
========================================================================== 1 error in 0.12s ===========================================================================

As we can see, there is a big fat error! This is the spirit of TDD: we start with a failing test and then write the code that makes it pass. Let’s fix the error. Pytest is complaining that the loadbalancer module doesn’t exist, so let’s create it. Create a file called loadbalancer.py in the loadbalancer_tutorial directory:

from flask import Flask

loadbalancer = Flask(__name__)

Let’s run our tests again:

$ pytest --disable-warnings
========================================================================= test session starts =========================================================================
platform darwin -- Python 3.7.3, pytest-5.4.1, py-1.8.1, pluggy-0.13.1
rootdir: /Users/neeran/Code/loadbalancer-series/python-loadbalancer/part1
plugins: requests-mock-1.7.0
collected 1 item
test_loadbalancer.py F [100%]
============================================================================== FAILURES ===============================================================================
_____________________________________________________________________________ test_hello ______________________________________________________________________________
client = <FlaskClient <Flask 'loadbalancer'>>

    def test_hello(client):
        result = client.get('/')
>       assert b'hello' in result.data
E assert b'hello' in b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">\n<title>404 Not Found</title>\n<h1>Not Found</h1>\n<p>The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.</p>\n'
E + where b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">\n<title>404 Not Found</title>\n<h1>Not Found</h1>\n<p>The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.</p>\n' = <Response 232 bytes [404 NOT FOUND]>.data
test_loadbalancer.py:12: AssertionError
======================================================================= short test summary info =======================================================================
FAILED test_loadbalancer.py::test_hello - assert b'hello' in b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">\n<title>404 Not Found</title>\n<h1>Not Found<...
==================================================================== 1 failed, 2 warnings in 0.28s ====================================================================

I have added the --disable-warnings flag to suppress warnings. The import error is gone, but now we get a 404 Not Found. This means we have not defined a / route for our Load Balancer app. Let’s add it:

from flask import Flask

loadbalancer = Flask(__name__)

@loadbalancer.route('/')
def router():
    return "hello"

Now we run our tests:

$ pytest --disable-warnings
========================================================================= test session starts =========================================================================
platform darwin -- Python 3.7.3, pytest-5.4.1, py-1.8.1, pluggy-0.13.1
rootdir: /Users/neeran/Code/loadbalancer-series/python-loadbalancer/part1
plugins: requests-mock-1.7.0
collected 1 item
test_loadbalancer.py . [100%]
==================================================================== 1 passed, 2 warnings in 0.18s ====================================================================

Hooray! We have made our first test pass!

Let’s wrap up here for Part 1. We have gone through the theoretical concepts of why we need a Load Balancer and why we are using TDD to write one. We also laid down the bare bones of our test setup, which we will use going forward to ensure code quality for our Load Balancer.

The full source code for this part can be found here: https://github.com/paktek123/python-loadbalancer/tree/master/part1

See you in Part 2 at https://testdriven.io/courses/http-load-balancer/.
