Develop a Docker Containerized Python API With Terraform, Gitlab, Kubernetes, and AWS
A step-by-step guide to learning more about these platforms
Hello, internet! I participated in a DevOps assignment for an interview some months ago, and I decided to publish the solution I implemented, which led to me passing the technical step.
Even though the opportunity wasn’t aligned with my needs in the end, the assignment was pretty interesting and may be a good starting point for junior DevOps engineers who are starting to learn the tools of the job :).
Technology Stack
The following stack is often used in the IT industry. The involved technologies are:
- CI/CD: Gitlab CI
- Coding: Python
- Cloud Provider: AWS
- IaC (Infrastructure as Code): Terraform
- Microservices: Kubernetes
The Problem
Develop a web server that exposes the following endpoints on port 4545:
APIs
/api/health returns a JSON payload containing {"status": "ok"}.
/api/mirror?word={word} returns a JSON payload containing the transformed input word as follows:
- Lowercase letters must be transformed into uppercase letters.
- Uppercase letters must be transformed into lowercase letters.
- Any digit must be inverted with its complementary (9 becomes 0, 8 becomes 1, 7 becomes 2, … 0 becomes 9).
- Any other character should be left as it is.
- Reverse the whole string (luca <-> acul).
For example, /api/mirror?word=F0oBar25 returns {"transformed": "47RAbO9f"}.
Develop a simple test case to test the example just mentioned above.
CI/CD
You must write pipelines to run the application tests, build a Docker image of the application, and push it to a private registry (the image should be built and pushed only from the main branch).
Kubernetes
Deploy the application on a K8s cluster using Terraform. The deliverable should do the following:
- Use the docker image you pushed to the registry.
- Provide an Ingress that listens on port 80 and redirects traffic to the app.
Terraform
Define the infrastructure to upload files from the application to an S3 Bucket.
Add an endpoint to your application that listens for POST calls on /api/upload-random, creates a .txt file with a random number as content, and uploads it to the S3 bucket.
My Solution
At first, the problem may seem long and complicated, but it’s actually pretty easy if you break it down into small sub-problems. In the end, that’s what a good software engineer should always be able to do to solve even the hardest problems.
So, the proposed architecture is the following:
In all fairness, this is a very simple architecture. It’s divided into three parts:
On the left: the clients make their requests via the APIs defined by the requirements. The API calls land on the Ingress component of Kubernetes (in this case, the service is of the LoadBalancer type).
In the middle: the service connects the “outside” to the “inside,” which is represented by the Kubernetes Deployment where the Dockerized Python server lives.
On the right: last but not least, the staging bucket is used to store the random_upload files generated by the API, as well as the state of our Terraform code.
Note: I will not explain every single line of code. This small architecture is intended for people who already know a little about Python, Terraform, Docker, and Kubernetes. Not much, just a little. Be warned. The best way to understand it, besides reading this article, is to clone the repo and explore it yourself.
Repositories
Keep the repositories separate: one for the application code and one for the infrastructure code. Among all the benefits, you will be able to keep the CI/CD jobs in two different scopes.
So, one will be application-code-repo, the other is infrastructure-code-repo.
The local Kubernetes cluster
Among all the solutions for spinning up a Kubernetes cluster locally, like kind or Minikube, I kept this project’s scope simple and used Docker Desktop. In my case, on a Mac M1, it does the job just fine. To enable it, go to Docker Desktop’s settings and toggle the Kubernetes option.
With that done, you’re ready to go. You have a K8S cluster working locally.
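To double-check that the local cluster is reachable, you can run the following from a terminal (the context name is the default one Docker Desktop creates):
kubectl config use-context docker-desktop
kubectl get nodes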
API: Python App Server
In total, there are three APIs, shown below:
/api/health simply returns the status of the service: it either returns "ok" or times out, so there’s not much to explain here.
@api.route('/api/health', methods=['GET'])
def get_health():
    return {"status": "ok"}
/api/mirror transforms the word using the transformations mentioned before. The transformations live in a separate file called string_transform.py. The test case (the only one, lol) makes sure the transformation functions work correctly.
@api.route('/api/mirror', methods=['GET'])
def get_mirror():
    word = request.args.get("word")
    return {"transformed": string_transform.transform(word)}
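The article doesn’t reproduce string_transform.py itself; here is a minimal sketch of what transform could look like, given the rules above, plus the single test case. The function name matches the call in the route; everything else (file names, pytest as the runner) is an assumption.

# string_transform.py (sketch)
def transform(word: str) -> str:
    # Swap case, invert digits (d -> 9 - d), then reverse the whole string.
    out = []
    for char in word:
        if char.isdigit():
            out.append(str(9 - int(char)))
        elif char.isalpha():
            out.append(char.swapcase())
        else:
            out.append(char)
    return "".join(reversed(out))

# test_string_transform.py (sketch)
from string_transform import transform

def test_transform():
    assert transform("F0oBar25") == "47RAbO9f"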
/api/upload-random is by far the most interesting. It generates a random number between a defined min and max, creates a .txt file named after (and containing) that number, and uploads it to the bucket using boto3.
@api.route('/api/upload-random', methods=['POST'])
def post_upload_random():
    min = 0
    max = 9999
    random_number = str(random.randint(min, max))
    filename = f'{random_number}.txt'
    file.create_file(filename, random_number)
    s3.upload_file(file_name=filename, bucket='oper-qual-staging')
    return {"uploaded": "ok"}
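The file and s3 helpers aren’t shown in the article; a plausible sketch using boto3 (the function signatures follow the calls above, and everything else is an assumption):

# file.py (sketch)
def create_file(filename, content):
    # Write the random number into a local .txt file
    with open(filename, 'w') as f:
        f.write(content)

# s3.py (sketch)
import boto3

def upload_file(file_name, bucket):
    # Credentials are picked up from the AWS_* environment variables
    client = boto3.client('s3')
    client.upload_file(Filename=file_name, Bucket=bucket, Key=file_name)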
Dockerfile
To build the Docker image, the code is very straightforward. It simply installs the requirements on the target image and runs the Python Flask server in the container. Here’s the code:
FROM python:3.8-slim-buster
WORKDIR /app
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt
COPY src .
CMD ["python3", "-m", "flask", "run", "--host=0.0.0.0", "--port=4545"]
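To try it locally, you can build and run the image like this (the tag is just an example, and flask run assumes the app module is discoverable, e.g., an app.py inside src):
docker build -t python-api .
docker run -p 4545:4545 python-api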
Terraform
provider.tf
The provider file sets up the Terraform providers used for the project. In this case, the AWS and Kubernetes providers.
For AWS, S3 will hold the Terraform state so that we don’t store it on our local machine. Alternatively, you can store it in GitLab as well.
For Kubernetes, you must insert your host and token to access the Docker Desktop cluster.
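The actual provider.tf isn’t reproduced here; a minimal sketch of what it could look like (the bucket name comes from the upload snippet, while the region, variable names, and the Docker Desktop endpoint are assumptions):

terraform {
  backend "s3" {
    bucket = "oper-qual-staging"
    key    = "terraform.tfstate"
    region = "eu-west-1" # pick your region
  }
}

provider "aws" {
  region = "eu-west-1"
}

provider "kubernetes" {
  host     = var.k8s_host  # e.g., https://kubernetes.docker.internal:6443 for Docker Desktop
  token    = var.k8s_token # the SVC_TERRAFORM_K8S service-account token (assumption)
  insecure = true          # acceptable for a local cluster only
}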
main.tf
modules/deployment
The deployment part is by far the most interesting part of the Terraform project, since it’s the juice of the infrastructure-as-code itself.
The deployment will point at the Docker image of the Python web server pushed to the GitLab Registry. You also need to specify a container_port and a liveness probe to check whether the app is healthy. Why waste the /api/health API if it’s already there?
This file will also hold the Kubernetes secret, which is necessary to establish the connection to the registry mentioned above.
The code is quite long, so explore it in the repo.
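For reference, a trimmed-down sketch of what the deployment module could look like (the image path, resource names, and registry secret layout are assumptions; the liveness probe reuses /api/health as described above):

resource "kubernetes_secret" "registry" {
  metadata {
    name = "gitlab-registry"
  }
  type = "kubernetes.io/dockerconfigjson"
  data = {
    ".dockerconfigjson" = jsonencode({
      auths = {
        "registry.gitlab.com" = {
          username = var.registry_user
          password = var.registry_pass
          auth     = base64encode("${var.registry_user}:${var.registry_pass}")
        }
      }
    })
  }
}

resource "kubernetes_deployment" "app" {
  metadata {
    name = "python-api"
  }
  spec {
    replicas = 1
    selector {
      match_labels = {
        app = "python-api"
      }
    }
    template {
      metadata {
        labels = {
          app = "python-api"
        }
      }
      spec {
        image_pull_secrets {
          name = kubernetes_secret.registry.metadata[0].name
        }
        container {
          name  = "python-api"
          image = "registry.gitlab.com/<your-user>/application-code-repo:latest"
          port {
            container_port = 4545
          }
          liveness_probe {
            http_get {
              path = "/api/health"
              port = 4545
            }
          }
        }
      }
    }
  }
}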
modules/service
The LoadBalancer service is necessary to expose the Kubernetes pod; without it, it would be impossible to reach the service. Besides that, it’s important to take care of the external:internal port mapping, so you know which port you must use for the service.
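The service definition, roughly, could look like this (the names match the deployment sketch above; the real code is in the repo):

resource "kubernetes_service" "app" {
  metadata {
    name = "python-api"
  }
  spec {
    type = "LoadBalancer"
    selector = {
      app = "python-api"
    }
    port {
      port        = 80   # external port, as required
      target_port = 4545 # internal port the Flask server listens on
    }
  }
}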
Cloud Provider: AWS
AWS is my favorite cloud provider so far. In this project, S3 is going to be the selected service for two main purposes:
- A bucket to contain the Terraform state of the project so it’s not stored locally
- A bucket to store the files randomly generated by the /api/upload-random endpoint.
Access credentials and permissions must be created. A user with the managed AmazonS3FullAccess policy is completely fine for this project’s scope. Download the AWS credentials and set them up in the CI/CD variables.
Gitlab CI
GitLab offers free shared runners that run CI/CD pipelines for you, so you don’t have to install a gitlab-runner on your own machine (although, if you want one, a Raspberry Pi 4 is a cheap, good fit).
After enabling the runners from the GitLab settings of both projects, you must set up the CI/CD variables, which are environment variables made available to the runner during its jobs.
The ones I’ve configured are AWS_ACCESS_KEY_ID, AWS_DEFAULT_REGION, AWS_SECRET_ACCESS_KEY, CI_REGISTRY_EMAIL, CI_REGISTRY_PASS, CI_REGISTRY_USER, and SVC_TERRAFORM_K8S.
Application Pipeline
The Application Pipeline, located under the Application folder, has two phases: test and build.
- The test phase runs the unit tests against the Python code.
- If the tests complete successfully, it triggers a build phase (only on the master branch) in which a Docker image is built and pushed to the private GitLab registry.
Note: the build phase runs only on the master branch because the requirement asks that an image be pushed only from the master branch.
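The actual .gitlab-ci.yml isn’t shown here; a minimal sketch of what it could look like (the images and pytest as the test runner are assumptions; the CI_REGISTRY* variables are either the ones configured above or predefined by GitLab):

stages:
  - test
  - build

test:
  stage: test
  image: python:3.8-slim-buster
  script:
    - pip install -r requirements.txt
    - python -m pytest

build:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  only:
    - master
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASS" registry.gitlab.com
    - docker build -t "$CI_REGISTRY_IMAGE:latest" .
    - docker push "$CI_REGISTRY_IMAGE:latest"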
Infrastructure Pipeline
The Infrastructure Pipeline is divided into three phases: validate, plan, and apply.
- The validate phase checks whether the Terraform code is valid.
- If so, it triggers a plan phase in which a Terraform plan is created.
- A final, manually triggered apply phase deploys the infrastructure.
Note: the apply phase runs only on the master branch because the code is deployed to a single “production” environment.
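Again, the real pipeline lives in the infrastructure-code-repo; a sketch of its shape (the Terraform image and flags are assumptions):

stages:
  - validate
  - plan
  - apply

image:
  name: hashicorp/terraform:light
  entrypoint: [""]

validate:
  stage: validate
  script:
    - terraform init
    - terraform validate

plan:
  stage: plan
  script:
    - terraform init
    - terraform plan -out=plan.tfplan
  artifacts:
    paths:
      - plan.tfplan

apply:
  stage: apply
  when: manual
  only:
    - master
  script:
    - terraform init
    - terraform apply -auto-approve plan.tfplan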
Reaching the Service
DNS Records
To make it more interesting, set up a DNS record. If you own a domain, point a DNS record directly at the cluster, e.g., kubernetes.lucacesarano.com as an endpoint for the cluster and api.lucacesarano.com for reaching the APIs.
If you want to reach the service from outside your network, you need to expose it. An easy and common way is to use a reverse proxy. I don’t have a guide for that, but you may want to spin up a Docker container for the job. With a reverse proxy, you expose only one port on one machine and then map routes to any service you want inside your home network. The easiest option by far is Nginx Proxy Manager.
Postman
Use Postman to test your API. Set up your endpoint and the APIs created before, and check that everything works as intended.
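If you prefer the command line, the same checks can be done with curl (the hostname is just an example from the DNS section above):
curl http://api.lucacesarano.com/api/health
curl "http://api.lucacesarano.com/api/mirror?word=F0oBar25"
curl -X POST http://api.lucacesarano.com/api/upload-random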
Want to connect with the author? You know the gist. Send me a message for any clarification, and let’s keep in touch for any possibilities. My contacts are available on my website. See ya!