Configuring GitLab CI/CD with AWS CodeDeploy and Docker for Python Application

An In-Depth guide on automating your code deployment journey from GitLab to AWS with AWS CodeDeploy.

Avi Khandelwal
DATA PEACE AI
13 min read · Sep 3, 2020


Have you ever wondered how software developers continuously add new features or fix bugs and deploy those changes as quickly as possible? A development team makes dozens of changes every single day while ensuring that nothing breaks. That’s where CI/CD comes into play.

In this blog post, I will show you how to configure your .gitlab-ci.yml file to continuously integrate and deploy a Python-Django application to AWS EC2 with the help of the AWS CodeDeploy service. As an added bonus, you will also learn how to build Docker images for your Python-Django application with GitLab CI/CD and push them to the GitLab Container Registry.

Note: This post is not intended to show how to dockerize a Python-Django application. For this please check out my other blog on Dockerizing Django Application with PM2 and Nginx.

Enough talk; let’s get straight into the core concepts of continuous integration.

Why Continuous Integration?

Continuous integration is a software development practice of frequently integrating code changes from multiple contributors into a shared repository, preferably several times a day. Each integration is verified by an automated build (including tests) to detect integration errors as quickly as possible before the changes are deployed.

Some of the benefits that continuous integration will bring to your organization are:

  1. Catching and fixing bugs quickly every time a new change is made.
  2. Spending less time digging for bugs and more time writing awesome code.
  3. Speeding things up with automated tests and builds to deliver your software more quickly.
  4. Increased team transparency and collaboration.
  5. Faster deployments of new features to the majority of cloud providers.

Prerequisites

Since this post shows how to configure your .gitlab-ci.yml file for CI/CD of your Python-Django application, it is assumed that you have:

  1. Created an account on GitLab.
  2. Installed and configured a GitLab runner (in this post, however, we will use GitLab shared runners).
  3. An existing AWS account with its credentials stored securely.
  4. An AWS EC2 instance as the target for each environment, namely development, staging, and production.
  5. An AWS S3 bucket for storing the application revisions for use with AWS CodeDeploy.
  6. AWS CodeDeploy application and group with deployment target as the EC2 instance (usually the recommendation is to create a CodeDeploy application with three groups for each environment such as development, staging, and production followed by an appropriate naming convention). For more information on getting started with CodeDeploy, click here.

And that’s it; this is all you need to set up a GitLab CI/CD pipeline.

The .gitlab-ci.yml file

GitLab implements CI/CD for the projects it hosts through a .gitlab-ci.yml file, a YAML file that you create at the project’s root. This file defines the structure and order of the pipeline, and determines what action to take when specific conditions are met.

The pipeline defined in this file is triggered automatically whenever you push a commit to GitLab, depending on which branches or tags it is configured to run for. This makes CI/CD easier to adopt in an organization where multiple developers work on the same project, and lets you control when the pipeline is triggered to suit your use case.

Create a .gitlab-ci.yml file at your project’s root containing all of the snippets below.

It seems there’s a lot going on here. Don’t freak out. Let me break it down piece by piece:

default:
  image:
    name: python:3.8

The image keyword names the base image that the Docker executor runs to perform CI/CD tasks. The executor pulls the image from Docker Hub according to the specified name. Since this pipeline is for a Python application, we use the python image. I recommend pinning a specific version for any image you use rather than relying on latest.

stages:
  - build
  - test
  - deploy

The stages keyword defines the stages of the CI/CD pipeline, which run sequentially until the final stage completes. Typically a pipeline has three stages: build, test, and deploy. If any stage fails, no further stage runs.

In the example above, the build stage runs first, then test, and finally deploy. If build fails, the test and deploy stages will not run.

variables:
  CI_DEBUG_TRACE: "false"
  DOCKER_DRIVER: overlay2
  DOCKER_TLS_CERTDIR: "/certs"

GitLab CI/CD allows you to set your own custom environment variables so that you can make use of these later in the subsequent jobs to replace the variable name with the values that you have set. The variables are used by the runners anytime the pipeline runs.

CI_DEBUG_TRACE is set to false so that the runner does not print full debug logs, which could reveal secrets.

DOCKER_DRIVER and DOCKER_TLS_CERTDIR are variables used when building Docker images. Since this post also covers building a Docker image and pushing it to the GitLab registry, we specify the overlay2 storage driver, which GitLab shared runners use by default. We also specify the path to the certificates that Docker creates automatically on boot.

docker_build:
  image: docker:19.03.12
  stage: build
  services:
    - docker:19.03.12-dind
  variables:
    IMAGE_TAG: $CI_REGISTRY_IMAGE:$CI_COMMIT_TAG
  before_script:
    - echo $CI_REGISTRY_PASSWORD | docker login -u $CI_REGISTRY_USER --password-stdin $CI_REGISTRY
  script:
    - docker pull $CI_REGISTRY_IMAGE:latest || true
    - docker build --cache-from $CI_REGISTRY_IMAGE:latest --tag $IMAGE_TAG --tag $CI_REGISTRY_IMAGE:latest .
    - docker push $IMAGE_TAG
    - docker push $CI_REGISTRY_IMAGE:latest
  rules:
    - if: $CI_COMMIT_TAG

This section of the .gitlab-ci.yml file deals with building and pushing our Python-Django application to the GitLab Container Registry.

GitLab Container Registry

The docker_build is the name of the job that runs in the build stage. Since we are using Docker-in-Docker on our runners, we need to specify services with the docker:19.03.12-dind image. To authenticate with the Container Registry from GitLab CI/CD, you need to pass the registry username and password. These credentials come from predefined CI/CD variables: CI_REGISTRY_USER is a user created for you to push images to the registry, and its password is provided automatically in the CI_REGISTRY_PASSWORD variable.

Note: For security reasons, docker login expects the password to be passed through stdin.

The script section defines the Docker commands for building the image and pushing it to the GitLab Container Registry. It is best practice to use Docker layer caching so that subsequent builds are faster: the latest image is pulled with docker pull if it exists, then the new image is built from that cache and tagged. Since this post does not cover dockerizing a Python-Django application, you must create the Docker-related files yourself in order to build the image. Lastly, the image is pushed to the GitLab Container Registry.

We do not want this job to run on every commit pushed to GitLab, so we specify rules with a condition to run only when a new tag is created. Did you notice how many variables we used inside this job? They keep the image names and tags dynamic. Some of these variables are predefined by GitLab; for the complete list, click here.

test_app:
  stage: test
  script:
    - apt-get update -qy
    - apt-get install -y python3-dev  # headers for building C extensions; pip already ships with python:3.8
    - pip install -r requirements.txt
    - python manage.py test

The test_app job tests our Python-Django application. If any job in this stage fails, the jobs in the deploy stage will not run.

deploy_development:
  stage: deploy
  script:
    - echo "Deploy to development server"
    - python3.8 -m pip install awscli
    - /bin/bash ./scripts/deploy.sh
  environment:
    name: development
    url: https://$CI_ENVIRONMENT_SLUG.example.com
  rules:
    - if: '$CI_COMMIT_BRANCH == "dev"'

The deploy_development job runs in the deploy stage. It deploys our code to the AWS EC2 instance with the help of the AWS CodeDeploy service. We need to install the AWS CLI in order to run AWS commands, and we also need to set the AWS access key, secret key, and region. Let’s leave that part for now; later in this post I will show you how to set CI/CD variables from the GitLab UI. And what about the deploy.sh script? Our source code lives in GitLab, and we want to send it to an AWS S3 bucket, so we need a way to execute AWS CLI commands. We could put all of them in the script section, but it is much cleaner to wrap them in a bash script. Don’t worry, I will walk through the contents of that script later in this post, along with some best practices for writing bash scripts.

It is very important to keep a full history of your deployments for each environment, so you always know what is currently running on your servers. To achieve this, we can use GitLab environments by setting a name and URL for each environment. In the URL you can use your own domain name, or you can use CI_ENVIRONMENT_SLUG, a simplified version of the environment name, to guarantee a valid URL. Again, we specify rules so the job runs only when the commit branch is dev.

GitLab for tracking your environments

The jobs following deploy_development are the same, just with different environments for staging and production. The only difference in the production job is that we use the when keyword to turn off automatic deployment and switch to manual deployment.
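The staging and production jobs are not reproduced in full above, so here is a sketch of what they might look like, following the same pattern as deploy_development. The environment names, the staging branch name, and the when: manual rule are assumptions based on the description above and the branch/tag checks in deploy.sh:

```yaml
deploy_staging:
  stage: deploy
  script:
    - echo "Deploy to staging server"
    - python3.8 -m pip install awscli
    - /bin/bash ./scripts/deploy.sh
  environment:
    name: staging
    url: https://$CI_ENVIRONMENT_SLUG.example.com
  rules:
    - if: '$CI_COMMIT_BRANCH == "staging"'

deploy_production:
  stage: deploy
  script:
    - echo "Deploy to production server"
    - python3.8 -m pip install awscli
    - /bin/bash ./scripts/deploy.sh
  environment:
    name: production
    url: https://example.com
  rules:
    - if: $CI_COMMIT_TAG
      when: manual    # production deploys require a manual click in the GitLab UI
```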

Time to deploy from GitLab to AWS

Remember the deploy.sh file we talked about earlier? A bash script is a plain text file containing a series of commands: anything you would normally type on the command line can go into this one file, which then does the same work more elegantly than running each command by hand. Why do we need such a file here? It’s simple: GitLab and AWS are two different tools, and we need some way to connect them so we can send our source code to AWS.

Create a scripts/ directory at your project’s root and inside it create a deploy.sh file containing the snippets below.

Does this look mind-boggling to you? Let me break it down:

set -eu
echo "Home=$HOME"

__dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
__file="${__dir}/$(basename "${BASH_SOURCE[0]}")"
__base="$(basename "${__file}" .sh)"

Before writing any bash script, it is good practice to include set at the top with the -e and -u options. This makes the script exit immediately when a command fails, and also throw an error and exit immediately if it references an unset variable. You can also set up some variables that resolve the full directory and file name of the script.
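To see what set -eu buys you, here is a tiny demonstration (not part of the deploy script): a child shell running under set -eu aborts the moment it touches an unset variable, so the final echo never executes.

```shell
# Run a child script with `set -eu`; it stops at the unset variable,
# so only "before" is captured (the error message goes to stderr).
output="$(bash -c 'set -eu; echo before; echo "$MISSING_VAR"; echo after' 2>/dev/null || true)"
echo "$output"
```

Without set -u the child would have printed an empty line and carried on to "after", silently hiding the bug.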

generate_revision_number() {
  echo $((1 + RANDOM % 100000))
}

build_id="${CI_PIPELINE_IID:-}"
if [[ -z "${build_id}" ]]; then
  build_id=$(generate_revision_number)
fi

Here the generate_revision_number function produces a random number between 1 and 100000. The code after it prefers the predefined GitLab variable CI_PIPELINE_IID, the unique ID of the current pipeline scoped to the project, and falls back to a random number only if it is unset. This way, every project revision we upload to AWS S3 gets a unique name with the pipeline number attached to it.
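The same fallback logic, wrapped in a function so it is easy to exercise in isolation (a sketch; the original script assigns build_id inline):

```shell
# Generate a random revision number between 1 and 100000.
generate_revision_number() {
  echo $((1 + RANDOM % 100000))
}

# Prefer the GitLab pipeline ID; fall back to a random revision number.
pick_build_id() {
  build_id="${CI_PIPELINE_IID:-}"
  if [ -z "${build_id}" ]; then
    build_id="$(generate_revision_number)"
  fi
  echo "${build_id}"
}

CI_PIPELINE_IID=42
pick_build_id    # prints 42 when the pipeline ID is set
```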

codedeploy_application="${AWS_CODEDEPLOY_APPLICATION:-}"

deployment_environment="${CI_COMMIT_REF_NAME}"
if [[ $deployment_environment == "dev" ]]; then
  codedeploy_deployment_group="${AWS_CODEDEPLOY_DEPLOYMENT_GROUP_DEV:-}"
elif [[ $deployment_environment == "staging" ]]; then
  codedeploy_deployment_group="${AWS_CODEDEPLOY_DEPLOYMENT_GROUP_STAG:-}"
elif [[ $deployment_environment == v*.*.* ]]; then
  codedeploy_deployment_group="${AWS_CODEDEPLOY_DEPLOYMENT_GROUP_PROD:-}"
else
  codedeploy_deployment_group="${AWS_CODEDEPLOY_DEPLOYMENT_GROUP_DEV:-}"
fi

revision_bundle_type="zip"
revision_s3_bucket_name="${AWS_CODEDEPLOY_S3_BUCKET:-}"
revision_s3_key_prefix="${codedeploy_application}/${codedeploy_deployment_group}"
revision_s3_key="${revision_s3_key_prefix}/backend-code-apis-${build_id}.${revision_bundle_type}"
revision_s3_location="s3://${revision_s3_bucket_name}/${revision_s3_key}"
deployment_s3_location=""
project_dir="$(readlink -f "${__dir}/../../")"

We define several variables for use with AWS S3 and AWS CodeDeploy. The CodeDeploy-related variables are the application name and the deployment group name that you should already have created in your AWS account. Notice how the deployment group is chosen according to the three working environments, i.e., dev, staging, and production (a production deployment happens when a new tag is created). A few more variables are set for the AWS S3 bucket, which you should also have already created. Together these variables build a directory structure keyed by CodeDeploy application and environment. I strongly recommend following this directory-structure pattern for your S3 bucket, as it makes managing source code for each environment much easier.
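With hypothetical values plugged in (the real ones come from the CI/CD variables), the key that ends up in S3 looks like this:

```shell
# Hypothetical application/group names to illustrate the S3 key layout.
codedeploy_application="myapp"
codedeploy_deployment_group="myapp-dev"
build_id="1234"
revision_bundle_type="zip"

revision_s3_key_prefix="${codedeploy_application}/${codedeploy_deployment_group}"
revision_s3_key="${revision_s3_key_prefix}/backend-code-apis-${build_id}.${revision_bundle_type}"

echo "$revision_s3_key"    # myapp/myapp-dev/backend-code-apis-1234.zip
```

Each environment gets its own prefix under the application name, so revisions for dev, staging, and production never collide.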

You may be wondering where the values of these variables are set. Although you could hardcode them inside this bash script, that is not a good approach. Remember the CI/CD variables from the GitLab UI we talked about; we will use those instead of hardcoding the values, as you will see in a minute.

generate_revision_description() {
  echo "This is a revision for the python-django application"
}

generate_deployment_description() {
  echo "This is a deployment for the python-django application"
}

These function blocks are just to generate revision and deployment descriptions.

create_codedeploy_revision() {
  echo "Uploading revision to S3 ..."
  deployment_s3_location=$(aws deploy push \
    --application-name "${codedeploy_application}" \
    --description "$(generate_revision_description)" \
    --ignore-hidden-files \
    --s3-location "${revision_s3_location}" \
    --source . | grep -Po '\-\-s3-location \K[^ ]+')
  echo "Codedeploy revision for S3 location ${revision_s3_location} is created"
}

create_codedeploy_deployment() {
  if [[ "${deployment_s3_location}" == "" ]]; then
    echo "Please call 'create_codedeploy_revision' function first"
    return 1
  fi
  deployment_id=$(aws deploy create-deployment \
    --application-name "${codedeploy_application}" \
    --deployment-group-name "${codedeploy_deployment_group}" \
    --description "$(generate_deployment_description)" \
    --s3-location "${deployment_s3_location}" |
    grep -Po '"deploymentId": \K[^.]+"')
  echo "Codedeploy deployment ${deployment_id} is created"
}

In the create_codedeploy_revision() function we bundle and upload our application revision to AWS S3 as a zip archive. We grep the output for the S3 location and store it in the deployment_s3_location variable so that we can pass it to the create_codedeploy_deployment() function. That function creates a deployment that rolls our application revision out through the specified deployment group. On success, the command outputs a deployment ID, which is very useful for monitoring the deployment.
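If you are curious what that grep -Po pattern extracts, here it is run against simulated create-deployment output (the deployment ID is made up). Note that the pattern keeps the surrounding quotes in the captured value, which is harmless for logging:

```shell
# Simulated JSON output from `aws deploy create-deployment`.
sample='{
    "deploymentId": "d-EXAMPLE123"
}'
# \K discards everything matched so far; [^.]+" then grabs the quoted ID.
deployment_id="$(echo "$sample" | grep -Po '"deploymentId": \K[^.]+"')"
echo "$deployment_id"    # "d-EXAMPLE123" (quotes included)
```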

main() {
  echo "
Starting the deployment to codedeploy
-----------------------------------------------------------------
Codedeploy Application: ${codedeploy_application}
Codedeploy Deployment Group: ${codedeploy_deployment_group}
Revision S3 Location: ${revision_s3_location}
Source Location: ${project_dir}
------------------------------------------------------------------
"
  create_codedeploy_revision
  create_codedeploy_deployment
}

main

Did you notice that we wrapped all our commands in functions? Functions make code reusable: once created, a function can be called multiple times in the script. They also make the code more readable and easier to debug. Our main() function serves the same purpose; it invokes all the other functions in order.

The GitLab CI/CD Variables

Now we have arrived at the part of the post where I show you how to set up all the GitLab variables used in the code snippets above. Variables added through a project’s settings are project-level CI/CD variables, available to all jobs in that project’s pipelines.

To add these variables to your GitLab project, head over to Settings > CI/CD and expand the Variables section. Enter all the variables and values we have used, such as the AWS access and secret keys, the AWS CodeDeploy application and group names, the AWS S3 bucket name, and so on.

GitLab CI/CD Environment Variables
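For reference, these are the variables the snippets above expect. The names are taken from the pipeline and deploy.sh; the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION names are the standard environment variables the AWS CLI reads:

```
# AWS CLI credentials
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_DEFAULT_REGION

# CodeDeploy and S3 settings used by scripts/deploy.sh
AWS_CODEDEPLOY_APPLICATION
AWS_CODEDEPLOY_DEPLOYMENT_GROUP_DEV
AWS_CODEDEPLOY_DEPLOYMENT_GROUP_STAG
AWS_CODEDEPLOY_DEPLOYMENT_GROUP_PROD
AWS_CODEDEPLOY_S3_BUCKET
```

Mark the credential variables as masked so they never appear in job logs.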

Now that we have set all the variables and their values, we are ready to test everything we have done so far. Gear up, commit your files, and push them to your GitLab repository. As soon as you push, you will notice that a CI/CD pipeline has started.

GitLab CI/CD Pipeline Job

Similarly, if you have created the Docker-related files at your project’s root and created a new tag on GitLab with .gitlab-ci.yml in it, you will see the docker_build job run and push the Docker image with the assigned tag to the GitLab Container Registry.

GitLab Container Registry Docker Images

Navigate to your AWS console and check the S3 bucket you created; it should contain your Python-Django application revision. Then head over to the AWS CodeDeploy service and check the deployments section: a deployment has started to roll your application out to the AWS EC2 instance.

AWS CodeDeploy Deployment

WOW! That’s a lot of stuff to digest, but you made it to the end. To sum up, you now have a fully working CI/CD solution: every new commit pushed to your GitLab repository triggers a pipeline that builds, tests, and deploys your application to the AWS EC2 instance. Now it’s your turn; go apply what you’ve learned here.

I hope you enjoyed it. Please check out my other blog on Dockerizing Django Application with PM2 and Nginx.


Avi Khandelwal
DATA PEACE AI

A DevOps enthusiast who loves to automate repetitive tasks, saving some time and energy.