Optimizing the runtime of CircleCI builds on a Django application

Kunal Mahajan
SquadStack Engineering

--

Introduction

If the time your automated builds take to finish leaves you knackered, this blog is for you. We at SquadStack have been using Continuous Integration for a long time, but only recently did we start using our build pipeline efficiently. We added some simple steps to our pipeline that reduced the total build time to a fraction of what it used to be.

We work mostly with Python and Django, and a few Django tricks plus some shell scripting helped us improve the runtime of our test cases. Before jumping straight to the optimizations, let's start from scratch and understand what Continuous Integration is and how we integrated it into our system.

What is Continuous Integration?

Continuous Integration (CI) is a development practice that most companies follow, guided by key principles such as revision control, build automation, and automated testing. We at SquadStack follow this practice quite religiously. As an early-stage startup we are growing day by day, so it is important that our code changes are error-free and don't break the existing codebase. Continuous Integration helps us integrate code changes into a shared repository, and it encourages committing small changes often rather than large changes infrequently. Each commit triggers a build that runs our tests, which helps us identify if anything was broken by the changes we committed.

What benefits did we get after integrating Continuous Integration in our system?

  • We detect errors more quickly. Whenever we commit changes to our shared repository, the commit triggers automation scripts that check whether everything works as expected. If a script fails, the developer knows something is broken and can take appropriate action to fix it.
  • The result of every automation script is visible to every developer on the team. Everyone can see whether a script passed or threw errors. This creates accountability: when a commit has an issue, it must be fixed before deploying to production. The result of every commit is posted to us on Slack.
  • It created a fast feedback loop. What hurts us most as software developers is a lack of feedback on the quality and impact of the changes we make. It is easy to write code that we think will work as expected, but more often than not the opposite is true. We all want to quickly commit code and move on to the next task without running sanity checks, but this can hit us hard later, when we try to figure out which change, from when (and by whom), broke something. Continuous Integration prevents such situations, because after each commit we know whether it works as expected.
  • Writing robust unit/integration test cases made our CI pipeline more robust. Writing effective, optimized test cases is the core principle behind a robust CI pipeline. Good test cases cover edge cases and all business logic, assert the number of queries executed in each block of code, mock third-party API responses, and so on. At SquadStack we always try to write effective test cases, and it is quite a feat that we have never had to hire a dedicated software engineer in testing.
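As a minimal sketch of one practice from this list, here is how a third-party API response can be mocked with Python's `unittest.mock` so a test stays fast and deterministic. The function and field names here are hypothetical, not from our codebase:

```python
from unittest import mock

def get_lead_score(api_client, lead_id):
    # Business logic under test (hypothetical names): read a raw score
    # from a third-party API client and cap it at 100.
    payload = api_client.fetch(lead_id)
    return min(payload["score"], 100)

# In the test, the real API client is replaced with a mock, so no network
# call happens and the edge case (an out-of-range score) is easy to set up.
fake_api = mock.MagicMock()
fake_api.fetch.return_value = {"score": 250}

assert get_lead_score(fake_api, lead_id=1) == 100
fake_api.fetch.assert_called_once_with(1)
```

Because the mock records every call, the test can also assert that the API was hit exactly once, which catches accidental duplicate requests.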

How did we achieve Continuous Integration in SquadStack?

Here are the basics we used to get started with CI:

  1. Implemented version control (Git, Mercurial, SVN, etc.). In our case it is Git.
  2. Wrote quality test cases for the critical components of our product.
  3. Picked a suitable continuous integration and delivery service to run those precious tests on every push to the repository. Our continuous integration service is CircleCI.

How can CircleCI be used to achieve Continuous Integration?

  • CircleCI reads a YAML file to prepare an environment. In that environment we can install OS-level requirements, project-level requirements, a database, Redis, and so on. We use Docker to prepare this environment.
  • After that, we write steps to run our custom migrations and test cases. Below is a snippet of the config.yml file we wrote to set up our CircleCI environment.
# Use CircleCI version 2.0
version: 2
jobs:
  build:
    working_directory: ~/circleci-django
    docker:
      - image: circleci/python:2.7.15
        environment:
          PIPENV_VENV_IN_PROJECT: true
          DATABASE_URL: postgresql://postgres@localhost/database
      - image: circleci/postgres:11.5
        environment:
          POSTGRES_USER: postgres
          POSTGRES_DB: database
      - image: redis
    steps:
      - checkout
      - restore_cache:
          key: cache-key
      - run:
          name: Install Dependencies
          command: |
            sudo pip uninstall pipenv -y
            sudo pip install pipenv==<version_number>
            pipenv install
      - run:
          name: Wait for Redis and Postgres
          # Here we download dockerize and wait for each service to respond.
          # 5432 is the Postgres port, 6379 is the Redis port.
          command: |
            sudo wget https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz && sudo tar -C /usr/local/bin -xzvf dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz && rm dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz
            sudo dockerize -wait tcp://localhost:5432 -timeout 10m
            sudo dockerize -wait tcp://localhost:6379 -timeout 10m
      - run:
          name: Run Migrations
          command: pipenv run python manage.py migrate
      - run:
          name: Run Tests
          command: pipenv run python manage.py test
  • On every commit, CircleCI reads this file, follows the steps in it, and runs our migrations and test cases.
  • In the CircleCI dashboard, we can tell CircleCI which file it should read after each commit.
  • CircleCI also lets us link it to our GitHub repo so that it can track all the commits in that repository.
  • After the test cases run, the status of the build is sent to a dedicated channel on Slack.
  • CircleCI also sends a coverage report telling us how much of the code is covered, where coverage is lacking, and how to improve it.

How did we optimize build run time using CircleCI?

As discussed above, we focus heavily on writing good-quality test cases, since we don't have a dedicated testing team. We currently have more than 4,200 tests in one of our repositories, and running them along with migrations, OS dependencies, and pip dependencies would easily take around 45 minutes or more. After adding some logic we brought that down to 12–15 minutes, roughly a third of the original time. Here are the few simple things we did to optimize the time taken:

  1. We started running test cases in parallel. Django's test runner can split the suite across multiple worker processes, each with its own test database. So, with 4 parallel workers, 4 separate test databases are created.
  2. One major issue we faced when running tests in parallel was that the cache backend was still shared across all parallel workers. To fix that we used LocMemCache, Django's in-memory (non-persistent) cache backend, which gives each parallel test process its own cache. But why do we need a separate cache per test process? Why can't every worker share one cache? The answer is simple: since each worker gets its own test database, two completely different objects can end up with the same id in different databases, and caching by that id could break things. So each worker needs its own cache.
  3. We added a custom TransactionTestCaseWithoutFlush test class, because the flush operation Django performs after each TransactionTestCase takes a lot of time.
  4. CircleCI gives us the option to cache data between builds. So, for every build, we cache all the OS/pip dependencies. The cache is invalidated when we add a new dependency, and it is then updated again. This saved us a lot of time as well. See: Caching Dependencies in CircleCI.
  5. We started running migrations only when needed: migrations should run on CircleCI only if a commit actually adds new migration files. To achieve this, we first calculate the md5 hash of every migration file separately and store the results in per_migration_hash.txt. Second, we sort per_migration_hash.txt so that the order stays consistent, because the first step's output isn't ordered. Third, we compute a single md5 hash over the combined per-file hashes. Finally, we store the combined hash in the cache. Whenever a new migration file appears, its hash changes the combined md5 sum, so the cached value no longer matches and migrations run again. This avoids running migrations unnecessarily on every build. Below is the code we wrote to achieve this step.
- run:
    name: Get Migrations Hash
    # First step:  calculate the md5 hash of each migration file separately
    #              and store the results in per_migration_hash.txt.
    # Second step: sort per_migration_hash.txt so the order stays consistent,
    #              because find's output order is not guaranteed.
    # Third step:  calculate a single md5 hash over the sorted per-file hashes.
    # Final step:  print the final hash.
    command: |
      find apps/ -path "*migrations*" -name "*.py" -exec md5sum {} \; > per_migration_hash.txt
      sort per_migration_hash.txt > sorted_per_migration_hash.txt
      md5sum sorted_per_migration_hash.txt | cut -d' ' -f1 > combined_migrations_hash.txt
      echo "Hash generated for migration files is $(<combined_migrations_hash.txt)"
- restore_cache:
    # Check whether the hash from the step above is already present in the cache.
    # The cache key format is branch_name-checksum of combined_migrations_hash.txt.
    # Note: this migrations cache is branch-dependent.
    name: Restore Migrations Cache
    key: cache-{{ .Branch }}-{{ checksum "combined_migrations_hash.txt" }}
- run:
    name: Run Migrations
    # In the save_cache step, every time the job runs we store the hash from the
    # Get Migrations Hash step inside ~/cached_hashed_migration/hash.txt in the cache.
    # If hash.txt is present, migrations do not run and "No changes in migrations detected" is printed.
    # If hash.txt is absent, there was a cache miss, meaning the migration files changed.
    # In that case migrations run if the branch is develop or master.
    command: |
      FILE=~/cached_hashed_migration/hash.txt
      if test -f "$FILE"; then
        echo "No changes in migrations detected"
      else
        echo "Change in migrations detected. Run Migrations"
        if [ "$RUN_MIGRATION_ON_ALL_BRANCHES" == "true" ] || [ "${CIRCLE_BRANCH}" == "master" ] || [ "${CIRCLE_BRANCH}" == "develop" ]; then
          pipenv run python manage.py migrate --settings=test_settings
          echo "✅ ran migrate - thanks for waiting"
        else
          echo "Migrations run only for the develop or master branch 😎"
        fi
      fi
      mkdir -p ~/cached_hashed_migration/
      touch ~/cached_hashed_migration/hash.txt
- save_cache:
    name: Save Migrations Cache
    # Store hash.txt in the cache, keyed on the checksum from the Get Migrations Hash step.
    key: cache-{{ .Branch }}-{{ checksum "combined_migrations_hash.txt" }}
    paths:
      - ~/cached_hashed_migration/hash.txt
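The hash-combining logic in the shell snippet above can also be expressed in a few lines of Python, which makes the purpose of the sorting step clear. This is a sketch for illustration, not part of our pipeline:

```python
import hashlib

def combined_migrations_hash(per_file_hashes):
    # per_file_hashes maps a migration file path to its md5 hex digest,
    # mirroring the "digest  path" lines that md5sum writes.
    # Sorting first means the combined hash does not depend on the order
    # in which `find` happened to visit the files.
    lines = sorted(f"{digest}  {path}" for path, digest in per_file_hashes.items())
    return hashlib.md5("\n".join(lines).encode()).hexdigest()

# Any new, removed, or edited migration file changes the combined hash,
# which busts the CircleCI cache key and triggers a fresh migrate.
a = combined_migrations_hash({"apps/x/migrations/0001.py": "aa",
                              "apps/y/migrations/0001.py": "bb"})
b = combined_migrations_hash({"apps/y/migrations/0001.py": "bb",
                              "apps/x/migrations/0001.py": "aa"})
assert a == b  # insertion order does not matter, only file contents do
```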
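For reference, the cache override from step 2 is a small settings change. A minimal sketch of what our test settings could look like (the file name and the `LOCATION` value are assumptions, and tests are launched with Django's `manage.py test --parallel`, which forks one worker per core by default):

```python
# test_settings.py (sketch): point Django's cache at LocMemCache, an
# in-memory, non-persistent, per-process backend, so each parallel test
# worker gets its own isolated cache instead of sharing Redis.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
        "LOCATION": "unique-snowflake",  # optional name for this local cache
    }
}
```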

What are the things which we are trying to improve in our current test case/CircleCI setup?

  • In general we use Redis as our cache backend, but as discussed above, we switch to LocMemCache to run test cases in parallel. Because of this we cannot run Redis-specific commands in our test cases; we have to mock those commands to make the tests work. This is one of the current drawbacks, and we are still looking for a way to use Redis instead of LocMemCache.
  • Continuous Delivery is still something we need to set up. Our deployment process is currently manual; with Continuous Delivery we can automate it.
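Until we solve the Redis drawback, mocking a Redis-specific command looks roughly like this. The helper below is hypothetical; `lpush` and `llen` are standard redis-py client methods that have no LocMemCache equivalent:

```python
from unittest import mock

def enqueue_callback(redis_client, lead_id):
    # Hypothetical helper: push a lead onto a Redis list and return the
    # list's new length. LPUSH/LLEN cannot run against LocMemCache, so in
    # tests the Redis client itself is replaced with a mock.
    redis_client.lpush("callbacks", lead_id)
    return redis_client.llen("callbacks")

fake_redis = mock.MagicMock()
fake_redis.llen.return_value = 1

assert enqueue_callback(fake_redis, 7) == 1
fake_redis.lpush.assert_called_once_with("callbacks", 7)
```

The mock verifies that the right command was issued with the right arguments, which is usually all these tests need.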

Conclusion

As a company, we believe in a set of core values. One of them is Do The Right Thing, and to build the right product we have to use the right tools: ones that keep us on track and support our development process in a meaningful way. We follow a few principles to run a successful CI pipeline:

  1. Write good quality tests.
  2. Fix or delete the tests we no longer need.
  3. Keep builds green :-)

Also, if anyone is interested, we are hiring ;-). Come join us and together let's put a dent in the world.
