CI with GitHub Actions for Research Code

Mark Neumann
Sep 26, 2019 · 7 min read

GitHub has recently released GitHub Actions — a great feature which makes it really easy to have free Continuous Integration (up to 10,000 minutes of machine time) for your research project.

This blog post is designed for new researchers and PhD students who are learning engineering principles as they go.

Don’t worry if you don’t know what any of these words mean — CI basically isn’t very interesting, and you probably want to learn as little as possible about it. However, it will save your skin one day, so when it’s this easy to add to your project, it’s a bit of a no-brainer.

Note: The instructions below assume that your research codebase is quite small and written in Python. They also assume you know a bit about programming in Python and that you keep your code on GitHub.

What is Continuous Integration?

Continuous integration is just a fancy phrase for “testing as you go”. For research projects that involve code, this means running the tests for your code every time you commit a change in git and push it to GitHub (with git push, if you are new to git). It is also frequently what the green tick means when you look at a large or popular GitHub repo: it’s reporting that the latest commit to the codebase has passed all the checks that the maintainers want it to.

Step 1 — Sign up for GitHub Actions

Follow this link: https://github.com/features/actions and click the “Sign up for Beta” button.

Step 2 — Write a Test

In order for GitHub Actions to be useful for your research code, first, we need to actually have a test to run. If you have one already, you can skip this step.

First things first, let’s add a tests directory to your project, with a Python __init__.py file.

# Your project probably looks something like this
# at the moment.
research_project/
    - __init__.py
    - data.py
    - model.py
    - etc, etc
main.py
requirements.txt
# We will add this part!
tests/
    - __init__.py # This is important!
    - test_project.py

Next, inside test_project.py, we’ll write a tiny test which doesn’t do anything. If you have something you actually want to test in your project (even if it is very simple, like loading a model), now might be a good time. Otherwise, you can copy the code below:

import unittest


class TestMyProject(unittest.TestCase):
    def test_my_code_works_properly(self):
        # Eventually, you would test that your code
        # actually does, you know, work properly
        assert True

In order to run the tests, we will use a library called pytest. One thing to note is that, by default, pytest looks in files named test_*.py or *_test.py for classes whose names begin with “Test” and for functions and methods whose names begin with “test”. So you need to make sure that all of your test cases have names like TestMyCoolNewModel and test_my_janky_data_hack_fixes_that_thing.

If you now pip install pytest, you should find that running pytest in the root of your repository (where the tests directory is) runs your test.
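As a slightly more realistic sketch of what test_project.py could contain, the example below tests a made-up helper called normalize_whitespace. That helper is not part of any real project; it is defined inline only so the file runs on its own. The example shows the two naming patterns pytest collects by default: a function whose name starts with “test”, and “test” methods on a class whose name starts with “Test”.

```python
# tests/test_project.py (sketch)
# `normalize_whitespace` is a hypothetical helper, defined here only so the
# example is self-contained. In a real project you would import your own
# code instead of defining it inside the test file.


def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return " ".join(text.split())


# A plain function whose name starts with "test" is collected by pytest.
def test_normalize_collapses_spaces():
    assert normalize_whitespace("a   b \n c") == "a b c"


# So are "test"-prefixed methods on a class whose name starts with "Test".
class TestNormalizeWhitespace:
    def test_whitespace_only_becomes_empty(self):
        assert normalize_whitespace("   ") == ""

    def test_already_clean_text_is_unchanged(self):
        assert normalize_whitespace("hello world") == "hello world"
```

Running pytest in the root of the repository should pick up all three tests. If one of the assert statements fails, pytest reports which test failed and what the two sides of the comparison were.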

The next step is to make this test run on GitHub, and to do this, we’ll use Docker.

Step 3 — Make a Dockerfile

In order to make your research code easier to use we are going to use Docker.

Docker is similar to a requirements.txt file, in the sense that it describes things that your code needs to run, but on a much larger scale. For instance, you can specify what type of operating system you want your code to run on, or environment variables that should always be set for your code to work. Docker uses a file called a Dockerfile which contains a series of steps to package up your code into something that contains everything it needs to run, called an “image”.

Docker is a massive piece of technology — we’re going to use only a tiny part of it.

This is a complete, functioning Dockerfile. Copy it into a file called Dockerfile in the base of your GitHub repository.

FROM python:3.7.2
# Setup a spot for the code
WORKDIR /project_name
# Install Python dependencies
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
# make sure that pytest is installed
# we'll need it to run the tests!
RUN pip install pytest
# Copy over the source code (!modify this section!)
# If you have other code here you need to copy it too
COPY research_project research_project/
COPY tests tests/
COPY main.py main.py
CMD ["/bin/bash"]

What does this Dockerfile file say? Let’s break it down:

Firstly, we are starting from an existing Docker image rather than from scratch. This one is an official Python image, which has Python 3.7.2 pre-installed inside of it.

FROM python:3.7.2

Next, we tell Docker that when we build the image, we want to have a directory called project_name in the root of the image. This means that when we copy something into the image, it will end up at /project_name/<thing> by default.

# Setup a spot for the code
# You could change this to anything you want.
WORKDIR /project_name
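To make the effect of WORKDIR concrete, here is a small Python sketch of how a destination path ends up inside the image. It is only a model of the behaviour described above, not anything Docker itself runs, and the function name resolve_in_image is made up for this example.

```python
import posixpath

WORKDIR = "/project_name"  # matches the WORKDIR line in the Dockerfile


def resolve_in_image(destination: str, workdir: str = WORKDIR) -> str:
    """Approximate where a COPY destination lands inside the image.

    Relative destinations are taken relative to the working directory;
    absolute destinations are used as they are.
    """
    return posixpath.join(workdir, destination)


print(resolve_in_image("requirements.txt"))  # /project_name/requirements.txt
print(resolve_in_image("tests/"))            # /project_name/tests/
print(resolve_in_image("/opt/data"))         # /opt/data
```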

Next, we copy in our requirements.txt. What this command means is “copy requirements.txt from my directory into the Docker image, to a file called requirements.txt”. Once we’ve copied it into the image, we can install things from it, which is what the next line does by executing the pip install command.

# Install Python dependencies
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
# make sure that pytest is installed
# we'll need it to run the tests!
RUN pip install pytest

Next, we copy in all of our code in the same way that we did with the requirements.txt file. If you have other scripts or directories which you need to run your code/tests, you should copy them in here in the same way.

Important: Any time you COPY something into a Docker image, it needs to exist in your repository! So if you don’t have a main.py, you’ll need to delete that line.

# Copy over the source code
# If you have other code here you need to copy it too!
COPY research_project research_project/
COPY tests tests/
COPY main.py main.py
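If you would like to check this before pushing, a rough sketch like the one below can list COPY sources that are missing from your repository. It is deliberately simplified: it only understands the plain “COPY source destination” form used in this post, and the function name missing_copy_sources is invented for this example.

```python
import os
import tempfile


def missing_copy_sources(dockerfile_text: str, repo_root: str) -> list:
    """Return COPY sources named in the Dockerfile that are absent from repo_root.

    Simplified on purpose: handles only the plain `COPY src... dest` form,
    not wildcards, the JSON-array form, or `--from=` multi-stage copies.
    """
    missing = []
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if not parts or parts[0].upper() != "COPY":
            continue
        # Drop flags such as --chown=...; the last remaining token is the
        # destination, everything before it is a source.
        args = [p for p in parts[1:] if not p.startswith("--")]
        for source in args[:-1]:
            if not os.path.exists(os.path.join(repo_root, source)):
                missing.append(source)
    return missing


if __name__ == "__main__":
    dockerfile = """
    FROM python:3.7.2
    COPY requirements.txt requirements.txt
    COPY main.py main.py
    """
    with tempfile.TemporaryDirectory() as repo:
        # Create only requirements.txt, so main.py should be reported.
        open(os.path.join(repo, "requirements.txt"), "w").close()
        print(missing_copy_sources(dockerfile, repo))  # ['main.py']
```

For your own project you would read the text of your Dockerfile and pass the path to the root of your repository instead of the temporary directory used here.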

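Finally, we tell Docker that by default, when we run the Docker image, it should not do anything in particular: it just starts a bash shell. This will do if you are only using the Dockerfile for CI in GitHub, because the workflow in Step 4 replaces this default with the pytest command.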

CMD ["/bin/bash"]

Normally you’d add this in your repository in the same place that you’d have your requirements.txt file. Additionally, if you want to be able to build and run this Dockerfile locally, you can follow the instructions here. However, this isn’t necessary if you only want to be able to run CI on GitHub.

The final step is to commit this Dockerfile and push it to your GitHub repo.

Step 4 — Add a GitHub Action

Go to your GitHub repository where you want to create the GitHub Action and click on the Actions tab.

You should see a page which looks like this:

Click on the “set up a workflow yourself” in the top right corner.

Copy and paste the below yaml file into the editor.

name: CI
on:
  pull_request:
    branches:
    - master
  push:
    branches:
    - master
jobs:
  build:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v1
    - name: Build and test with Docker
      run: |
        docker build --tag research_project .
        docker run --rm research_project pytest tests

What does this yaml file say? Let’s break it down:

Here, we are saying we want the CI to run anytime someone makes a pull request to the master branch, or if anyone pushes directly to the master branch.

name: CI
on:
  pull_request:
    branches:
    - master
  push:
    branches:
    - master
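If it helps to see that rule spelled out, here is a toy Python model of it. This is not how GitHub implements triggers; it only restates which events start the workflow, using exact branch names. The names TRIGGERS and workflow_runs are made up for this sketch.

```python
# A toy model of the `on:` section of the workflow file.
TRIGGERS = {
    "pull_request": ["master"],  # pull requests that target master
    "push": ["master"],          # commits pushed to master
}


def workflow_runs(event: str, branch: str) -> bool:
    """Return True if `event` on `branch` would start the CI workflow.

    For "push", `branch` is the branch pushed to; for "pull_request",
    it is the branch the pull request wants to merge into.
    """
    return branch in TRIGGERS.get(event, [])


print(workflow_runs("push", "master"))          # True
print(workflow_runs("push", "my-experiment"))   # False
print(workflow_runs("pull_request", "master"))  # True
print(workflow_runs("issues", "master"))        # False
```

So pushing to a side branch does not start CI by itself, but opening a pull request from that branch into master does.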

Next, we specify what operating system it should run on.

jobs:
  build:
    runs-on: ubuntu-latest

Now, we list the steps of our job (the job itself is called “build”). The first step checks out our code, which we need to do before anything else. The second step is the one named “Build and test with Docker”.

    steps:
    - uses: actions/checkout@v1
    - name: Build and test with Docker

Finally, we tell GitHub which commands the “Build and test with Docker” step should run: 1) building our Docker image, and then 2) running our tests inside it.

      run: |
        docker build --tag research_project .
        docker run --rm research_project pytest tests
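If you have Docker installed on your own machine, you can try the same two commands locally before relying on GitHub to run them. The sketch below wraps them in Python. The function names ci_commands and run_ci are made up for this example, and it assumes you call run_ci() from the root of your repository, where the Dockerfile lives.

```python
import subprocess

IMAGE_TAG = "research_project"  # same tag as in the workflow file


def ci_commands(tag: str = IMAGE_TAG) -> list:
    """Return the two commands the workflow runs, as argument lists."""
    return [
        ["docker", "build", "--tag", tag, "."],
        ["docker", "run", "--rm", tag, "pytest", "tests"],
    ]


def run_ci(tag: str = IMAGE_TAG) -> None:
    """Run each command in order, stopping at the first failure."""
    for command in ci_commands(tag):
        print("$", " ".join(command))
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a failed build or a failing test stops the script.
        subprocess.run(command, check=True)


# Only show what would be run; call run_ci() yourself to execute it.
for command in ci_commands():
    print(" ".join(command))
```

Because pytest exits with a non-zero code when a test fails, a failing test makes the second command fail, which is what turns the check into a red cross.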

Now we are done! Click the “start commit” button in the top right-hand corner of the page. Now, every time you push code to the master branch, you should see GitHub run the check we added. You can see what it’s doing by clicking the green tick, red cross or orange circle that appeared next to the name of your GitHub repo. Here, you can look at the logs of what happened — and fix anything that went wrong before merging a pull request.

You should see something like this when you click on the green tick next to your repository name.

A working example of this process can be found here: https://github.com/DeNeutoy/test-actions/

Hopefully, this blog post has made it easier to run tests on your research repo. If you’re interested in reading more about research tips that might make you more productive, you might like to check out our slides from our EMNLP 2018 tutorial, Writing Code for NLP Research. Enjoy!


To stay up to date with new research at AI2, subscribe to the AI2 Newsletter, and be sure to follow us on Twitter at @allen_ai.

AI2 Blog

The Allen Institute for AI | Building AI for the common good.

Written by Mark Neumann, Research Engineer, AllenNLP (http://allennlp.org/)
