A journey in Continuous Integration Testing
tl;dr: Writing integration tests for headless browsers and integrating them into your continuous integration flow can be tedious, and won't be a bulletproof solution.
Before trying something like what's described in this blog post, consider the following options:
- Money is not a problem? Use a third-party service that'll run your tests on the real devices, browser versions and resolutions you want to support.
- No time, and you don't trust machines to do a good job? You could crowd-source user testing on services like Amazon Mechanical Turk.
If your budget is restricted, and you're not afraid of hacking tools together to get things working in a reasonable amount of time, keep on reading. We'll talk about integration testing using Behave, Behave-django & Splinter, and about continuous integration using docker-selenium, docker-compose & TravisCI.
Meet Bran, a valiant software engineer who takes pride in a job well done, loves developing new features, but not so much maintaining them, and even less fixing bugs.
So far, Bran has built an SPA using ReactJS and Redux.
His friends Pony (who provides a nice Python Django CMS and REST API), Octocat (who helps with project management, versioning and reviewing the code) and Moby (who sails the infrastructure boat) invited Bran to grab a drink and celebrate their latest production release.
After a few pints, all the pagers started ringing like crazy: "Production error!". Bran took a deep breath and started asking his friends:
"We've got unit tests, right?
Yep, Jest for the frontend and pytest for the backend.
We also have linters & type checkers, right?
Indeed: Black, Prettier & TypeScript.
We even went as far as writing a CSS regression test suite, didn't we?
Certainly, BackstopJS gave us quite the headache, but it was worth it.
Don't we also have API specifications?
We have Swagger set up, but our specs need a refresh…
What about third-party services like Firebase, are we testing their integration?
Silence around the table…
What's the production error?
Users cannot sign up or log in with Facebook anymore. The related unit test uses a stub, and we only take screenshots of the signup and login pages without clicking on the buttons.
Alright, Moby, please revert the production release right now. Everyone else, go back home and sleep tight. Tomorrow, we'll fix the bug and re-deploy. Then, let's write an integration test suite to prevent this kind of issue from happening again.
Yes, sir!
This situation happens frequently in the world of startups and ventures, where time and resources are scarce. Relying too much on tests is never a good idea, but they relieve us from otherwise time-consuming manual user testing.
A few rules of thumb — for software engineering, not prototyping — are:
- Always write unit tests: blazingly fast tests that each exercise a single unit, which should itself have a limited scope (avoid God objects like the plague!).
- For branding-driven projects, where design is critical, write a CSS regression test suite: take screenshots of your pages, or parts of pages, on the devices, browsers and versions you want to support.
- For projects using several micro-services connected via network operations, write integration tests to ensure that services really send and receive data as expected by the other services.
- For more mature projects with high traffic and large user bases, more specialized test suites can be written, like load testing, security testing and A/B testing (the latter can arguably be used from earlier stages, depending on the amount of research and the assumptions made up to that point).
- User test (manual QA) everything that isn't covered by one of these test suites.
Here, our friends probably weren’t as rigorous as they should have been during the user testing phase, and they hadn’t written a comprehensive integration test suite to automate this process.
But integration tests are slow! Let's consider, for example, the following BDD scenario we use at GOunite (a Japanese platform bridging the gap between students and companies):
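Roughly, the feature file reads like this (the wording below is an illustration rather than our exact production feature):

@fixture.browser.chrome
Feature: Contact
  As a visitor, I want to send a message to the GOunite staff
  so that they can get back to me.

  Scenario: A visitor sends a message through the contact form
    Given I am on the contact page
    When I fill in the contact form and submit it
    Then I am redirected to the thank you page
    And a notification email is sent to the staff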
Although an integration test finishes hundreds if not thousands of times faster than a manually performed user test, it still has to start a browser, go to the website, wait for all resources to be downloaded and the JavaScript code to be executed, and complete all the scenario's operations, which involve several more network and database transactions.
Since Bran and company are already using Docker, Travis and Github, they could find a win-win situation: have the slow integration tests executed automatically by TravisCI, so that developers can keep working as fast as usual, relying only on unit tests! We can even connect the integration test results to Github, to approve or reject a pull request!
Hopefully by now, you can see the value of writing integration tests to make your product more robust.
Let's now dive into our integration test setup using Behave (to write user stories and share them with non-technical staff), Behave-django, Splinter (to interact with a browser), docker-selenium (to integrate our test suite into our docker-compose development environment) and TravisCI.
An integration test suite is supposed to test the user stories for your service. If your service is composed of several micro-services, you do not need to tie your test suite to any of these micro-services.
Yet, we chose to write our test suites close to the people using them, i.e. the CSS regression suite written in JavaScript in the frontend repository, and the integration suite written in Python in the backend repository.
I like the Behave Python framework, because it allows us to write features and scenarios using regular English sentences (or any language, really). It can be used before the development of a feature to agree with non-technical people on expected behavior, and re-used afterwards to validate the implementation.
Behave-django empowers us by allowing us to create a test database and custom fixtures using Django's manage.py tool, and to run the integration test suite in the same fashion as our unit test suite.
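Concretely, once behave_django is added to INSTALLED_APPS, the suite runs through manage.py just like the unit tests (the tag filter below is only an example):

$ python manage.py behave                 # run the whole integration suite
$ python manage.py behave --tags=@wip     # only features/scenarios tagged @wip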
Splinter is like browser testing on steroids. It's a nice wrapper around Selenium that allows us to write succinct and easy-to-read tests for browser operations.
Let's have another look at the Contact scenario from above, matched with its step implementations:
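Here is a sketch of what those step implementations might look like with Splinter and Django's in-memory mail outbox (the URL, selectors and form field names below are assumptions):

from behave import given, when, then
from django.core import mail


@given("I am on the contact page")
def step_impl(context):
    context.browser.visit("https://xpc.gounite.test/contact")


@when("I fill in the contact form and submit it")
def step_impl(context):
    context.browser.fill("name", "Bran")
    context.browser.fill("email", "bran@example.com")
    context.browser.fill("message", "Hello from the integration suite!")
    context.browser.find_by_css("button[type=submit]").first.click()


@then("I am redirected to the thank you page")
def step_impl(context):
    # wait for the SPA to process the submission and redirect
    assert context.browser.is_text_present("Thank you", wait_time=10)
    assert context.browser.url.endswith("/contact/thanks")


@then("a notification email is sent to the staff")
def step_impl(context):
    # the live server runs in the same process, so Django's locmem outbox is visible here
    assert len(mail.outbox) == 1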
Jon the reader: "That's it? 25 lines of code to visit a page, fill a form, submit it, verify the redirection page and even check for emails sent?"
Hey Jon, nice to meet you! Yes, it's worth installing a couple more libraries such as behave-django and splinter, right?
Jon the reader: "Why did you add the @fixture.browser.chrome decorator to the feature?"
Splinter and Selenium offer several browser integrations, so you could decide to run several browsers if you'd like. Since we already have another CSS regression suite and do not care about design in our integration tests, a single browser is fine for us.
Here’s how we defined our decorator:
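In environment.py, it looks roughly like this (the hub URL matches the selenium-hub service shown later; keyword arguments can vary slightly between Splinter versions):

from behave import fixture, use_fixture
from splinter import Browser


@fixture
def browser_chrome(context):
    # Connect to the Chrome node through the selenium-hub container,
    # instead of spawning a local browser inside the python container.
    context.browser = Browser(
        driver_name="remote",
        browser="chrome",
        command_executor="http://selenium-hub:4444/wd/hub",
    )
    yield context.browser
    context.browser.quit()


def before_tag(context, tag):
    # Activate the fixture for features/scenarios tagged @fixture.browser.chrome
    if tag == "fixture.browser.chrome":
        use_fixture(browser_chrome, context)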
Jon the reader: "I just noticed that Splinter has a Chrome webdriver. Why are you using the remote one? And what is this selenium-hub?"
Here's the thing: from our tests, we're connecting to the domain name https://xpc.gounite.test/, which doesn't exist! A Django test runner would normally send queries to http://localhost:43093 or some other random port. The thing is, we have several micro-services: our React project's js/css files need to be downloaded, and once the SPA has started, queries are sent to the API server.
While we use Kubernetes in production, we simplified the developers' local infrastructure using docker-compose, and we use an nginx reverse proxy docker container to map and rewrite requests to the appropriate docker container.
Let's look at a couple of interesting configuration points.
reverse-proxy:
  networks:
    default:
      aliases:
        - api.gounite.test
        - xpc.gounite.test
Aliases are a docker-compose feature that maps the aliased domain names to a container. Any container in the docker network will be able to query the reverse-proxy container using api.gounite.test.
In addition, our requests are not sent from the python container that runs our integration tests, but from a chrome container holding the browser we use to walk through the user stories.
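For reference, the selenium services in our docker-compose file look roughly like this (image tags and the shm_size value are assumptions):

selenium-hub:
  image: selenium/hub:3.141.59

chrome:
  image: selenium/node-chrome:3.141.59
  shm_size: 2g
  depends_on:
    - selenium-hub
  environment:
    - HUB_HOST=selenium-hub
    - HUB_PORT=4444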
server {
    listen 8080;
    resolver 127.0.0.11;
    set $proxy_pass http://backend:8001;

    location / {
        # resolve "backend" at request time through Docker's DNS
        proxy_pass $proxy_pass;
    }
}
The resolver here tells nginx which DNS server to query to resolve hostnames at request time. Docker's embedded DNS resolver always uses the same IP (127.0.0.11) in any docker network, so it looks like we can safely hardcode it in the configuration. By querying the Docker DNS resolver, nginx knows that backend corresponds to the IP of the backend container within the same network, and it can forward requests coming from the chrome container to the django test runner.
Here's how it looks in the end:
One final tip regarding debugging within such an environment:
Python developers must be familiar with ipdb to set breakpoints in their code and easily investigate any frame. For our docker container to stop the execution on a breakpoint and give us access to a python shell, we need to restart our container with the --service-ports flag. Since this flag can only be used with docker-compose run (not exec), and we absolutely want to keep the backend container name (versus backend_run_1 or another), we need to remove the currently running container first to avoid a naming conflict.
We also need the feature we're debugging to be tagged with @wip, so that Behave doesn't capture stdout or logging output and stops at the first failure.
$ docker-compose up -d # Regular dev env
$ docker-compose rm -sf backend && docker-compose run --service-ports --name backend backend
We can also debug the frontend, since we have a chrome browser! Within your python shell, send browser actions like context.browser.visit(), and watch what's happening inside your chrome instance using a VNC viewer (note: you need to use image: selenium/node-chrome-debug instead of image: selenium/node-chrome in your docker-compose config).
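In docker-compose terms, that swap looks roughly like this (the image tag is an assumption; the debug image exposes a VNC server on port 5900):

chrome:
  image: selenium/node-chrome-debug:3.141.59
  ports:
    - "5900:5900"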
$ vncviewer 127.0.0.1:5900 # password 'secret'
Here's how it looks:
When Bran, Pony and Moby suddenly shouted in joy, Octocat and Travis got startled.
Octocat: "I suppose you guys successfully set up our integration test suite?"
Bran giggled while he demoed it to his friends.
Travis: "Great! Now we've got to set up the CI so as not to slow down the devs…"
Bran stopped smiling… he knew that wouldn’t be a walk in the park.
Running integration tests is great. But your teammates' productivity will hardly increase if you ask them to run very slow tests several times a day.
Of course, you could find a middle ground like only running the tests before a release or so, but it’ll be much harder to debug issues as a result. Seeing errors as early as possible is good practice.
At GOunite, the most important part of our process is the Pull Request. Everything in master should just work, and master can normally be deployed to production anytime (“normally” isn’t always guaranteed, which is why a staging server is always useful).
Therefore, running our integration test suite at the end of each pull request sounds like a good solution. Leaving it to TravisCI is even better, freeing up local resources to focus on the next task.
Now, we do not want to build a fully-fledged docker environment in every micro-service's travis build.
So we used the infra repo to build the whole thing and run both the integration and CSS regression test suites.
Then we could trigger the infra travis build from the backend or frontend repositories.
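A rough sketch of what the infra repository's .travis.yml boils down to (service names and commands are assumptions):

language: minimal
services:
  - docker
script:
  - docker-compose up -d
  - docker-compose exec -T backend python manage.py behave   # integration suite
  - docker-compose run --rm --no-deps backstop test          # CSS regression suite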
Travis: “Sorry to interrupt, but… we only paid for a single concurrent job on Travis.com”
Oh, right. If we trigger a build and then wait for it to finish, we'll wait forever, because the triggered build is itself waiting for our job to finish and free up the slot…
Octocat: "I might have an idea. How about adding another PR check? When your backend PR build finishes, it triggers the slow tests with an infra build and lets github know that the check is in progress; when the infra build completes, it lets github know whether it was a success or a failure."
Ok, let’s do it.
Beware, the script is a bit ugly due to time constraints:
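In spirit, the relevant part boils down to something like this (the infra repository slug, the token variable and the exact content of TRIGGER_PATH are assumptions):

# backend repo .travis.yml, e.g. in an after_success step
if [[ $TRAVIS_COMMIT_MESSAGE == *"[slow]"* ]]; then
  body='{
    "request": {
      "branch": "master",
      "config": {
        "env": {
          "global": ["TRIGGER_PATH='"$TRAVIS_REPO_SLUG/statuses/$TRAVIS_COMMIT"'"]
        }
      }
    }
  }'
  curl -s -X POST https://api.travis-ci.com/repo/gounite%2Finfra/requests \
    -H "Content-Type: application/json" \
    -H "Travis-API-Version: 3" \
    -H "Authorization: token $TRAVIS_API_TOKEN" \
    -d "$body"
fi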
We use trigger-travis.sh to ask Travis to trigger a build of the infrastructure repository. But since we needed to pass it an environment variable, TRIGGER_PATH, so the infra build knows which github repo and commit to send the build result to, we had to hack the query a little bit.
You'll also notice the condition if [[ $TRAVIS_COMMIT_MESSAGE == *"[slow]"* ]]: regular commits will not trigger a build on the infra repo. One needs to write a commit message such as:
$ git commit -am "feat: PR #1 now completed! [slow]"
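On the infra side, reporting the result back relies on the GitHub commit statuses API; a hedged sketch, assuming TRIGGER_PATH holds owner/repo/statuses/<sha> as built above and that a GITHUB_TOKEN is available:

# one call with "pending" when the slow build starts, then a similar call
# at the end with "success" or "failure"
curl -s -X POST "https://api.github.com/repos/$TRIGGER_PATH" \
  -H "Authorization: token $GITHUB_TOKEN" \
  -H "Accept: application/vnd.github.v3+json" \
  -d '{"state": "success", "context": "ci/slow-tests", "description": "Integration & CSS regression suites passed"}'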
One important issue to keep in mind: this configuration uses Github commit Statuses. Statuses are specific to commits, while Github Check Runs are tied to the pull request. This means that if we push another commit without [slow] after our [slow] commit, the check visible at the bottom of the PR disappears. Whenever we get a bit of time, we'll use Github Actions to improve upon this solution.
Bonus: configure the BackstopJS CSS regression test suite with docker-compose and Travis
This is a follow-up to the previous article called “Overview of BackstopJS, a tool to test a web application’s UI”.
In the previous blog post, we ran BackstopJS directly from the host. Half of the team runs on Linux, but the other half runs on Mac OS, and as expected, all of their screenshots differed because the fonts rendered with very slight variations.
We had specified some yarn shortcuts like css:test and css:reference, so let's modify this setup to support docker:
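Something along these lines in the frontend's package.json (the relative path, service name and report location are assumptions, and we assume the backstop image's entrypoint is the backstop CLI):

{
  "scripts": {
    "css:reference": "cd ../infra && docker-compose run --rm --no-deps backstop reference",
    "css:test": "cd ../infra && docker-compose run --rm --no-deps backstop test; open backstop_data/html_report/index.html || xdg-open backstop_data/html_report/index.html"
  }
}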
All we did was replace the backstop calls with a cd into the docker-compose configuration directory, then execute docker-compose.
There are 3 important tricks to know here:
- The --no-deps docker-compose parameter won't start linked services. You already have a docker fleet running and you only want to start the backstop container.
- The shm_size: 512m config for the backstop container (see the sketch below this list). Docker allocates 64MB to /dev/shm by default, and the chrome instance started by Backstop ends up having bus errors.
- Backstop cannot auto-open the html report in your browser from within a docker container. This is why we added an open (mac) and xdg-open (linux) condition to auto-open the file after a test run.
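And the corresponding backstop service in the docker-compose configuration (the image tag and mounted path are assumptions):

backstop:
  image: backstopjs/backstopjs   # pin the tag to the backstop version you use
  shm_size: 512m
  volumes:
    - ../frontend/backstop:/src
  working_dir: /src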
That’s pretty much it. If you read this post from the beginning, you know how to have Travis CI build and run the docker-compose environment.
One nice-to-have feature is uploading the screenshots and reports somewhere (Google Cloud Storage in our case).
Any questions?
Jon the reader: "How many integration tests do you conduct before validating a page? For example, how do you handle signup and login/password validation, and re-login with the same credentials?"
Hmm… we do not validate pages but user stories, since we write integration tests in a Behavior-Driven Development fashion.
Also, login/password validation should be part of the unit test suite (presumably on both backend unit tests and frontend unit tests).
Please check again the “rules of thumb” I shared earlier.
Unit tests should be simple and fast (such as testing the password validation function).
We use BackstopJS to test our design, which includes text.
Therefore, all we want to verify with our integration test suite is that pages exist and the required elements are actionable. It should also be simple enough to be readable by anyone in the company, not only tech members.
Here’s a part of our authentication feature:
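It reads roughly like this (illustrative wording, not the verbatim feature file):

@fixture.browser.chrome
Feature: Authentication

  Scenario: Sign up with an email address
    Given I am on the signup page
    When I sign up with a valid email address and password
    Then I see the onboarding page

  Scenario: Log in with Facebook
    Given I am on the login page
    When I log in with my Facebook account
    Then I see my dashboard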
This wraps up this blog post on continuous integration and integration testing. Feel free to post a comment if you have any questions!
Many thanks to Vincent Prouillet and R. G. for helping with reviewing this blog post, and Ash Curkpatrick for the illustrations.