How an Army of Robots Keeps Quorum Running Flawlessly

David Ranshous · Published in Qbits · May 31, 2018

Picture this: you wrote a new feature for your company’s product — putting your company’s emoji generator on the blockchain. The unit tests pass, the feature works in development, and everything looks good. It even works perfectly in a staging environment! The feature gets deployed and everything is awesome — until a client calls. It turns out that super-awesome new enhancement inadvertently broke a part of the site that nobody thought to check.

What the team just experienced was a regression, where bugs pop up in previously tested and functional systems. Software teams combat this with regression testing, which ensures that previously developed software still performs as intended after it is changed or interfaces with other software. This takes place during the quality assurance (QA) stage of a deployment.

Our team at Quorum takes regression testing seriously. We run integration tests across our entire system before every deployment and test even more extensively for new feature releases. Over the past six months, we’ve created and fine-tuned a testing infrastructure to cover our site using Selenium web automation tools and the Python unittest library. Building these tests into a larger process has made our deployments remarkably more consistent and safe, and our team-wide QA efforts have resulted in noticeable improvements in site stability and reliability.

Testing Pyramid and Integration Tests

Testing Pyramid and other testing musings from the Google Testing Blog

This is the testing pyramid — a useful model for budgeting time and effort in testing while building software. At the bottom are the familiar unit tests written by each developer while creating a feature. Unit tests are essential for testing the nitty-gritty of the source code: making sure functions work properly, inputs produce the correct outputs, and fixed bugs don’t show up again. This level is the base of system testing because unit tests are fast, reliable, easily debuggable, and maintained by each developer. They are paramount to successful code and provide quick feedback on possible mistakes.

The tip of the pyramid is home to the end-to-end (E2E) tests. These tests are the most complex but represent the smallest amount of effort that should be placed into testing. E2E tests are valuable because they simulate exactly what the user sees and, hopefully, find bugs that wouldn’t otherwise be found. However, these tests can be inconsistent, slow, and hard to debug.

In the middle lie integration tests (aliases include service tests and component tests). These differ from typical unit tests by encompassing a whole component or feature. However, they are not as comprehensive or detailed as E2E tests, and they are intended to be run more frequently and with quicker feedback. This semantically nebulous space is where we have built our QA testing system to run regression tests each time we deploy new software.

At Quorum, our integration tests are a somewhat unique blend of E2E and integration tests. Our goal is to build coverage of all our features across the site over time, while consciously avoiding an attempt to cover the site with exclusively E2E coverage. Our distribution of tests can be broken down by complexity, in increasing order:

  • Smoke tests — making sure a webpage loads and renders
  • Feature tests — making sure a feature works as expected (e.g. download button clickable)
  • Depth tests — making sure a full feature path operates (e.g. creating an email alert)

When creating tests with site coverage in mind, smoke tests are arguably the most important. These extremely simple tests (which typically check if a page on the site loads at all) will often be sufficient to cover most of the site. Smoke tests are easy to create, run quickly and reliably, and identify many fixable bugs. They result in a higher percentage of code coverage with minimal effort, the pinnacle of programming laziness.

The drawback of smoke tests is that they are just that — high-level tests looking for a fire. They simply can’t catch smaller but very problematic bugs in our features. That’s where feature and depth tests come in. These two concepts are similar, but each reaches a different level of complexity and depth in the system. We find feature tests are most effective at checking that a certain part of a feature is working properly — such as adding columns to our Sheet feature.

Depth tests are extremely helpful for checking common and lengthy paths of engagement for our users. A great example of this is creating and sending emails through our Outbox feature. Since our clients use Outbox for their important external communications, we put intense effort into development and testing to make sure that every Outbox email is sent without a hitch.
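To make the three levels concrete, here is a hypothetical sketch in the Selenium-plus-unittest style described in the next section. The URLs, selectors, and flow details are illustrative assumptions, not our production tests.

```python
# Hypothetical sketch only: URLs, selectors, and flow details are
# illustrative stand-ins, not Quorum's actual tests.
import unittest

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


class OutboxTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.driver = webdriver.Chrome()
        cls.wait = WebDriverWait(cls.driver, 10)

    @classmethod
    def tearDownClass(cls):
        cls.driver.quit()

    def test_outbox_page_loads(self):
        # Smoke test: the page loads and renders at all.
        self.driver.get("https://example.com/outbox")  # placeholder URL
        self.wait.until(EC.presence_of_element_located((By.TAG_NAME, "body")))

    def test_new_email_button(self):
        # Feature test: a single control behaves as expected.
        self.driver.get("https://example.com/outbox")
        self.wait.until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, "button.new-email"))
        ).click()
        self.wait.until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "form.compose"))
        )

    def test_send_email(self):
        # Depth test: walk a full path of engagement (compose and send).
        self.driver.get("https://example.com/outbox")
        self.wait.until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, "button.new-email"))
        ).click()
        self.driver.find_element(By.NAME, "subject").send_keys("Weekly update")
        self.driver.find_element(By.NAME, "body").send_keys("Hello from the test suite!")
        self.driver.find_element(By.CSS_SELECTOR, "button.send").click()
        self.wait.until(
            EC.text_to_be_present_in_element((By.CSS_SELECTOR, ".toast"), "Sent")
        )


if __name__ == "__main__":
    unittest.main()
```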

Selenium and QA

Selenium working its ~magic~

All of our integration tests are written using the web-automation framework Selenium. Its open-source availability and active developer community made Selenium a great choice for our automation needs over other tools. It is relatively simple to get going: download a webdriver executable for a browser, add in some dependencies for Selenium, whip up a test or scraper or two, and voila! We’ve got an automated browser army at our command. For our integration tests, we use Selenium with Python’s unittest library.
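The bare-bones setup can be as little as a few lines — assuming the selenium package is installed (pip install selenium) and a chromedriver executable is on your PATH:

```python
from selenium import webdriver

driver = webdriver.Chrome()          # launch an automated Chrome session
driver.get("https://www.quorum.us")  # load a page (any URL works here)
print(driver.title)                  # read something from the rendered page
driver.quit()                        # shut the browser down cleanly
```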

Here’s an example of how we use Selenium to test our system:
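The snippet below is a simplified sketch of such a test, written with plain Selenium and Python’s unittest; the URL, selector, and placeholder text are illustrative stand-ins rather than the real values.

```python
# Simplified sketch: the URL, selector, and placeholder text are stand-ins,
# and unittest.TestCase is used directly instead of internal wrappers.
import unittest

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


class GlobalTest(unittest.TestCase):
    """Test suite ("battalion") for the landing page."""

    @classmethod
    def setUpClass(cls):
        # One browser session shared by every soldier in the battalion.
        cls.driver = webdriver.Chrome()
        cls.driver.get("https://www.quorum.us")  # placeholder URL

    @classmethod
    def tearDownClass(cls):
        cls.driver.quit()

    def test_search_box(self):
        # Point at the search box and check its placeholder text.
        search_box = WebDriverWait(self.driver, 10).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "input.search-box"))
        )
        self.assertEqual(search_box.get_attribute("placeholder"), "Search")


if __name__ == "__main__":
    unittest.main()
```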

This is a great example of a smoke test for our search box — a feature whose absence would be devastating to our users. The above code highlights the testing “soldier” test_search_box, which points at the search box on our site and checks that the box contains the correct placeholder text. This method is specially made for testing the search box every time we deploy.

The soldier here is a part of a larger “battalion” known as the test suite GlobalTest, responsible for testing the Quorum landing page. Like a large infantry unit, GlobalTest has ranks of different soldiers for all sorts of other specialized tasks — testing the search box, help center, sidebar navigation, etc. The battalions are all structured on Python’s unittest class fixtures, which provide the formation for all tests in the test suites to be run at once. The GlobalTest test suite, combined with other battalions for our other features, forms our testing army.

For the implementation of our testing model, we adopted a version of Selenium’s page object design pattern. In this pattern, we separate the three primary components of the infrastructure: tests, interactions with the webdriver (known as actions), and selectors for items on the HTML document object model (locators). Think of this as the larger strategy the army must take to test the code; a sketch of how the pieces fit together follows the list below.

  • Test classes (GlobalTest) contain all tests for a particular feature and inherit from both the GlobalAction class and our custom Python unittest wrapper QuorumUnitTestCase.
  • All interactions with the page are contained in a GlobalAction class, and are shared across all tests.
  • Selectors used to find objects on the page are saved as constants in the HomeLocators class.
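Here is a minimal sketch of how these three pieces can fit together; the selector values are illustrative assumptions, and unittest.TestCase stands in for QuorumUnitTestCase, whose internals aren’t shown here.

```python
import unittest

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


class HomeLocators:
    # Locators: selectors for landing-page elements, kept in one place.
    SEARCH_BOX = (By.CSS_SELECTOR, "input.search-box")


class GlobalAction:
    # Actions: all interactions with the page, shared across tests.
    def get_search_box(self):
        return WebDriverWait(self.driver, 10).until(
            EC.presence_of_element_located(HomeLocators.SEARCH_BOX)
        )


class GlobalTest(GlobalAction, unittest.TestCase):
    # Tests: inherit page actions plus the unittest machinery.
    @classmethod
    def setUpClass(cls):
        cls.driver = webdriver.Chrome()
        cls.driver.get("https://www.quorum.us")  # placeholder URL

    @classmethod
    def tearDownClass(cls):
        cls.driver.quit()

    def test_search_box(self):
        self.assertTrue(self.get_search_box().is_displayed())
```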

This design pattern has provided a fantastic model for building our smoke, feature, and depth tests — or the different levels of the infantry soldiers. When creating tests for a part of the site, we start by building smoke tests at a global level. For each additional feature, we create corresponding feature tests and finish with more time-consuming depth tests around a special part of the feature at hand.

At the root of this large tree of tests and action classes is our wrapper around the Selenium webdriver:
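The following is a simplified sketch of what such a wrapper can look like; the method names and timeout are illustrative rather than our exact implementation.

```python
# Simplified sketch of a SeleniumActions-style wrapper; method names and
# the timeout value are illustrative assumptions.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


class SeleniumActions:
    DEFAULT_TIMEOUT = 10  # seconds to poll the DOM before giving up

    def __init__(self, driver=None):
        self.driver = driver or webdriver.Chrome()

    # Simple wrappers around common webdriver calls.
    def get_current_url(self):
        return self.driver.current_url

    def get_console_logs(self):
        return self.driver.get_log("browser")

    # Replace the raw find_element with a polling version for stability.
    def find_element(self, css_selector, timeout=DEFAULT_TIMEOUT):
        return WebDriverWait(self.driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )

    # Helpers for walking the DOM tree around an element.
    def find_parent(self, element):
        return element.find_element(By.XPATH, "..")

    def find_children(self, element):
        return element.find_elements(By.XPATH, "./*")
```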

SeleniumActions is our own helpful wrapper and set of additions around the Selenium library. Here we’ve standardized and abstracted the webdriver’s capabilities into our own methods, used throughout our backend. We create simple wrappers around common methods like getting the current URL or console logs. We replace the existing find_element methods with our own to increase the stability of the testing system, taking advantage of Selenium’s WebDriverWait DOM polling tools. We include other common methods to find parent or child elements in the DOM tree — helpful for finding containers and other elements that aren’t easily selected directly.

This abstracted code and design process is aiding us every day in creating new tests even as our software grows. With the strength of an army of robot testers, it’s even more important to consider how to integrate them seamlessly into our code review and deployment process.

Trusting the Process

Testing systems are no help if they’re used improperly, so at Quorum we’ve worked at length to build tests into our deployment process. This process is recorded in short form in a checklist that each developer is charged with filling out during each deployment. We all must “trust the process” of deployment to ensure that good-quality code reaches production. In line with our checklist, there are three guidelines: 1) deployment starts in the morning and consists of the previous day’s fixes, 2) those fixes should be relatively small in magnitude (otherwise they are directed towards the larger bi-weekly release), and 3) each new piece of code added to a deployment gets a new round of regression testing.

Sticking to our process, including extensive testing, is how we’ve made our deployments more accountable and automated. Our little bot army has saved us an incredible amount of time and resources by replacing manual testing and allowing greater transparency and communication in our processes.

Summary

Investing in testing infrastructure and processes pays off over the long term. Clients won’t directly see a flashy new feature or an announcement on the masthead, or ever use anything related to testing. But the benefits of testing show up over time in contract renewals and higher usage rates as users continue to have great experiences with our software. We have integration tests, Selenium, and our accountable deployment processes to thank.

Our automated testing system sets the stage for us to move towards more complex and powerful models of deployment. We’re especially looking forward to improving this system with train deployments and feature flags. As we continue to perfect and iterate on our system, we’re providing our clients a better and better experience every day.

Interested in working at Quorum? We’re hiring!
