Testing Zelig

This post refers to Zelig version 5.0–13

One of Zelig’s main selling points is that it provides a common syntax for estimating models from a wide range of R packages. This is also one of its key software maintenance challenges.

To provide a common estimation syntax, Zelig wraps many other user-created estimation packages such as survey, MCMCpack, and VGAM. Depending on these packages opens Zelig up to breaking whenever they are updated. For example, a small change in the argument values accepted by an estimation function in one of these packages could break Zelig users’ code for those models.

In the past, with an exception discussed below, there hasn’t been a systematic way for maintainers to know about such breaks unless a user reports them. Reporting code breaks effectively has often been difficult for users because the error messages returned to them are often opaque without an intimate understanding of Zelig’s and the wrapped packages’ internals.

Perhaps the best solution to these problems is to build an extensive unit and integration test suite that is run at frequent intervals using the most recent CRAN releases of Zelig’s dependencies. This will allow broken code to be identified and fixed early by Zelig’s maintainers (for a general overview, see Hadley Wickham’s chapter on testing in R Packages). More informative error messages will also help users understand Zelig’s limitations as well as debug and report errors.

Here are the tools and strategies I’m currently working with to achieve these goals. As always, feedback and suggestions are highly appreciated.

Starting Point

Zelig already contains a number of unit tests for its stochastic models. These have largely been in place since late 2015. Zelig allows us to present and explore substantively meaningful quantities of interest via parameter estimate simulations. This presents a testing challenge. Unit tests compare a predefined expectation (a value, an error message, and so on) to the output of a test call. Many of Zelig’s outputs are stochastic, not deterministic, so it’s difficult to predetermine what result we expect from them, even given the same inputs. Monte Carlo unit tests overcome this problem by using an algorithm that will be published in future work. These tests are implemented with the mcunit.test method. Currently there are Monte Carlo unit tests for logistic, log-normal, least squares, negative binomial, Poisson, and probit models.
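The algorithm behind mcunit.test is not described in this post, so the following is only a generic sketch of the underlying idea, not Zelig’s implementation: simulate data from a known data generating process, fit the model, and check that the estimates land within a tolerance of the true parameters. The sample size, true coefficients, and four-standard-error tolerance are all assumptions chosen for illustration.

```r
# Generic Monte Carlo style check for a least squares model (base R only).
# NOT Zelig's mcunit.test algorithm; all values below are illustrative.
set.seed(2016)  # fix the seed so the test is reproducible

n <- 5000
true_beta <- c(1, 2)  # assumed intercept and slope of the known DGP

# Simulate data from the known data generating process
x <- rnorm(n)
y <- true_beta[1] + true_beta[2] * x + rnorm(n)

# Fit the model under test
fit <- lm(y ~ x)
est <- unname(coef(fit))
se <- unname(sqrt(diag(vcov(fit))))

# Expectation: each estimate lies within 4 standard errors of the truth
within_tol <- abs(est - true_beta) < 4 * se
stopifnot(all(within_tol))
```

The tolerance trades off false alarms against sensitivity: a wider band fails less often by chance but is also less likely to catch a real bias in the estimator.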

Increasing Test Coverage

As of version 5.0–13 these tests cover about 14% of the Zelig package source code. This calculation was made using the tools discussed below.

Ideally, tests will cover closer to 100% of Zelig’s code. A caveat before proceeding: this is a substantial but potentially somewhat superficial goal to achieve. Just because a unit of code is called by a test does not mean that the test compares the output to a substantively meaningful expectation that would indicate problems real-world Zelig users may encounter. Also, traditional code coverage statistics do not include coverage of the Zelig dependencies that could be invoked.

So 100% code coverage is just one measure of how well Zelig is tested. In addition, tests should be written to capture the full range of user behaviour (both intended and unintended).

Unit and Integration Tests

Unit tests, as the name implies, should be designed to test a discrete code unit. This makes results from the tests easier to interpret and debug if expectations are not met as we can begin looking for problems in specific code units. Unit tests are also less redundant and so more computationally efficient than broad overlapping tests.

The Zelig workflow involves a series of methods (new, zelig, setx, sim, graph) that need to work together. So, in addition to unit testing, we should test the whole workflow: integration testing. Integration tests add to the test suite’s computation time because they repeat functionality already covered by unit tests. However, they are crucial for a package like Zelig, where each piece needs to work together in order for users to successfully use it in their research.
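To make the distinction concrete, here is a toy sketch of an integration test. It does not call Zelig at all: fit_model, set_x, and sim_ev are hypothetical base R stand-ins for the zelig, setx, and sim steps, and the check at the end is an assumption about what a reasonable end-to-end expectation might look like.

```r
# Hypothetical stand-ins for the zelig -> setx -> sim steps (base R only).
fit_model <- function(formula, data) lm(formula, data = data)

set_x <- function(...) data.frame(...)

sim_ev <- function(fit, newdata, n = 1000) {
  beta <- coef(fit)
  # Draw coefficients from their approximate sampling distribution:
  # standard normal draws scaled by the Cholesky factor of vcov(fit)
  draws <- matrix(rnorm(n * length(beta)), nrow = n) %*% chol(vcov(fit))
  draws <- sweep(draws, 2, beta, "+")
  # Turn each coefficient draw into an expected value at the chosen x
  X <- model.matrix(delete.response(terms(fit)), newdata)
  as.vector(draws %*% t(X))
}

# Integration test: run the whole chain and check the final output
set.seed(1)
fit <- fit_model(Fertility ~ Education, data = swiss)
x_new <- set_x(Education = 10)
ev <- sim_ev(fit, x_new, n = 2000)

stopifnot(length(ev) == 2000, all(is.finite(ev)))
# Simulated expected values should centre on the model's point prediction
stopifnot(abs(mean(ev) - predict(fit, x_new)) < 1)
```

A unit test would instead target one of these functions in isolation, for example checking only that set_x returns a one-row data frame.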

Testing Tools

I’m currently implementing a set of testing and continuous integration tools that have become fairly widely used in the R community and beyond. These include:

  • The devtools/testthat R packages for package building and development-time testing.
  • Custom assertive functions (stored in R/assertions.R) for runtime testing.
  • Travis CI for continuous integration testing on Linux/macOS.
  • AppVeyor for continuous integration testing on Windows.
  • covr/codecov for reporting and exploring the coverage of development-time tests.

Here is a summary of the development-time test coverage in Zelig 5.0–13:

Zelig code coverage in Zelig 5.0–13 (from Codecov). Green areas represent code covered by a unit test.

This plot shows the amount of code that is tested as of the most recent GitHub master Zelig build at the end of 2016. Just under 15% of the code is invoked by one of the stochastic unit tests. Ideally, Zelig will have 70–100% code coverage.

Informative Error Messages & Runtime Testing

In a separate post on documentation, I’m detailing plans for building out Zelig’s documentation. One component of this that is closely tied to unit tests is creating informative warning and error messages.

Frequently, the focus of unit testing is making sure that software requirements work [REQUIRE TEST]. Zelig’s Monte Carlo tests are requirement tests. A clear requirement of a Zelig model class is that it makes parameter estimates and simulated quantities of interest in an expected way. I.e. given a known data generating process, are the statistical model’s outputs within the expected distribution?

Unit tests should also test the software’s limitations. If software is going to fail, for example because a user has omitted a required input, it should fail early and informatively so that the user can easily learn what went wrong and how to fix it. This requires a different type of unit test: testing for failure [FAIL TEST]. Failure testing follows two principles:

  • If the software is going to fail, fail as soon as possible.
  • When it fails, return a useful message that is understandable to moderately skilled and even novice users.
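As a minimal sketch of how a require test and a fail test might look side by side with testthat (assuming the package is installed): scale_by is a hypothetical function invented for this example, not part of Zelig, and its error message is just one possible wording.

```r
library(testthat)

# Hypothetical function (not from Zelig): it checks its required input
# first, so it fails early and with an actionable message.
scale_by <- function(x, factor) {
  if (missing(factor)) {
    stop("`factor` is required: supply a single number to scale `x` by.",
         call. = FALSE)
  }
  x * factor
}

# REQUIRE TEST: the function does what it is supposed to do
test_that("scale_by multiplies its input", {
  expect_equal(scale_by(1:3, 2), c(2, 4, 6))
})

# FAIL TEST: the function fails informatively when an input is omitted
test_that("scale_by fails early when `factor` is missing", {
  expect_error(scale_by(1:3), "`factor` is required")
})
```

Matching on part of the message in expect_error means the test also breaks if the message is later changed to something less informative.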

A major cause of Zelig failures is that users pass data to Zelig functions that Zelig and the packages it wraps do not support. An important part of addressing this is the use of runtime assertive functions. Runtime testing involves checking the state of data to make sure that it conforms to supported formats. It is easy enough to build such tests using base R code. For example, we can test if an object called x is scalar, and return an informative error message if not, with: if (length(x) != 1) stop('x must be scalar'). The assertthat package just makes this code somewhat more compact and returns consistent, informative error messages, i.e. package maintainers have to spend less time remembering if there is a similar, but slightly different, error message for the same problem elsewhere in the package.

There are a number of assertive runtime testing packages, e.g. assertthat and assertive. Given Zelig’s use of reference classes, which are not well supported by these packages, a desire to minimise dependencies, and a need for highly context-specific messages, most of Zelig’s assertions will be built in house. They will follow assertive’s is_* syntax.
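An in-house assertion in this style could look roughly like the sketch below. The function name, the fail argument, and the message wording are my own illustrative assumptions, not the contents of R/assertions.R.

```r
# Hypothetical in-house assertion following an is_* naming style.
# With fail = TRUE it stops with a consistent, informative message;
# with fail = FALSE it simply returns TRUE or FALSE.
is_scalar <- function(x, fail = TRUE) {
  ok <- length(x) == 1
  if (!ok && fail) {
    stop("Input must be a scalar (length 1), but it has length ",
         length(x), ".", call. = FALSE)
  }
  ok
}

# Usage: passes silently for valid input, reports without stopping on request
is_scalar(5)
is_scalar(1:3, fail = FALSE)
```

Keeping the message inside one helper means every caller reports the same problem in the same words.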

Development, Testing, and Release Workflow

I’m implementing these tools within the following (modified) continuous integration workflow (note: subject to change):

Proposed Zelig Continuous Integration Workflow

Everything starts from the master branch stored on GitHub. The first step in creating a new feature or bug fix is to check out a new branch for that feature/hotfix from master. Before diving into building the feature/fix, the next step is to create require and fail tests for the expected behaviour using testthat. After this, the feature/fix is coded and tested locally using the test function from the devtools package. This builds the package and runs all tests, including executable documentation examples. If the tests pass, then the feature/hotfix branch is merged back into master. If not, then more work is needed to create code that passes the tests.

The master branch is built and tested by the cloud continuous integration services Travis CI and AppVeyor. This occurs whenever commits are made to master and, in the case of Travis CI, every week. The weekly build tests whether Zelig is still compatible with the newest releases of its dependencies. Currently the weekly build is only conducted on Travis CI because this is a free (beta) feature for public GitHub repositories like Zelig.

While I work on increasing Zelig’s code coverage, I’m also following this workflow, simply considering each test a new feature, thus collapsing the Create Test and Create Feature steps into one.

Test File Organisation

Presently I’m in the process of creating tests primarily organised around each R source code file in the R sub-directory. That is, for each source file in the directory, I create a new git branch and then a new tests/testthat/test-*.R file with tests of that source file. Once the tests are built and pass, the branch is merged into master. While I’m going through these files one by one, I’m also using formatR to ensure that the Zelig source code follows a consistent format and so will hopefully be (marginally) easier to debug.