Acceptance testing Go services using aceptadora

Oleg Zaytsev
Oct 26 · 8 min read

Today I would like to share with you our experience (and some code) on writing acceptance tests at Cabify. Each time we deliver some broken code to production, thousands of drivers lose their ability to work, and since the ride-hailing business usually doesn’t wait for your rollback, the opportunity to serve the lost journeys is gone forever.

Sometimes unit tests are just not enough, as they rely on a human correctly understanding the contract being tested. There’s no value in asserting that your code correctly sends a query to MySQL if that query doesn’t match the deployed database schema.

Despite the risks, we do deploy our code multiple times per day and we often deploy code powering features that we don’t even know how to reproduce manually in a testing environment. Some of our teams don’t do manual testing at all. Instead, we do acceptance testing.

We think that extensive acceptance testing is one of the keys to keep our deployments reliable and agile at the same time. Today, I want to tell you how we do it for Go services.

Illustration by Maria Letta

Where we came from

We inherited a codebase written by a few engineers who were no longer at the company or no longer contributing. The code was originally ported from Node.js, so I guess the robustness gained from compiled code somehow justified the lack of tests: the few coding resources available could be spent delivering new features instead of debugging JavaScript.

The code wasn’t just poorly tested, it also wasn’t testable. To make it testable we had to refactor it, but to refactor it we had to know what it did, and to know what it did, we needed to test it first.

Acceptance tests as documentation

Our first acceptance test was simple: it sent a JSON message over NSQ and expected to receive a message on a different topic.

Around this, we loaded our fixtures into the databases and started the dependencies (NSQ, Redis, memcached, some other services…) once per test suite. We first tried to use docker-compose for the orchestration, but it didn’t ensure that a dependency had started, just that it was starting, and that caused racy behaviours during startup. We then switched to a set of plain shell scripts that checked that each container was ready to handle requests before starting the next one.

So what we had at this point was a bunch of containers in a Docker network and the acceptance-tester container running Cucumber joining that network and interacting with the services.

We lived with that solution for a few years: we split our monorepo and replicated the same approach across multiple services. However, we were facing some issues:

  • We had to write Ruby to write acceptance tests. We’re Go developers and for some of us this was our first contact with Ruby, so you can imagine the quality of the code defining the Gherkin steps.
  • Our tests ran after the test subject and its dependencies had started, so we had no control over how those behaved.
  • This lack of control meant that the dependencies, and sometimes the test subjects, retained their state between tests (we do have some stateful services). That made some tests flaky and made it impossible to run a filtered subset of tests, as many of them depended on the previous test.
  • Debugging tests meant filling them with puts. We still have a lot of those.
  • The only way to run tests was to run the entire suite from the terminal.
  • When we started to switch to gRPC, it wasn’t as easy as “send or assert this JSON” anymore. We had to carefully craft each mock and each Cucumber step definition for each gRPC method. We didn’t have the time or the desire to carefully craft Ruby code inside of our Go codebase, so we started to write fewer acceptance tests.

Acceptance tests written in Go

However, we still faced the lack of control, and when we started mocking gRPC services that our test subject needed to call, that caused a major issue: since the acceptance-tester joined the network after the service had already started, the service had already failed to dial the gRPC connection, and some requests would automatically fail instead of reaching the mocked gRPC server on the acceptance-tester. We solved this by building a small library that interacted with the Docker daemon and started the service from the test itself: this is how aceptadora was born.

Once we moved the service startup to the test itself, we were also able to move the dependencies to a standard docker-compose up as we were now able to check their healthiness before starting the service.

Aceptadora

We had several requirements for our new tool:

  • It should remove the need for docker-compose: we want a single tool that handles everything, not a combination of two tools.
  • It should be easy to understand; that’s why the syntax of aceptadora.yml is based on docker-compose.yml.
  • It should run in all the environments seamlessly, so it should adapt to the environment it’s running in.
  • It should allow running individual tests.
  • Debugging tests with standard tools would be great, and that implies not running in a container (you can debug tests in a container, but it requires extra configuration).

And since we managed to achieve all those requirements, we decided to share it with the community as an open-source library: github.com/cabify/aceptadora.

What does aceptadora do?

Defining services with aceptadora
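The example embedded in the original post isn’t reproduced here, but a hypothetical aceptadora.yml (made-up service names and paths; the exact schema is best checked against the aceptadora repository) could look something like this:

```yaml
# Hypothetical aceptadora.yml sketch, following the docker-compose-like
# syntax described below. Names and paths are illustrative.
redis:
  image: docker.io/library/redis   # full canonical image path
  ports:
    - "6379:6379"

service:
  image: docker.local/service      # our own service, locally tagged
  env_file:
    - ${YAMLDIR}/service.env       # paths are relative to ${YAMLDIR}
  binds:
    - ${YAMLDIR}/config.yml:/etc/service/config.yml
  ports:
    - "8080:8080"
```
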

Notice some details about this aceptadora.yml:

  • We refer to filesystem elements (env config files and binds) relative to ${YAMLDIR}, in order to allow running tests from different folders.
  • Instead of the volumes attribute we use the binds attribute to create filesystem binds. We’ve reserved the volumes attribute so we can define actual volumes in the future.
  • The images have to be referenced by their full canonical path: for example, the redis image becomes docker.io/library/redis.
  • In the example we expect our service container to be tagged as docker.local/service. This isn’t standard, but you still need a canonical domain/[path/]image format.

Loading environment-dependent configuration

This is necessary because different environments can have different Docker setups. For instance, in GitLab CI with dind, the services are bound on the Docker host instead of the usual localhost.

Of course you could load that config before running the test, but loading it from the test lets us run tests with no extra configuration, which is especially valuable when you’ve just cloned the repo and you’re running the tests from your favorite IDE.

Notice also that we provide t, which is an instance of *testing.T, to our test. In order to keep your tests to the point, we don’t return errors from aceptadora: we just make the tests immediately fail if something goes wrong.

Instantiating aceptadora

Actually, it’s even easier if you set your config in the aceptadora.env we’ve loaded earlier, and use envconfig to load it.

The instantiation will have these two side-effects:

  • It will set the YAMLDIR env var to the one provided through config.
  • It will set the TESTER_ADDRESS to the first non-local IPv4 address, unless it is already set.

Both of these can then be used in the loaded YAML file and in all the env configs loaded by the services later.

The TESTER_ADDRESS is useful if we’re going to mock dependencies of our services in the test itself.

Managing the services’ lifecycles with aceptadora

As mentioned earlier, the health-check functions are not included in aceptadora, as they really depend on each use case: Redis is ready to serve as soon as it accepts TCP connections, while MySQL is not, and has to respond to a ping request to be considered healthy.

Extra considerations

Although aceptadora will fail if it can’t bind a port for a dependency, e.g. a Redis, disasters and misconfigurations can happen: imagine that you forget to bind a port for your Redis, and your test starts and decides to run a FLUSHDB. In the best case there’s nothing running on your local 6379 port and it will just fail. In the worst case, there’s a tunnel to a production instance on that port, which for some reason doesn’t ask for authentication. Consider checking in some way that you’re interacting with a test instance (a CONFIG GET for Redis, a CALL in MySQL).

Conclusions

We extracted aceptadora as a library to make acceptance testing uniform across multiple services, and we hope it helps others improve the reliability of their deployments.

Cabify Product

Our mission is to make our cities a better place to live, through technology and design
