Testing in Ephemeral Environments with Kubernetes and Terraform

Cory Lucas
4 min read · Oct 11, 2019

[Image: Kubernetes + HashiCorp Terraform]

Most feature development at Kabbage has happened in short-lived branches for the last several years. We use feature branches as a means of keeping the master branch in an always (🤞) releasable state. To help ensure that our master branch stays sane, we like to test changes before we merge the branch.

When we first adopted this pattern, Kabbage had around 12 developers and 3 testers. We decided to make 4 new static environments that were scaled-down clones of our production environments. This worked for a while. A developer would write their code and run tests locally, we’d deploy to an environment and run tests remotely, and then merge the feature branch, freeing the environment for someone else’s use. Then, inevitably, as we added more developers and testers, we needed more test environments, eventually expanding to 16. We noticed that while we occasionally needed more than 16 environments, we were usually using only around half of them. We also found that the smaller environments were underpowered for running our automated test suites against, and they generally had a lot of configuration drift due to the volume of changes being deployed to them.

So, we decided a few years ago that we should invest in dynamically provisioning test environments and then destroying them when we were done testing. In 2018, Kabbage took its first steps into the world of Kubernetes, which helped to make this a reality 🎉.

In Kubernetes, we use namespaces to separate environments, so when we need a new environment we just create a new namespace. All of our deployments for applications living in Kubernetes are configured using HashiCorp Terraform, which made spinning up an application in a new namespace as easy as doing a normal deployment. Together, these gave us the building blocks for our ephemeral environments. We were then left to figure out how best to use these environments to test our changes.
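To make that concrete, here is a minimal sketch of the pattern, assuming a configured kubernetes provider. The `env_name` variable and the `./modules/app` module path are illustrative placeholders, not our actual code:

```hcl
# One namespace per ephemeral environment. In practice, env_name
# would be derived from the branch name by the build pipeline.
variable "env_name" {
  type = string
}

resource "kubernetes_namespace" "env" {
  metadata {
    name = "test-${var.env_name}"
  }
}

# The same deployment module used for a normal release, pointed at
# the freshly created namespace.
module "app" {
  source    = "./modules/app" # hypothetical path
  namespace = kubernetes_namespace.env.metadata[0].name
}
```

Tearing the environment down is then just a `terraform destroy` against the same state.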

At this point, we had some philosophical questions to answer. What did we actually want to test in these environments? If we wanted to do full end-to-end testing, we would not only need to spin up the application we were changing, but also all of the services it depended on, any services they depended on, and so on. While we could theoretically spin up a full Kabbage environment for each branch, we wanted to be able to spin up and tear down these environments relatively quickly, and the more we added to each environment, the longer that would take. So we opted to test just the application that changed and the infrastructure it owns, mocking any dependencies. This gives us a relatively quick turnaround on creating the environment and running the tests, which lets us run them for each branch as a gate before the branch can be merged.

To give a concrete example of how we run these tests, I’ll describe how we have one of our .NET Core microservices set up. This service provides a RESTful API on top of a PostgreSQL database, publishes events to several SNS topics, and consumes data from another RESTful service running in the same Kubernetes namespace. Because the service is considered the owner of both the database and the SNS topics, its repository contains the Terraform code needed to create them, and that code runs when we spin up the environment, so those dependencies are taken care of. To handle the API dependency, we use WireMock running in a separate pod with a Kubernetes service in front of it. To validate that the expected messages are published to SNS, we create an SQS queue and subscribe it to the topic. All of this is accomplished using Terraform: we have a separate Terraform configuration that creates the test infrastructure and includes our standard Terraform deployment code as a module.
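A sketch of what such a test-infrastructure configuration might look like is below. The module path, image, and names are all illustrative, it assumes the standard module exports the topic ARN as `topic_arn`, and the queue policy that allows SNS to deliver to SQS is omitted for brevity:

```hcl
# The service's own infrastructure (database, SNS topics, deployment)
# comes from its standard deployment module.
module "service" {
  source    = "../deploy" # hypothetical path to the standard module
  namespace = var.namespace
}

# WireMock pod standing in for the upstream RESTful dependency.
resource "kubernetes_deployment" "wiremock" {
  metadata {
    name      = "wiremock"
    namespace = var.namespace
  }
  spec {
    replicas = 1
    selector {
      match_labels = { app = "wiremock" }
    }
    template {
      metadata {
        labels = { app = "wiremock" }
      }
      spec {
        container {
          name  = "wiremock"
          image = "wiremock/wiremock:latest" # illustrative image/tag
          port {
            container_port = 8080
          }
        }
      }
    }
  }
}

# Expose the mock under the DNS name the application already uses
# for the real dependency.
resource "kubernetes_service" "wiremock" {
  metadata {
    name      = "upstream-api" # illustrative service name
    namespace = var.namespace
  }
  spec {
    selector = { app = "wiremock" }
    port {
      port        = 80
      target_port = 8080
    }
  }
}

# SQS queue subscribed to the service's SNS topic so tests can
# assert on published messages.
resource "aws_sqs_queue" "test" {
  name = "test-${var.namespace}"
}

resource "aws_sns_topic_subscription" "test" {
  topic_arn = module.service.topic_arn # assumes the module exports this
  protocol  = "sqs"
  endpoint  = aws_sqs_queue.test.arn
}
```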

Now that we have a fresh copy of the API code running in a fresh environment, we need to run some tests against it. For this, we run xUnit tests from a separate pod. The tests invoke the API, validate the responses, and check for the expected messages in SQS. We treat the database as a black box in these tests and do not validate anything in it directly, only via the API. We pipe the logs from the test container into our automated build pipeline, providing an experience similar to running our unit tests and allowing the build controller to check whether the tests passed or failed. At that point, we are done with the environment and use Terraform to tear it all down.
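One way to express that test run in the same Terraform configuration is a Kubernetes Job that blocks until the tests finish. This is only a sketch, assuming a reasonably recent kubernetes provider; the test image name is a placeholder:

```hcl
# Run the test image as a Job with no retries and wait for it to
# complete; the build pipeline tails the pod logs and reads the
# Job's final status.
resource "kubernetes_job" "integration_tests" {
  metadata {
    name      = "integration-tests"
    namespace = var.namespace
  }
  spec {
    backoff_limit = 0 # fail fast, no retries
    template {
      metadata {
        labels = { app = "integration-tests" }
      }
      spec {
        restart_policy = "Never"
        container {
          name  = "tests"
          image = "registry.example.com/service-tests:latest" # placeholder
          env {
            name  = "SQS_QUEUE_URL"
            value = aws_sqs_queue.test.id # queue URL from the test infra
          }
        }
      }
    }
  }

  wait_for_completion = true
  timeouts {
    create = "15m" # fail the apply if tests run longer than this
  }
}
```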

This setup has given us a lot more flexibility than our static environments did and has helped us iterate quickly on changes. Beyond this isolated testing, we also run higher-level end-to-end tests after changes are merged but before they are shipped to production. For some applications, we also want to be able to do more in-depth exploratory testing before a feature is merged. For these applications we have a similar setup that spins up a new partial environment pointed at a shared testing environment running the master version of all other applications. This allows a tester to do some manual testing and then run an automated job to tear down the environment when they are done.
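One way to wire this up is a flag that switches where the dependency endpoint points. This is a hypothetical sketch; the variable, namespace, and service names are illustrative:

```hcl
# Point the app either at the in-namespace WireMock or at the shared
# testing environment running master.
variable "use_shared_env" {
  type    = bool
  default = false
}

locals {
  upstream_api_url = (
    var.use_shared_env
    ? "http://upstream-api.shared-testing.svc.cluster.local"
    : "http://upstream-api.${var.namespace}.svc.cluster.local"
  )
}
```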
