Test-Driven Development for Infrastructure

How do I do test-driven development (TDD) for infrastructure? It’s impossible!

My answer is usually:

Great question! It’s not perfect but there are some TDD techniques we can adapt for infrastructure. Here’s how.

I finally decided that my rather lengthy response and explanation should go in one place. Here, I’ll cover:

  • What is test-driven development (TDD)?
  • How can it help with software development?
  • How can it help with infrastructure?
  • What’s an example for infrastructure?

What is Test-Driven Development (TDD)?

Test-driven development, or TDD, is an approach to software development that involves writing the tests for some functionality first and then implementing the functionality to pass the test. The expectation is that the first test will fail at the start and we build functionality to get it to pass. We continue running the tests and fixing our implementation until the test passes.

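For example, suppose I am building a calculator that sums two numbers and returns the absolute value of the result. My expectations are:

  • My calculator should return positive absolute value for two positives.
  • My calculator should return positive absolute value for two negatives.
  • My calculator should return positive absolute value for one negative and one positive.

I write the test for the first expectation before any implementation:

func TestCalculatorShouldReturn5AsSumOfTwoPositives(t *testing.T) {
	// Assuming assert is testify's assert package: t first, then expected, then actual.
	assert.Equal(t, 5, CalculateSum(3, 2))
}

A stub implementation lets the test compile, but the test fails:

func CalculateSum(x int, y int) int {
	return 0
}

Adding the two numbers gets the first test to pass:

func CalculateSum(x int, y int) int {
	return x + y
}

The second expectation gets its own test, which fails against the current implementation:

func TestCalculatorShouldReturn5AsSumOfTwoNegatives(t *testing.T) {
	assert.Equal(t, 5, CalculateSum(-3, -2))
}

Returning the absolute value of the sum makes both tests pass:

import "math"

func CalculateSum(x int, y int) int {
	// math.Abs works on float64, so convert in and out.
	return int(math.Abs(float64(x + y)))
}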
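
The three calculator expectations can also be collected into a single table of cases. This is a minimal sketch rather than the original test suite: it assumes a hand-rolled integer absolute value instead of math.Abs, and it uses a hypothetical FailedCases helper in place of a test framework so that it runs as a plain program.

```go
package main

import "fmt"

// CalculateSum returns the absolute value of x + y, matching the
// behavior the calculator expectations describe.
func CalculateSum(x int, y int) int {
	sum := x + y
	if sum < 0 {
		return -sum
	}
	return sum
}

// sumCase is one expectation: inputs plus the result we want.
type sumCase struct {
	name string
	x, y int
	want int
}

// expectations encodes the three bullet points as data.
var expectations = []sumCase{
	{"two positives", 3, 2, 5},
	{"two negatives", -3, -2, 5},
	{"one negative and one positive", -3, 2, 1},
}

// FailedCases (hypothetical helper) returns a message for every case
// whose actual result differs from the expected one.
func FailedCases(cases []sumCase) []string {
	var failures []string
	for _, c := range cases {
		if got := CalculateSum(c.x, c.y); got != c.want {
			failures = append(failures,
				fmt.Sprintf("%s: got %d, want %d", c.name, got, c.want))
		}
	}
	return failures
}

func main() {
	failures := FailedCases(expectations)
	for _, f := range failures {
		fmt.Println("FAIL", f)
	}
	fmt.Printf("%d of %d cases failed\n", len(failures), len(expectations))
}
```

Writing the cases as data keeps the TDD loop short: a new expectation is one more row, and it fails until the implementation catches up.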

How can TDD help with software development?

There are many arguments for and against TDD. When I first started to do TDD, I had a lot of complaints:

  • My time is better spent building functionality over writing a bunch of tests.
  • It feels counterintuitive to write the tests first.
  • Testing is hard. Every time I tried writing tests, I ended up rewriting a bunch of my functional code to make it testable.

After practicing TDD for a while, I found that it helps me in a few ways:

  1. It makes me think about what functionality I am supposed to implement.
  2. Testing is easier because I make it testable from the beginning.
  3. My code is cleaner because I know what functions I need to minimally address.
  4. I take less time because I don’t have to remind myself what a function was supposed to do.
  5. When I read someone’s code, I can read the tests first to figure out what their functions are supposed to be doing.

How can TDD help with infrastructure?

The core principles of TDD in software development are to:

  • Only develop functionality that is needed.
  • Express the functionality in human expectation rather than code.
  • Organize the smallest unit of code to implement the functionality or logic.

For infrastructure, I adapt these principles to:

  • Only create / configure the infrastructure resources that are needed.
  • Express configuration declaratively, using tests as a reference.
  • Organize the smallest set of infrastructure resources to meet a security, resiliency, or operational requirement.

Testing Pyramid. The types of tests at the top of the pyramid are more costly to run than the tests at the bottom.

What’s an example for infrastructure?

Let’s TDD a request to create a complicated S3 bucket in AWS. This S3 bucket has the following requirements:

  • There should be a “MyBucketWriteUser” that can write anything into the bucket.
  • There should be a “MyBucketReadUser” that can read anything from the bucket.
  • Anyone with the “MyBucketRole” should have administrative access.
  • Deny everyone else. Bucket should not be publicly accessible.

If I write the bucket policy and apply it without tests, I run into a few problems:

  • I have no clue if my policies are correct.
  • I have to create and check everything in AWS, running up my bill.
  • My security team doesn’t have a way of telling if my bucket is secured as we expect, other than examining AWS.

For this example, I use:

  • Golang (Note: I had to write my own structures to unmarshal the AWS JSON. I couldn’t find a suitable AWS Golang library for my purposes.)
  • Terraform
  • AWS
  • Ruby & awspec

Unit Tests

Recall that unit tests validate configuration and syntax. They’re inexpensive tests, so I’ll begin with them. I create a tests/unit directory, complete with a starter test file called policy_test.go.

> policies
    # bucket.json, eventually
> tests
    > unit
        policy_test.go
main.tf
mybucket.tfvars
outputs.tf
variables.tf
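
To make the unit test idea concrete, here is a minimal sketch of the kind of check that could live in policy_test.go. It is not the original test: the inline policy document, the account ID 123456789012, the ARNs, and the Policy, ParsePolicy, and Allows names are all hypothetical, and the structures assume each statement has a single string Action and a single AWS principal. A real policy can use arrays or "*" and would need more flexible types.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// samplePolicy is a hypothetical stand-in for policies/bucket.json.
const samplePolicy = `{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::123456789012:user/MyBucketWriteUser"},
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::mybucket/*"
    },
    {
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::123456789012:user/MyBucketReadUser"},
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::mybucket/*"
    }
  ]
}`

// Statement models one policy statement under the simplifying
// assumptions above (single action, single AWS principal).
type Statement struct {
	Effect    string
	Principal struct{ AWS string }
	Action    string
	Resource  string
}

// Policy is a hand-written structure for unmarshalling the policy JSON.
type Policy struct {
	Version   string
	Statement []Statement
}

// ParsePolicy decodes a raw JSON policy document.
func ParsePolicy(raw string) (Policy, error) {
	var p Policy
	err := json.Unmarshal([]byte(raw), &p)
	return p, err
}

// Allows reports whether some Allow statement grants the action to a
// principal whose ARN ends in "/<name>".
func (p Policy) Allows(name, action string) bool {
	for _, s := range p.Statement {
		if s.Effect == "Allow" && s.Action == action &&
			strings.HasSuffix(s.Principal.AWS, "/"+name) {
			return true
		}
	}
	return false
}

func main() {
	p, err := ParsePolicy(samplePolicy)
	if err != nil {
		panic(err)
	}
	fmt.Println("write user can put:", p.Allows("MyBucketWriteUser", "s3:PutObject"))
	fmt.Println("read user can put:", p.Allows("MyBucketReadUser", "s3:PutObject"))
}
```

Because the check only reads a JSON document, it runs locally in milliseconds and never touches AWS, which is what makes it a good first test to write.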

Contract Tests

Let’s traverse up the pyramid and write some contract tests. Why? Well, my bucket policy requires a MyBucketWriteUser, MyBucketReadUser, and MyBucketRole. These are not created as part of the bucket policy but are created by AWS IAM requests. The point of contract tests is to check that the output of one service matches the expected input to my service. I’m going to write some quick tests to make sure that the usernames of the principals in my AWS IAM specification match the ones I put in my bucket.

A contract test will help me evaluate if my bucket policy matches the IAM declarations. The test:

  1. Triggers terraform plan for my IAM users and roles.
  2. Checks the plan for the naming and permissions.
  3. Matches the naming and permissions to those I’ve added to my bucket policy.

$ go test ./tests/integration/...
+ aws_iam_role_policy_attachment.bucket_admin_role
    id: <computed>
    policy_arn: "${aws_iam_policy.bucket_admin_role.arn}"
    role: "MyBucketAdminRole"
...

" does not contain "MyBucketRole"

The test fails: the IAM plan names the role MyBucketAdminRole, while my bucket policy expects MyBucketRole.
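
The name-matching step of a contract test can be sketched as a plain string check. This is an illustration under stated assumptions, not the original test: MissingNames is a hypothetical helper, and the plan text is hard-coded from the excerpt in this section, whereas a real test would capture it by running terraform plan.

```go
package main

import (
	"fmt"
	"strings"
)

// samplePlan is the plan excerpt from this section. A real contract
// test would capture this text from `terraform plan` instead.
const samplePlan = `+ aws_iam_role_policy_attachment.bucket_admin_role
    id: <computed>
    policy_arn: "${aws_iam_policy.bucket_admin_role.arn}"
    role: "MyBucketAdminRole"`

// MissingNames (hypothetical helper) returns every name that does not
// appear as a quoted value anywhere in the plan output. Matching on the
// quoted form avoids accepting a name that is only part of a longer one.
func MissingNames(plan string, names []string) []string {
	var missing []string
	for _, n := range names {
		if !strings.Contains(plan, `"`+n+`"`) {
			missing = append(missing, n)
		}
	}
	return missing
}

func main() {
	// The names the bucket policy expects the IAM configuration to provide.
	expected := []string{"MyBucketRole"}
	for _, n := range MissingNames(samplePlan, expected) {
		fmt.Printf("plan does not contain %q\n", n)
	}
}
```

The check is deliberately crude. It only confirms that both sides of the contract agree on names, which is the mismatch this section runs into.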

Integration Tests

I tested that the output of my IAM users matches the input to my bucket policy. Now, I can create them in AWS. For these integration tests, I deviate from my original Golang tests and swap to awspec in Ruby. Yes, I went polyglot. A single language doesn’t always have a useful tool for the job, so I often swap languages to take advantage of a better testing tool. awspec uses Ruby’s RSpec to check the AWS components that have been created. I’ve found this useful not only for integration and component testing but also for security and compliance checking. Running this test on a regular schedule helps check for manual changes or deviations!

$ terraform apply -var-file=test.tfvars
...
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
$ cd tests/integration && bundle exec rake spec
..................
Finished in 3.77 seconds (files took 3.19 seconds to load)
18 examples, 0 failures

(Manual) End-to-End Test

My final question is, “Does this bucket allow the right users and roles to access it?” We’ll see! I opt to do this manually since I’ve already run contract and integration tests.

## As MyBucketWriteUser
$ aws s3 mv test.txt s3://mybucket/test.txt
move: ./test.txt to s3://mybucket/test.txt
$ aws s3 cp s3://mybucket/test.txt ./hello.txt
fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden
## As MyBucketReadUser
$ aws s3 cp s3://mybucket/test.txt ./hello.txt
download: s3://mybucket/test.txt to ./hello.txt
## As MyBucketReadUser
$ aws s3 mv test.txt s3://mybucket/test.txt
move failed: ./test.txt to s3://mybucket/test.txt An error occurred (AccessDenied) when calling the PutObject operation: Access Denied
## As MyBucketAdminRole
$ aws s3 rm s3://mybucket/test.txt
delete: s3://mybucket/test.txt
$ aws s3 ls s3://mybucket/
# Returns empty

In Summary

Above is just one simple example of how I approach TDD for infrastructure. Admittedly, writing the tests for the example did take a chunk of time. I could have written the bucket policy and checked it manually, achieving the outcome in a few hours. However, TDD compelled me to write some unit tests first and run them locally. I only had to run terraform apply once, right before the integration tests. In a way, I saved some money on my AWS bill. I felt more confident deploying to AWS because I had checked:

  • The functionality of my bucket policy via my unit tests.
  • The contracts between my IAM users and my bucket.
  • The integration of my security policies.

A few notes and caveats about this example:

  • I could have used terratest instead of awspec for integration tests. terratest is written in Golang and would have been more consistent in language with the rest of my tests, except that it currently doesn’t have a good way of checking for bucket policies. I do think terratest would be good for end-to-end testing.
  • I wouldn’t keep my bucket policy in a JSON document. In this example, I used a JSON file because (1) it had less set-up code for the test and (2) I wanted to show that unit tests could be agnostic of the infrastructure-as-code tool. For extensibility and scale, I usually use an aws_iam_policy_document declaration that comes with Terraform. It’s a little bit trickier to unit test but the ideas are pretty similar.
  • awspec (and other RSpec-like testing tools) can sometimes cross the lines between integration and contract tests. The testing pyramid can be pretty fluid for infrastructure. I use my own discretion to determine if certain resources would benefit from contract tests or just the integration tests.
  • Testing tools may not support every feature that might be applied to a public cloud resource. For example, I couldn’t add an awspec test to ensure the bucket’s public access was fully blocked. That isn’t covered in any of the unit, contract, or integration tests.
  • If I have a lot of components and want to go even further with integration testing, I use localstack. localstack mocks AWS components like buckets, databases, etc. on my local machine. It’s not my go-to integration testing tool because it doesn’t always support the mocks I need. Sometimes, I’ll actually spend more time debugging it than writing the test.
  • Speaking of time, I evaluate my return on investment for writing tests. Sometimes, I might not write end-to-end tests because integration tests sufficiently cover my functionality. Other times, I forgo the contract tests because the amount of time I spend figuring out how to test the contract outweighs the benefit. I constantly work to balance the types of tests I write, the time taken to write them, and the confidence they provide me before I deploy.

Overall, TDD for infrastructure helps me:

  • Divide the minimal configuration I want in a cleaner way.
  • Express the configuration I want declaratively.
  • Focus on testing most of my infrastructure locally, which reduces my feedback cycle and cost.

Rosemary Wang


explorer of infrastructure-as-code. enthusiast of cloud. formerly @thoughtworks. curious traveller & foodie.