Testing Fastly services on every PR: How USA TODAY tests their CDN prior to changing it

Published in USA TODAY NETWORK, Apr 9, 2018

When we first switched to Fastly last year, we realized we would quickly run into the same challenge a number of companies with complicated CDN configurations face: How do we thoroughly test that the redirect rules, whitelists, blacklists, and other specific details we have for one of the most important aspects of our news network are working? There is currently no way to unit test Fastly, and while you can certainly run integration tests against your live service and roll back if there is a problem, that is not ideal. Here’s how we thoroughly test our configuration services not only in production and staging, but on every pull request.

The basic idea starts with Terraform. Using the Terraform Fastly provider (we maintain a custom build because the upstream repository is poorly maintained), we can create and deploy changes to our production and staging services in an automated fashion. We reasoned that if we could create and manage our services with Terraform, we could also create a transient service to run our tests against and tear it down when we were finished. You might think this requires additional complexity in your VCL code, but in reality our VCL files are virtually the same for services across CI, staging, and production. Every change needed to create a testable transient service is handled in our Terraform file. Here is the high-level overview of our development cycle for Fastly services:

1. Make a pull request to the service you want to change. Be sure to include tests!
2. Jenkins picks up the pull request and starts our CI process.
3. Our CI process uses Terraform to create a new Terraform environment, named after the JIRA ticket number for the change.
4. From this environment, it runs a Terraform apply, which creates a new Fastly service. This service is named after the environment, so it generally looks something like “JIRA-789-www.usatoday.com”.
5. A suite of tests runs against this service, using the Fastly-generated DNS. For the service above, that would look like “JIRA-789-www.usatoday.com.global.prod.fastly.net”.
6. Once we have all the results we need, we tear down the service using Terraform and post those results back to GitHub for review.
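The create, test, and tear-down steps above can be sketched as a small Python driver. This is a minimal illustration, not our actual Jenkins job: the function name `run_ci` and its parameters are hypothetical, and the Terraform subcommands and flags shown (`workspace new`, `-auto-approve`) are the ones found in recent Terraform releases, so older versions may need different spellings. Posting results back to the pull request is left out.

```python
import subprocess


def run_ci(ticket, run_tests, run_cmd=subprocess.check_call):
    """Create a transient environment for `ticket`, test it, always tear it down.

    `run_tests` is a hypothetical callable that receives the ticket name and
    returns the test results. `run_cmd` is injectable so the flow can be
    exercised without Terraform installed.
    """
    # Create (and switch to) an environment named after the ticket.
    run_cmd(["terraform", "workspace", "new", ticket])
    try:
        # Apply creates the transient Fastly service for this environment.
        run_cmd(["terraform", "apply", "-auto-approve"])
        return run_tests(ticket)
    finally:
        # Tear everything down even if the apply or the tests failed.
        run_cmd(["terraform", "destroy", "-auto-approve"])
        run_cmd(["terraform", "workspace", "select", "default"])
        run_cmd(["terraform", "workspace", "delete", ticket])
```

Putting the tear-down in a `finally` block is the important design choice here: a failed test run should never leave a stray service behind.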

So how do we write Terraform code that achieves this? How do we avoid conflicts between services that would otherwise share the same domains? How do we use a database with special testing data in this model? It all starts with naming the service correctly. We do this based on environment:

name = "${terraform.env == "production" ? var.fastly_name : format("%s-%s", terraform.env, var.fastly_name)}"
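For readers less familiar with Terraform interpolation, the same naming rule can be written out in Python. This is only a restatement for illustration; the function name `service_name` is hypothetical, and a test suite could use something like it to work out which service it should be talking to.

```python
def service_name(env, fastly_name):
    """Mirror the Terraform expression: production keeps the bare name,
    every other environment gets an "<env>-" prefix."""
    if env == "production":
        return fastly_name
    return "{}-{}".format(env, fastly_name)


print(service_name("production", "www.usatoday.com"))
print(service_name("JIRA-789", "www.usatoday.com"))
```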

Backends are not unique per service, so they can stay exactly the same as production. If, like us, you have a special place with staging data that is used just for testing, you can also adjust your backends based on the environment you are in. We have three environments for each major service, in addition to the transient CI service that gets spun up: production, staging (which is kept as close to production as possible), and a poorly-named third environment called “origin-staging” that uses special backends serving fake staging data for testing. We make sure to adjust the backends and any associated health checks for origin-staging:

backend {
    # Only origin-staging points at the staging origin; every other environment uses production.
    address = "${terraform.env == "origin-staging" ? "staging.backend.com" : "production.backend.com"}"
    name = "some.backend.com"
    shield = "iad-va-us"
    healthcheck = "some.backend.com"
    port = 80
    request_condition = "no_default_host"
    auto_loadbalance = false
    between_bytes_timeout = 30000
    first_byte_timeout = 30000
    connect_timeout = 6000
}

healthcheck {
    name = "some.backend.com"
    # The health check host follows the same environment switch as the backend address.
    host = "${terraform.env == "origin-staging" ? "staging.backend.com" : "production.backend.com"}"
    path = "/status-check/"
    expected_response = 200
    method = "GET"
    check_interval = 60000
    threshold = 1
    timeout = 5000
    window = 2
    initial = 1
}

So what about the domains? Domains might feel tricky, because each one can belong to only one service. But since we have Fastly-generated DNS, the domain names themselves don’t matter that much. We simply prefix them with the environment and test against the full Fastly DNS name:

domain {
    name = "${terraform.env == "production" ? "someotherdomain.com" : format("%s-%s", terraform.env, "someotherdomain.com")}"
}

domain {
    name = "${terraform.env == "production" ? "www.someotherdomain.com" : format("%s-%s", terraform.env, "www.someotherdomain.com")}"
}

You can then use the Fastly-generated DNS to access the service. In our JIRA-789 example, the tests reach the cache for the www domain above by stitching together this URL:

http://JIRA-789-www.someotherdomain.com.global.prod.fastly.net/
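Stitching that URL together in a test helper might look like the following sketch. The helper names `fastly_host` and `fastly_url` are hypothetical; the prefixing rule and the `global.prod.fastly.net` suffix come from the examples above.

```python
FASTLY_DNS_SUFFIX = "global.prod.fastly.net"


def fastly_host(env, domain):
    """Return the Fastly-generated hostname for `domain` in environment `env`."""
    # Same rule as the Terraform domain blocks: production is unprefixed.
    prefixed = domain if env == "production" else "{}-{}".format(env, domain)
    return "{}.{}".format(prefixed, FASTLY_DNS_SUFFIX)


def fastly_url(env, domain, path="/"):
    """Build the URL a test would request for `path` on the service."""
    return "http://{}{}".format(fastly_host(env, domain), path)


print(fastly_url("JIRA-789", "www.someotherdomain.com"))
```

Keeping the prefix logic in one helper means the tests stay identical across transient, staging, and production runs; only the environment name changes.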

This is great for testing, but preventing bots and humans from accessing these services we spin up, especially the staging and origin-staging ones that stick around, can pose a problem. So we lock down each of these non-production services using an ACL. For our remote workers who use split-tunnel VPN, we password-protect these sites so they can still access them as needed.

There are a couple more gotchas to consider with this method of testing. The first is dictionary creation. We use Fastly dictionaries to store sensitive data we can’t put directly into our VCL; however, Terraform does not support them. We wrote some custom code to create and fill a dictionary using Fastly API calls. It’s a bit of a pain, because a version of the service has to be activated immediately after the dictionary is created, before you can actually add values to it.

The second gotcha was the Fastly backend limit we kept running into. It wasn’t an issue for our smaller services, but one of our services has hundreds of backends, and we could not correctly create a transient service with that many. So we created an additional Fastly environment called “ci”: for this one service, every pull request runs against a non-transient CI service. The downside is that multiple PRs cannot run tests at the same time… and of course it is our most-touched service.

We’re working with Fastly to come up with an even better approach for testing services with these types of special requirements. Who knows, maybe we will even be able to create unit tests in the near future. Until then, we’re pretty happy with the power and flexibility this testing has afforded us.
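The dictionary workaround mentioned above can be sketched roughly as follows. This is not our actual code: the function names are hypothetical, error handling is omitted, and the sketch assumes the public Fastly API endpoints for creating a dictionary on a service version, activating that version, and adding dictionary items. The HTTP layer is passed in as `send`, so the call order can be checked without touching a real service.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.fastly.com"


def fastly_send(api_key):
    """Return a send(method, path, data) function that calls the Fastly API."""
    def send(method, path, data=None):
        body = urllib.parse.urlencode(data).encode() if data else None
        req = urllib.request.Request(
            API + path, data=body, method=method,
            headers={"Fastly-Key": api_key, "Accept": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return send


def create_and_fill_dictionary(send, service_id, version, name, items):
    """Create a dictionary, activate the version, then add the items."""
    # 1. Create the (empty) dictionary on the draft version.
    created = send("POST",
                   "/service/{}/version/{}/dictionary".format(service_id, version),
                   {"name": name})
    # 2. Activate that version before adding any values.
    send("PUT", "/service/{}/version/{}/activate".format(service_id, version))
    # 3. Items are attached to the dictionary itself, not to a version.
    for key, value in items.items():
        send("POST",
             "/service/{}/dictionary/{}/item".format(service_id, created["id"]),
             {"item_key": key, "item_value": value})
    return created["id"]
```

A real run would call `create_and_fill_dictionary(fastly_send(api_key), ...)` with the service ID and version that Terraform just created.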


Written by Bridget Lane