Both tools automate Terraform pull request workflows. Both are open source. Both can be viewed as alternatives to HashiCorp's commercial Terraform Cloud offering.
As the founder of Digger, I am biased. But the goal of this article is not to make the case that "Digger is better than Atlantis". In open-source land there is rarely one right way; each tool that solves a particular problem comes with its own set of tradeoffs. That's why we have multiple NoSQL databases, frontend frameworks and so on.
Of course, we wouldn't have built Digger the way we did if we didn't believe our way to be better in some aspects; that doesn't make it better in all aspects. I am going to cover the technical decisions we made and the reasoning behind them, in the hope that readers draw their own conclusions.
A primer
Feel free to skip this part if you are comfortable juggling terms like "TACOs" or the "merge-apply dilemma". It aims to help everyone else get up to speed.
What makes Terraform code different from application code is the concept of state. It's a file that describes the actual configuration of cloud resources. It can be stored locally, but more often than not it is stored centrally via a backend. State is not unique to Terraform; other modern IaC tools like Pulumi also rely on state. This is not by accident. The first infrastructure-as-code tool, CloudFormation, did not have state; newer tools introduced state to address its shortcomings.
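To make this concrete, here is what a typical remote backend configuration looks like in Terraform. This is standard HCL; the bucket, key and table names are placeholders:

```hcl
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"     # placeholder bucket name
    key            = "prod/terraform.tfstate" # path to the state file within the bucket
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"        # optional: DynamoDB table for state locking
  }
}
```

With a backend like this, every `terraform plan` and `terraform apply` reads and writes the same shared state file, which is exactly why concurrent changes need coordination.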
State makes scenarios like CI/CD much harder for Terraform than for application code. Yi Lu covers this in great detail in his Pains in Terraform Collaboration article. Because of state, each change can potentially affect every other change. This can lead to race conditions and even unwanted resource deletions. Also, every pull request is no longer just about code: the same piece of Terraform code can be applied across multiple environments. To confidently merge a Terraform PR, you'd want to know what exactly changes in each environment.
You'd still want CI/CD for your Terraform code though. Enter TACOs — specialised platforms that run your Terraform code on the server. They are just like general-purpose CI/CD platforms, but aware of state and thus able to avoid most of these challenges. The most well-known such platform is Terraform Cloud by HashiCorp; some great alternatives are Spacelift, Env0 and Scalr (commercial) and Atlantis (open source).
The Atlantis project was started in 2015, just a year after the initial release of Terraform and a full four years before HashiCorp announced Terraform Cloud in 2019. It is the most mature and battle-tested TACO of all, and it is likely that the design choices made by its creators influenced all the others. The lead maintainer of Atlantis joined HashiCorp in 2018.
Meanwhile in the CI/CD land, a lot has changed since 2015. The platforms of choice back then were Jenkins and Travis. Then CircleCI eclipsed them in popularity, mainly due to seamless integration with GitHub. Then GitLab pioneered the idea of CI/CD bundled with VCS, and GitHub responded by introducing Actions. Both quickly became standard choices for organisations using GitHub or GitLab respectively for code hosting.
Re-using existing CI infrastructure
Consider a typical CI/CD scenario with Atlantis. You make a change in a Terraform file and create a pull request in GitHub. You soon see the plan output appended as a comment on the PR. It looks good, so you comment `atlantis apply`, and it gets applied. Awesome!
But where did terraform actually run? There aren’t any new jobs in Actions.
Atlantis runs the terraform binary on the same VM it is deployed to. If something goes wrong, you can stream logs or SSH into the box. But running jobs as processes on a single VM has its limitations: they share the same file system and environment dependencies, which can lead to failures and other unpredictable behaviour. For this reason, modern CI platforms moved to spinning up a fresh Docker container for every job. The Lyft team forked Atlantis to mitigate this issue, using Temporal to manage execution state.
Other TACOs take it one step further and provide a "full CI" experience — a web UI with jobs, sub-steps, logs and so on. Because they are cloud-based, you can also use private workers, so that your cloud secrets and sensitive data never leave your network. For the same reason, GitHub Actions allows self-hosted runners.
The central idea of Digger is to avoid duplicating the CI infrastructure with compute, jobs and logs just because of terraform state. Whether you self-host Atlantis or use a cloud-based TACO, you still effectively end up with 2 CI platforms — one for terraform, the other for all other code. But with Digger, the terraform binary runs natively in your CI. Jobs and logs are re-used. Digger is aware of terraform state and acts as an orchestrator — it can spin up multiple CI jobs in parallel for unrelated plans, or queue up applies that could clash with each other.
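In practice, "the terraform binary runs natively in your CI" means jobs like the following. This is a simplified sketch of a plain GitHub Actions job running terraform directly (Digger's actual workflow wiring differs and is documented separately; the job and step names here are illustrative):

```yaml
name: terraform-plan
on:
  pull_request:
    paths: ["**.tf"]   # only trigger on Terraform file changes

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3   # installs the terraform binary
      - run: terraform init
      - run: terraform plan
```

The job runs in an ephemeral container on your CI's compute pool, with logs visible in the same UI as the rest of your pipelines; the orchestrator's job is only to decide which of these jobs can run in parallel and which must queue.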
Security
Since Digger does not actually run the terraform binary — it just orchestrates CI jobs — it does not need access to your cloud account or sensitive data in the terraform state. This is between your CI and your cloud provider. Digger is aware of state — as in it knows which state file each job is going to run against — but it has no way of knowing what is in that state file. Moreover, you can use OIDC to grant your CI provider access to your cloud account without sharing cloud access secrets with any third party.
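As an example of the OIDC approach, here is how a GitHub Actions job can assume an AWS IAM role without any long-lived secrets, using the official `aws-actions/configure-aws-credentials` action (the role ARN below is a placeholder):

```yaml
permissions:
  id-token: write   # allow the job to request an OIDC token from GitHub
  contents: read

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          # placeholder role; must trust GitHub's OIDC provider in IAM
          role-to-assume: arn:aws:iam::123456789012:role/terraform-ci
          aws-region: us-east-1
```

The cloud provider verifies the short-lived token issued by GitHub, so no access keys are stored in CI secrets or shared with any third party.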
No need to maintain a server
While you can self-host the Digger backend (a backend is required for some features, like parallel runs), there is little reason to do so. Because Digger is inherently more secure (see above), you can just use the cloud instance of Digger for job orchestration; it does not receive any sensitive data like cloud account secrets or state. The Digger backend simply acts as a "gate" that starts jobs in your CI. The most basic plan-apply functionality on GitHub Actions works without any backend at all.
Costs
Organisations using Atlantis often end up hosting multiple instances of it to manage different environments, for security reasons. And because Atlantis runs terraform jobs on the same VM, those instances need to be relatively powerful; it is not just a lightweight API. Digger, on the other hand, does not need its own compute to run jobs: it reuses the scalable compute pool of your CI. Even if you self-host the Digger backend, the smallest container is likely enough.
Apply on merge
By default, Digger behaves just like Atlantis: it allows applying before merge and uses PR-level locks to mitigate the risk of state conflicts. But this is not the only way (see the merge-apply dilemma), and many teams prefer the "purer" approach of applying only after merging. This goes somewhat against the Atlantis philosophy (issue); Digger supports both ways.
RBAC via Open Policy Agent
Atlantis has a `--gh-team-allowlist` server option that specifies which GitHub teams are allowed to run which commands, e.g. `--gh-team-allowlist="myteam:plan, myteam:apply"` (docs). This means that to update access, you need to manually restart the Atlantis server with a new list.
In Digger, RBAC is implemented using OPA policies (docs). Policy changes are thus decoupled from the rest of the configuration, and you can update them at any time without restarting the server.
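For readers unfamiliar with OPA, access rules are written in Rego. The snippet below is illustrative only: the input document shape (`input.user`, `input.action`) and package name are assumptions for the sake of the example, not Digger's documented policy schema:

```rego
package access

# Deny by default; any matching rule below grants access.
default allow = false

# Anyone may run plan.
allow {
    input.action == "plan"
}

# Only named users may apply.
allow {
    input.action == "apply"
    input.user == "alice"   # placeholder username
}
```

Because policies like this live as data rather than server flags, swapping in a new rule set is a policy update, not a redeploy.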
So… better than Atlantis?
In the aspects I hand-picked above, yes — otherwise we wouldn’t have built Digger the way we did. We genuinely believe that re-using existing CI infrastructure is preferable for most teams. But not for all teams. Below are some of the reasons to choose Atlantis over Digger:
- Established, battle tested tool — the Atlantis project was started back in 2015
- Bigger community — the Atlantis community has over 6000 members
- Portability — Atlantis is a simple monolithic server; it doesn't even need a database
- Learning opportunity — by self-hosting, maintaining and troubleshooting Atlantis you'll learn a lot of lesser-known nuances of Terraform and other DevOps tooling
Digger is an open-source alternative to Terraform Cloud and other TACOs