Our requirements from a deployment pipeline
Recently, we (ClearTax) started to look for a uniform, standardised deployment pipeline.
We have grown, and we now have applications written in several languages that interact with each other. We didn’t want each product team to reinvent the wheel. We wanted a common infrastructure platform that would take care of most of the problems faced by individual teams, enabling them to focus on the product.
There are a lot of options out there (Ansible! Chef! Jenkins 2!). It’s hard to make a decision when faced with so many options.
I thought it would be useful to first articulate what we actually wanted. This blog post is just about the features we wanted in a deployment pipeline — it’s not exhaustive, and it’s tailored for our needs — but I hope it’s useful for others too. I will explain our final platform choices in a future blog post.
Note: for the sake of brevity, this is mostly a bulleted list of points (the outline). Please leave a comment if something is not clear.
Must Have: Fully Automated Deployments
- Deployment to staging should be automatic once you merge in a pull request to master.
- Production deployment should be 1-click (promote current staging to production) or automatable (e.g., deploy latest staging to production once a day).
Must Have: Developer Servers
- Developers should be able to provision servers for any branch they are working on, for example by deploying the current PR to a URL so that others can play with it, run tests, etc.
- We should be able to do this ourselves, without jumping through hoops or approval requests.
- This should be as friction-free as possible.
Must Have: Immutable Deployments
- All deployments will be immutable.
- Immutable means that once an application is deployed to a server, any change will result in a new server being launched, and the old server being de-provisioned.
- Individual servers are throw-away. There is no local state.
- Deployments are deterministic and predictable.
Must Have: Zero downtime deployment strategy
- Deployment should not incur any downtime.
- Rough process for ensuring this: launch new servers with the latest release, wait for the servers to be healthy, switch over to the new server cluster at the load balancer level, then gradually de-provision the older servers.
- Use blue-green style deployments for critical services, and wherever the ability to roll back quickly is crucial.
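The rough process above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling we ended up with: `Server`, `LoadBalancer`, `launch_servers`, `wait_until_healthy` and `deploy` are hypothetical names, and the provisioning and health check calls are in-memory stand-ins for whatever a real cloud provider or load balancer would offer.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    release: str
    healthy: bool = False


@dataclass
class LoadBalancer:
    # Servers currently receiving traffic.
    active: list = field(default_factory=list)


def launch_servers(release, count):
    # Hypothetical stand-in for a provisioning call; always creates fresh servers.
    return [Server(f"{release}-{i}", release) for i in range(count)]


def wait_until_healthy(servers, check, max_attempts=3):
    # Re-run the health check until every server passes or attempts run out.
    for _ in range(max_attempts):
        for server in servers:
            server.healthy = check(server)
        if all(server.healthy for server in servers):
            return True
    return False


def deploy(lb, release, count, check):
    new = launch_servers(release, count)
    if not wait_until_healthy(new, check):
        # The old servers keep serving traffic; the failed batch is discarded.
        return False, []
    # Switch over at the load balancer in one step.
    old, lb.active = lb.active, new
    # The caller de-provisions the old servers gradually.
    return True, old


lb = LoadBalancer(active=launch_servers("v1", 2))
ok, retired = deploy(lb, "v2", 2, check=lambda server: True)
```

Keeping the old servers around for a while after the switch is what makes a blue-green style roll-back cheap: pointing the load balancer back at `retired` restores the previous release without a new deployment.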
Must Have: Audit Trails
- Ability to see when deployments happened, who triggered a deployment, etc.
- Traceability is a must in case of any issues we run into.
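As a rough illustration of what an audit trail entry might capture, here is a minimal Python sketch. The field names, the `record_deployment` helper and the in-memory `log` list are assumptions made for this example; a real pipeline would write to durable, append-only storage.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class DeploymentEvent:
    application: str
    environment: str
    release: str
    triggered_by: str
    timestamp: str


def record_deployment(log, application, environment, release, triggered_by, now=None):
    # Use the supplied time if given (handy for tests), else the current UTC time.
    now = now or datetime.now(timezone.utc)
    event = DeploymentEvent(application, environment, release, triggered_by, now.isoformat())
    # One JSON object per entry keeps the trail easy to search and ship to log aggregation.
    log.append(json.dumps(asdict(event), sort_keys=True))
    return event
```

Recording who triggered the deployment and which release went out is the minimum needed to answer "what changed, and when?" while investigating an issue.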
Must Have: Easy to use, self-service nature
- You should not need to learn new syntax or new markup languages in order to deploy your branch to a URL.
- The deployment tool should have a UI (CUI or GUI) that is intuitive to use, and does not require learning arcane incantations.
- This is a requirement as we want everyone to be as self-sufficient as possible.
Must Have: ‘Platform’ layer that takes care of shared requirements
- New applications have all the basic requirements available by default: log aggregation, monitoring, alerting, etc.
- Default configuration of firewalls, security groups, etc. is made available to each new application.
Good To Have: Credential Management
- Code should never have credentials committed to it.
- The deployment pipeline should take care of giving credentials for the current environment (staging, production, etc) to the running application automatically.
- Credentials should not be stored in plain text, and should not be accessible to developers.
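From the application's side, this could look something like the sketch below. It assumes the pipeline injects credentials as environment variables, which is only one possible mechanism; `load_credentials`, `MissingCredential` and the variable names in the usage line are hypothetical.

```python
import os


class MissingCredential(RuntimeError):
    pass


def load_credentials(names, environ=None):
    """Read the required credentials from the process environment."""
    env = os.environ if environ is None else environ
    missing = [name for name in names if not env.get(name)]
    if missing:
        # Report only the names, never the values.
        raise MissingCredential("missing credentials: " + ", ".join(missing))
    return {name: env[name] for name in names}
```

An application would call something like `load_credentials(["DATABASE_URL"])` at startup and fail fast if the environment was not set up, so the same code runs unchanged in staging and production.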
Good To Have: Managing ‘Static’ Resources
- Ability to manage resources such as databases, load balancers, network configurations (e.g., AWS VPC settings, AWS Subnet Groups, etc.) within the deployment pipeline itself.
- Self-service tools for provisioning new resources like this.
Good To Have: Windows Support
- Some of our applications and services are written in .NET.
- We need to be able to use the same pipeline to deploy a service to either Windows or Linux.
Good To Have: Ability to deploy non-web services
- Not just HTTP: We need the ability to deploy queue consumers, background workers, other daemons, etc.
- Non-HTTP services should get the same level of monitoring, log aggregation, alerting, etc. as HTTP services.
Good To Have: Speedy deployments
- Deployments should be fast, and easy to do.
- It should not take too long for new code to be live. Avoid wasting time waiting for deployment to occur.
The next blog post (which I should get around to writing soon!) will talk about the different tools we evaluated, and what we ended up using.
Suggestions, thoughts? Please leave a comment.
I’m currently setting up an infrastructure / site reliability team for ClearTax. Please let me know if you’re interested! You can email me at ankit@cleartax.in