Engineering Processes
Engineering teams like to move fast. With multiple products and a variety of projects, deployments happen almost every day. Keeping that chaos from causing failures requires carefully thought-out processes. Most teams have processes in place, from the dev stage right through to production, but ensuring that those processes are followed is a challenge.
Checklists are in place for everything that must happen before a new update is pushed to production. Following the checklist reduces the chances of things going wrong and confirms that basic hygiene factors were taken care of.
Types of Deployment
Deployments usually fall into two buckets:
- Infrastructure Change: includes things like NAT changes, VPN changes, backups, maintenance, adding more infrastructure, etc.
- Code Change: includes bug fixes, feature updates and new additions to the product code base.
The Challenge: Ownership and Accountability
Each type of deployment happens under the purview of a stakeholder. Product changes are usually presided over by Product Owners, whereas infrastructure updates happen under the purview of the Infrastructure Stakeholder.
The final accountability for an update lies with the stakeholders. But because the teams are big and communication can be tedious, there can be cases where a stakeholder is not fully aware of a new update going to production.
And if multiple stakeholders are involved in an update, all of them need to be equally accountable for it. Emails and messages can be ‘missed out’.
The Solution
To ensure that no update goes out without authorisation from the Stakeholders, we built a 2FA token step into each deployment. Without the correct tokens, which are known only to the Stakeholders of the project, the deployment does not go ahead.
Every project has a unique set of tokens. The token is the same within a single stakeholder group, so if 3 people act as project managers for a certain update, any one of them can authorise the deployment by sharing the token. If a project has multiple stakeholder groups, each group has its own token, and all of them are needed to run the deployment process. For example, a deployment that falls under the purview of both the product and the infrastructure teams needs tokens from both stakeholder groups to go ahead.
How Does it Work
Tessellate is used for infrastructure deployments and Nomad is used for product changes. We configured Tessellate to ask for a token from the Infrastructure Stakeholders before any update can be pushed to production, and we configured Nomad to ask for tokens from both the Product Stakeholders and the Infrastructure Stakeholders before a software update can be pushed.
For every new project, each Stakeholder group generates a QR code and shares it among its members. The members scan the QR code with a regular 2FA token app, such as Google Authenticator or Authy, which then generates the tokens for the project.
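To make the provisioning step concrete, here is a minimal sketch of what such a QR code usually carries, assuming standard TOTP as supported by Google Authenticator and Authy: a random base32 secret wrapped in an `otpauth://` URI. The function names and the `project:group` label are hypothetical and are not taken from the project's CLI; rendering the URI as a QR image is left to whichever QR encoder you prefer.

```python
import base64
import secrets
from urllib.parse import quote, urlencode


def new_totp_secret(num_bytes: int = 20) -> str:
    """Return a random base32-encoded shared secret (160 bits by default)."""
    return base64.b32encode(secrets.token_bytes(num_bytes)).decode("ascii")


def provisioning_uri(secret: str, project: str, group: str) -> str:
    """Build the otpauth:// URI that an authenticator app reads from a QR code."""
    # Hypothetical labelling scheme: "<project>:<stakeholder group>".
    label = quote(f"{project}:{group}")
    params = urlencode({"secret": secret, "issuer": project})
    return f"otpauth://totp/{label}?{params}"


# One secret per stakeholder group; everyone in the group scans the same QR code.
secret = new_totp_secret()
uri = provisioning_uri(secret, "hello-world", "infrastructure")
print(uri)
```

Because every member of a group scans the same secret, their apps all show the same token at any given moment, which is what lets any one of them authorise a deployment.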
These tokens are known only to the Stakeholder groups. Whenever a change needs to be deployed, the engineers pushing the update to production must obtain the token from the stakeholders and enter it; only then can the deployment go ahead. Before handing out the token, the Stakeholders can check that the engineers on call followed the hygiene checklist, which reduces the chances of downtime and unforeseen errors in production.
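The token check at deployment time can be sketched as follows, assuming the tokens are standard time-based one-time passwords (RFC 6238 with HMAC-SHA1, 30-second steps and 6 digits, the defaults in most authenticator apps). This is an illustration written against the Python standard library, not the code used in Tessellate or Nomad; the `totp` and `verify` names and the one-step drift window are assumptions.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, at: float, step: int = 30, digits: int = 6) -> str:
    """Compute the RFC 6238 TOTP (HMAC-SHA1) for a base32 secret at Unix time `at`."""
    # Authenticator secrets often omit base32 padding, so restore it first.
    padded = secret_b32 + "=" * (-len(secret_b32) % 8)
    key = base64.b32decode(padded, casefold=True)
    counter = struct.pack(">Q", int(at) // step)
    digest = hmac.new(key, counter, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation from RFC 4226
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify(secret_b32: str, candidate: str, at: Optional[float] = None,
           step: int = 30, window: int = 1) -> bool:
    """Accept the candidate if it matches the current step or a neighbouring one."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(
            totp(secret_b32, now + drift * step, step).encode(),
            candidate.encode(),
        )
        for drift in range(-window, window + 1)
    )


# Check against the published RFC 6238 test secret at T = 59 seconds.
rfc_secret = base64.b32encode(b"12345678901234567890").decode()
print(totp(rfc_secret, at=59, digits=8))  # RFC 6238 test vector: 94287082
```

The drift window tolerates a small clock difference between the stakeholder's phone and the deployment host, at the cost of each token staying valid slightly longer.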
Result
Downtime used to be common after a deployment. Something or other would crash, sending the on-call engineers into a frenzy. Since we introduced the 2FA token requirement, the system has had zero downtime.
Projects/Codes:
Generate QR Codes via CLI that can be shared within a Stakeholder group.
Tessellate Changes:
Nomad-Proxy
Sample configurations:
Tessellate Command:
./tessellate --totp-config config.json
Nomad Command:
./nomad_serverproxy --totp-config config.json
The 2FA config is a JSON object with the following structure:
```json
{
  "Namespace/Ident": {
    "Operation": ["array", "of", "tokens"]
  }
}
```
Example: suppose you wish to protect a Nomad job called “hello-world” as follows:
- GET requests are allowed without an OTP
- A new deployment needs OTPs from two stakeholders
- Stopping a job needs just one
The totp config would look like:
```json
{
  "hello-world": {
    "GET": [],
    "POST": ["GA4DGMQ4TKZMFTDSNBUDEMZYMYYA", "GA4DGMQ4TKZMdedeqeSNBUDEMZYMYYA"],
    "DELETE": ["GA4DGMQ4TKZMFTDSNBUDEMZYMYYA"]
  }
}
```
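One way such a config could be evaluated is sketched below: an operation goes ahead only if every secret listed for it is matched by one of the submitted codes, so an empty list needs no code at all. This is a guess at the semantics based on the example above, not the library's actual implementation. The `is_authorized` name, the deny-by-default rule for unlisted jobs and operations, the placeholder secrets, and the stand-in verifier used in the demo are all assumptions; in practice the verifier would be a real TOTP check.

```python
import json
from typing import Callable, Iterable

# (secret, submitted_code) -> True if the code is currently valid for the secret
Verifier = Callable[[str, str], bool]


def is_authorized(config: dict, ident: str, operation: str,
                  codes: Iterable[str], verify: Verifier) -> bool:
    """Return True if the submitted codes satisfy every secret for the operation."""
    operations = config.get(ident)
    if operations is None or operation not in operations:
        return False  # assumption: anything not listed in the config is denied
    submitted = list(codes)
    # Every required secret must be matched by at least one submitted code.
    # An empty list of secrets therefore allows the operation with no codes.
    return all(
        any(verify(secret, code) for code in submitted)
        for secret in operations[operation]
    )


config = json.loads("""
{
  "hello-world": {
    "GET": [],
    "POST": ["PRODUCT-SECRET", "INFRA-SECRET"],
    "DELETE": ["PRODUCT-SECRET"]
  }
}
""")

# Stand-in verifier for the demo only: pretend these are the currently valid codes.
current_codes = {"PRODUCT-SECRET": "111111", "INFRA-SECRET": "222222"}


def fake_verify(secret: str, code: str) -> bool:
    return current_codes.get(secret) == code


print(is_authorized(config, "hello-world", "GET", [], fake_verify))                    # True
print(is_authorized(config, "hello-world", "POST", ["111111"], fake_verify))           # False
print(is_authorized(config, "hello-world", "POST", ["111111", "222222"], fake_verify)) # True
print(is_authorized(config, "hello-world", "DELETE", ["111111"], fake_verify))         # True
```

Passing the verifier in as a parameter keeps the policy check separate from the OTP algorithm, so the same check could sit in front of any tool that uses this configuration structure.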
The verification library is independent of Nomad and Tessellate and can be used by any tool that works with the same configuration structure.
The library is open-source and available at