CI/CD Ingredients for Success
The obvious and the not-so-obvious information about Continuous Integration and Continuous Delivery
Continuous Integration and Continuous Delivery (CI/CD) is a staple of any software house worth its salt. It is a practice so foundational in the industry that many of us can no longer imagine working without it. CI/CD is as basic as testing and every bit as useful, and yet I’ve seen huge variation in the pipelines out there: some bad, some good, and some insane.
I’ve been working with CI/CD systems for several years now and in my time, I’ve amassed a plethora of my own failures. Failures that have resulted in a little wisdom and a lot of grey hairs.
In this piece, I’ll cover some basic capabilities that your pipeline should have. I’ll mark both the essentials and the luxuries, then we’ll talk about your non-functionals: error handling, notifications, and metrics.
Just what the hell is CI/CD?
Before we get into the meat of this piece, a quick overview. Back in the glory days of 1997, when Oasis were going strong and dressing like an undertaker was cool, Kent Beck and Ron Jeffries came out with Extreme Programming. XP listed out a set of practices, principles, and values that they believed would usher in a better world of software engineering. It was a sensational change.
While Continuous Integration (CI) had been proposed by Grady Booch in 1991, it was Kent and Ron that pushed it forward. In short, CI is taking all of the code and testing it constantly. The aim? To ensure the team isn’t drifting away from one another and to nip integration issues in the bud.
Continuous Delivery (CD) was popularised by Jez Humble and David Farley. This is a two-pronged principle:
- The code in the trunk (the master branch or the SVN trunk) should be deployable.
- Those deployments should be completely automated and should not require any manual steps.
Together, they’re awesome.
That’s it: CI/CD is constantly packaging and testing code, and deploying it into a production environment with ease. This eliminates integration issues and mitigates deployment risk by making deployments smaller and more frequent.
So, what should a pipeline do?
I’m glad you asked. Each of the features I have included here are necessary to realise the benefits of a CI/CD pipeline. I have tried to be exhaustive but expect some variation.
Running Your Tests
This one is obvious: the pipeline should run some tests against your code. Wanna know if your code is working? Run some damn tests.
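As a sketch (the stage and the run_tests placeholder are illustrative, not tied to any particular CI system), a test stage boils down to running the suite and letting its exit code decide the fate of the build:

```shell
#!/bin/sh
# Minimal test stage: the suite's exit code is the whole verdict.
# run_tests is a placeholder — swap in pytest, go test, npm test, etc.
run_tests() {
  echo "running test suite..."
  true  # stand-in for your real test command
}

if run_tests; then
  echo "tests passed, pipeline continues"
else
  echo "tests FAILED, aborting pipeline" >&2
  exit 1
fi
```

The important property is that the pipeline halts on a non-zero exit code; everything else is decoration.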
Building an Artifact
It should be building some kind of deployable artifact. Again, obvious: you’re planning on deploying this, aren’t you? You need an artifact.
Code and Artifact Scanning
This one raises a few eyebrows, but having personally implemented it, I have decided it is going on the “must have” list. Code and artifact scanning provide indispensable feedback. OWASP Dependency-Check can scan your dependencies. Tenable can scan your containers. Checkmarx can take a look at your code. The specific tool isn’t what is important; what matters is the capability. You need to know when a SQL injection vulnerability pops up. Your tests might pass, but your code isn’t production ready unless it’s secure.
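Whichever scanner you pick, the wiring is usually the same: run it, and turn its exit code into a pipeline verdict. A generic wrapper, as a sketch (the tool invocations in the comments are placeholders, so check your scanner’s actual flags):

```shell
#!/bin/sh
# Generic wrapper: run any scanner and fail the build loudly if it objects.
scan() {
  name="$1"; shift
  if "$@"; then
    echo "security scan '$name' passed"
  else
    echo "security scan '$name' FAILED: build is not production ready" >&2
    return 1
  fi
}

# Illustrative usage — substitute your real tools and thresholds:
#   scan "dependencies" dependency-check --scan . --failOnCVSS 7
#   scan "containers"   my-container-scanner myapp:latest
```

Keeping the wrapper generic means adding a new scanner to the pipeline is one line, not a new stage.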
Don’t offload that responsibility onto an ops department or your infosec team. Own the problem and visualise it using automation. By baking security scanning into your pipeline, you’re generating vast streams of data that can be used in a crisis or in day-to-day debugging.
Deployment of Applications to ALL ENVIRONMENTS
I was once told that all deployments except production were automated, because they “couldn’t risk it” in their production environments. Their deployments were so convoluted that they didn’t want to “risk” automating them, deciding instead to leave it all to the ever-reliable human mind.
Don’t shy away because you’re feeling nervous. If you’ve got a simple deployment process, automation is easy. If you’ve got a complex process, it’s an absolute necessity. There’s no way out: Automate, automate, automate.
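As one illustration with Helm (the release and chart names are made up, and the HELM variable only exists so the step can be stubbed and tested), a deploy step that works identically in every environment can be a single idempotent command:

```shell
#!/bin/sh
# One deploy command for every environment: `helm upgrade --install` installs
# on the first run and upgrades on every run after; --wait blocks until the
# rollout is actually healthy. HELM is overridable so the step is testable.
HELM="${HELM:-helm}"

deploy() {
  release="$1" chart="$2" env_values="$3"
  $HELM upgrade --install "$release" "$chart" \
    --values "$env_values" --wait --timeout 5m
}

# e.g. deploy nginx-ingress ./charts/nginx-ingress values/production.yaml
```

The only thing that should differ between environments is the values file, never the procedure.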
Sanity Checking Your Deployment
Once you’ve got your deployment out, how do you know it’s working? One key ingredient of a robust pipeline is a sanity check. This can take the form of a smoke test or something more rigorous. You might issue an HTTP call to your API and check that it returns a 2xx status code. Whatever form it takes, don’t just assume your deployment was peachy because your app has started up.
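A smoke test can be as small as one curl call plus a status check. Here the 2xx check is split into its own function, and the URL and health endpoint are assumptions about your app:

```shell
#!/bin/sh
# Post-deployment sanity check: hit the app and demand a 2xx back.
is_2xx() {
  case "$1" in
    2??) return 0 ;;
    *)   return 1 ;;
  esac
}

smoke_test() {
  url="$1"
  status=$(curl -s -o /dev/null -w '%{http_code}' "$url")
  if is_2xx "$status"; then
    echo "smoke test passed ($status from $url)"
  else
    echo "smoke test FAILED ($status from $url)" >&2
    return 1
  fi
}

# e.g. smoke_test "https://myapp.example.com/health"
```

Note that a redirect or a 500 both fail the check: “the process is running” and “the service is healthy” are different claims.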
Notifications
Regardless of whether your build has failed or succeeded, the pipeline should issue a notification. Send it to the medium your engineers use most frequently. Don’t clog up people’s email inboxes; all they’ll do is make a rule and hide the messages away in a random folder. Push notifications through instant messaging platforms (and into any auditing solution) such as Slack or Microsoft Teams (sigh).
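With Slack, for instance, a notification is a single POST to an incoming webhook (SLACK_WEBHOOK is a webhook URL you’d configure yourself; the payload builder is split out so it can be checked without sending anything):

```shell
#!/bin/sh
# Build the JSON payload separately so it can be inspected without sending.
build_payload() {
  printf '{"text":"%s"}' "$1"
}

# Post a build notification to a Slack incoming webhook.
notify() {
  curl -s -X POST -H 'Content-Type: application/json' \
    -d "$(build_payload "$1")" "$SLACK_WEBHOOK"
}

# e.g. notify "pipeline #42 FAILED on main"
```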
Good Error Messages
The best way to talk about good error messages is to start with an unclear error message:
Error: "nginx-ingress" has no deployed releases
This is obvious if you know about Helm and have encountered this issue before. Alas, our error messages shouldn’t only be aimed at those who already know about the problem; they should be easily understood by everyone. On the face of it, this message isn’t all that clear, is it? I deployed my last release. It is deployed… I think. If you were to run helm list, you’d see something like:

NAME           REVISION  UPDATED                   STATUS
nginx-ingress  6         Fri May 3 11:19:45 2019   FAILED
The problem with this error message is that it offers no context or instruction, and those are the two essential ingredients: what does this mean, what is the impact, and what should I do about it? With these in mind, a more useful error message looks something like this:
Your application cannot be deployed because it is in a failed state. You can verify this using the helm list command. You should roll back to a previous working revision using the helm rollback command.
The first sentence lays out the root cause: the application is in the wrong state. It also offers diagnostic assistance should you want to verify the issue yourself. Context!
The second portion of the error message offers instruction for remediation. Instead of following some ancient Confluence document, the instructions are right there in the code. A good error message can remove the need for extensive tutorials and how-to guides. It can eliminate costly googling and create living, responsive documentation that won’t go stale. It’s worth spending the time to get these right.
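In your own pipeline scripts, the same what/why/fix shape can be enforced with a tiny helper so nobody ships a bare one-liner error (the helm commands in the example usage come from the scenario above; the helper itself is just a sketch):

```shell
#!/bin/sh
# Fail a pipeline step with context and remediation, not just a raw error.
fail_with_help() {
  {
    echo "ERROR: $1"
    echo "  why: $2"
    echo "  fix: $3"
  } >&2
  return 1
}

# e.g.
# fail_with_help "release nginx-ingress cannot be deployed: it is in a FAILED state" \
#   "verify with: helm list" \
#   "roll back with: helm rollback nginx-ingress <working revision>"
```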
Metrics
The last ingredient is metrics. It is helpful to know how many deployments are failing, how many are succeeding, how long builds are taking, where the choke points are, and so on. There is a wealth of information here for your team to dig into and understand.
This data is the lifeblood of your team: the metrics represent your team’s ability to deliver software changes. If you can bring them to the surface, you’re sitting on a goldmine of information.
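At its simplest, instrumentation means wrapping each stage so it reports its duration and outcome. A sketch (the metric line format here is StatsD-flavoured purely for illustration; emit whatever your monitoring stack actually ingests):

```shell
#!/bin/sh
# Wrap any pipeline stage so it emits duration and outcome metric lines
# alongside its normal result.
timed_stage() {
  name="$1"; shift
  start=$(date +%s)
  "$@"
  rc=$?
  end=$(date +%s)
  echo "pipeline.stage.${name}.duration_seconds:$((end - start))|g"
  echo "pipeline.stage.${name}.status:${rc}|g"
  return $rc
}

# e.g. timed_stage build make all
```

Because the wrapper passes the stage’s exit code straight through, adding metrics never changes whether the pipeline passes or fails.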