An Introduction to DevOps
[Article 2 of 6 in series Software Delivery, DevOps and CICD for the uninitiated]
In this article, we will cover:
- The meaning of the term DevOps
- The motivations for the DevOps movement
- Some established solutions proposed by the DevOps movement
Introduction
In order to appreciate the tools and processes we will adopt in the software delivery pipeline we’ll define in subsequent articles, it’s first important to be conscious of some of the common pitfalls that we want to avoid.
You may have heard the term ‘DevOps’, a movement that has emerged to solve many such issues, but what does it mean?
“DevOps is the combination of cultural philosophies, practices, and tools that increases an organization’s ability to deliver applications and services at high velocity.”
Credit: Amazon
Disappointing, right? The goal of this article is to instead focus on some of the key issues which motivated the ‘DevOps’ movement and some established solutions it offers for overcoming these.
Why DevOps?
The DevOps movement emerged as a set of solutions to recurring problems that limit the ability of all manner of organisations to deliver and to grow. What these problems have in common is that they stem principally from the organisations’ Development and Operations departments working in isolation. Understanding DevOps really involves understanding these common problems and recognising solutions to them.
The complete list of problems is truly as long and as messy as a piece of tatty old string. Below I’ve highlighted just a few and their associated solutions:
A. Early/Continuous Involvement of Ops in Dev, and Dev in Ops
In many organisations, the delivery cycle goes a little like this:
- Sales team promise too much too soon
- Development team are forced to build frantically to meet the project’s functional requirements in time
- Operations team receive a half-finished product that they have had no input on, with no documentation
As a result, the product is highly likely to:
- require significant time investments e.g. to deploy, to test, to run, to fix and to support
- not meet performance criteria e.g. response time, expected volume etc.
A DevOps approach breaks down this barrier by advocating a delivery framework which:
- composes delivery teams of both development-focused and operations-focused staff, who tackle both concerns at the same time rather than in separate departments/phases
- breaks larger changes into smaller ones which are released more frequently, with less risk
- considers functional increments ‘done’ only once all the enhancements necessary to support that change operationally, as well as all tests and supporting documentation, are also completed
For more information, see:
- Agile — a project delivery framework (often contrasted with ‘Waterfall’)
- Scrum — an Agile-compliant Software Development methodology
B. Early and Constant Issue Identification and Resolution
In many organisations, most issues are only identified when they are reported by customers or users:
- the testing undertaken during the development and release process is deficient: code bugs evade capture and make it through to production
- the monitoring and alerting in production is barely existent, meaning operational issues go unnoticed
On top of this, once an issue is identified:
- the tools available to investigators make it difficult to detect the issue or reproduce the scenario
- the processes facilitating issue prioritisation, cause identification, fix implementation and customer communication are opaque and disjointed
The likelihood of issues, the sluggish speed of fixing them and the lack of information available for developers and for users create frustration for end users and put pressure on delivery teams.
In an alternative setup:
- testing is automated, thorough and constantly repeated during the development/release processes and on an ongoing basis
- monitoring tools automatically detect key events and assess performance against the organisation’s Service Level Agreements (SLAs) and Key Performance Indicators (KPIs). The system’s current state and performance, as well as any issues, are accessible to users in a user-friendly dashboard, and communications go out to them whenever services are unavailable.
- the right tools are in place for the team to quickly set up a test environment, replicate the issue or run preemptive scenarios, identify problems and investigate potential solutions.
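The monitoring idea above can be made concrete with a small sketch. This is an illustration only, not a real monitoring tool: the `HealthCheck` structure, the `sla_report` function and the thresholds (99% availability, 500 ms response time) are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class HealthCheck:
    """Result of one probe against the service (hypothetical structure)."""
    ok: bool
    response_ms: float


def availability(checks: list[HealthCheck]) -> float:
    """Return the fraction of probes that succeeded, between 0.0 and 1.0."""
    if not checks:
        raise ValueError("no health checks recorded")
    return sum(1 for c in checks if c.ok) / len(checks)


def sla_report(checks: list[HealthCheck],
               target_availability: float = 0.99,   # assumed SLA target
               max_response_ms: float = 500.0) -> dict:  # assumed KPI
    """Compare recorded probes with the targets and collect alert messages."""
    avail = availability(checks)
    slow = [c for c in checks if c.ok and c.response_ms > max_response_ms]
    alerts = []
    if avail < target_availability:
        alerts.append(
            f"availability {avail:.1%} below target {target_availability:.1%}"
        )
    if slow:
        alerts.append(
            f"{len(slow)} responses slower than {max_response_ms:.0f} ms"
        )
    return {"availability": avail, "alerts": alerts}


if __name__ == "__main__":
    # 98 healthy probes and 2 failures: availability falls short of the target.
    probes = [HealthCheck(True, 120.0)] * 98 + [HealthCheck(False, 0.0)] * 2
    for message in sla_report(probes)["alerts"]:
        print("ALERT:", message)
```

In a real setup the probes would come from a monitoring system and the alerts would feed a dashboard or notification channel; the point here is only that the targets are checked automatically rather than discovered through customer complaints.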
When issues do occur, customers appreciate the transparency and honesty, engendering loyalty, whilst support teams are empowered to
C. Minimising Downtime
‘Downtime’ is the period of time for which a service is unavailable. Organisations aim to eradicate or minimise their downtime.
Downtime caused by bugs, system crashes or network faults is unpredictable, but operations departments guard against it by adding ‘redundancy’ to their solution, i.e. exact backup ‘copies’ of the whole system which are always ready to take over should something go wrong.
Another potential source of downtime is releases: most upgrades follow the basic pattern of shut down-replace-restart, and the system is unavailable throughout this time. The tried and trusted ‘blue/green deployment’ remedy applied by operations teams relies on redundancy in a similar way.
The organisation runs two identical copies (‘blue’ and ‘green’) of the system with all activity directed to just one (e.g. to ‘blue’) and the other dormant (‘green’). The upgrade is carried out and then tested out on the ‘green’ system, while ‘blue’ continues running the previous version and handling all traffic. When the deployment is complete and verified, the organisation switches its routing of traffic (this could be immediate, or gradual) so that eventually ‘green’ is handling all traffic and ‘blue’ is now dormant. Once ‘green’ has proved to be stable for a period of time, ‘blue’ is finally upgraded to be ready for the next version.
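The switch-over described above can be sketched in a few lines. This is a toy model under stated assumptions, not how any particular load balancer works: the `BlueGreenRouter` class and its method names are hypothetical, the health check is supplied by the caller, and only the immediate (not the gradual) switch is modelled.

```python
from typing import Callable


class BlueGreenRouter:
    """Toy model of two identical copies of a system, only one taking traffic."""

    def __init__(self, live_version: str) -> None:
        # Both copies start on the same version; 'blue' handles all traffic.
        self.versions = {"blue": live_version, "green": live_version}
        self.live = "blue"

    @property
    def idle(self) -> str:
        """Name of the copy that is currently dormant."""
        return "green" if self.live == "blue" else "blue"

    def deploy_to_idle(self, version: str) -> None:
        """Upgrade the dormant copy; the live copy is untouched."""
        self.versions[self.idle] = version

    def switch(self, health_check: Callable[[str], bool]) -> bool:
        """Send traffic to the dormant copy, but only if it passes the check."""
        candidate = self.idle
        if not health_check(self.versions[candidate]):
            return False  # traffic stays where it is
        self.live = candidate
        return True


if __name__ == "__main__":
    router = BlueGreenRouter("1.0")
    router.deploy_to_idle("2.0")                  # upgrade 'green' only
    switched = router.switch(lambda version: version == "2.0")
    print(switched, router.live, router.versions)
```

Because the previously live copy keeps running the old version after the switch, traffic can be pointed back at it if the new version misbehaves.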
The quest for ‘zero downtime’ is most pertinent to production environments, where end users would otherwise be affected, but it can apply also to test environments where it can be important to avoid disruption to test activities.
Unfortunately, releases don’t always go to plan, and sometimes an issue with a release emerges after it has been rolled out (often because it’s only then that the application is exposed to the widest range of environments and scenarios). When this happens, the organisation’s return on investment for the release is at stake, as well as its reputation.
Organisations want to restore normal service as quickly as possible, and can go in two directions:
- ‘roll back’, i.e. divert traffic back to the ‘blue’ server if it is still running the previous version, or else redeploy the last stable version
- ‘roll forward’, i.e. issue a ‘hotfix’ or ‘patch’, or quickly prepare a new release which fixes the issue
A roll forward allows the release’s new features to be realised, but must be safe and effective, as a rash change could make the situation worse. Needless to say, this level of heightened, time-critical analysis and decision making requires well-drilled processes that facilitate close cooperation between Operations and Development.
Issue resolution would also be helped out by the following process…
D. Automatic, Continuous Delivery
There are many reasons why it is necessary to (re)deploy software e.g.:
- new feature added
- issue identified and resolved
- hardware upgrade or migration required
- adding resource for expanding demand, geographies or for redundancy and backup
Many organisations deploy and test their infrastructure and software manually: with individuals following instruction manuals involving hundreds of steps. This:
- takes a lot of time from skilled resource
- is error prone or open to interpretation and therefore high risk
Such organisations are unlikely to release updates more than a handful of times a year. The lack of speed renders them unable to react quickly to changing business opportunities or threats.
If, on the other hand, the deployment of infrastructure and applications is automated, as well as the testing thereafter, organisations can release hyper-frequently. Business agility yields results: Facebook attributed their success so heavily to their ability to move quickly that their company motto was once “Move fast and break things”. (Sensibly, they’ve since addressed the risk aspect too and changed their motto to the somewhat less punchy “Move fast with stable infra”.)
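To illustrate the difference from a manual instruction sheet, here is a minimal sketch of an automated pipeline. It assumes each step can be expressed as a function returning success or failure; the `run_pipeline` function and the step names are hypothetical, and the lambdas stand in for real build, test and deploy commands.

```python
from typing import Callable

# A step is a name plus an action that reports success (True) or failure (False).
Step = tuple[str, Callable[[], bool]]


def run_pipeline(steps: list[Step]) -> list[str]:
    """Run steps in order, stopping at the first failure, and return a log."""
    log = []
    for name, action in steps:
        if action():
            log.append(f"{name}: ok")
        else:
            log.append(f"{name}: FAILED")
            break  # later steps (e.g. deploy) never run after a failure
    return log


if __name__ == "__main__":
    # Placeholder actions; a real pipeline would invoke build and test tools.
    pipeline = [
        ("build", lambda: True),
        ("test", lambda: True),
        ("deploy", lambda: True),
    ]
    for line in run_pipeline(pipeline):
        print(line)
```

The same steps run the same way every time, which removes the scope for interpretation that a manual procedure leaves open, and a failing test stops the release before it reaches users.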
The tools and processes involved in supporting this Continuous Delivery will be the focus of the next few articles in the series.
Summary
In this article we have covered the meaning of DevOps, the motivation for it and some key processes that can be implemented. The common theme running through these solutions is the bringing together of the traditionally separate worlds of Development and Operations.
By giving Development teams the skills, tools and mandates to bring Operational thinking into their work, and by investing in the automation of traditional Operations processes, organisations can cease diverting endless amounts of resource into keeping the lights on and instead keep teams focused on initiatives that will facilitate strategic growth.