DevOps and Segregation of Duties

Engineers at Macquarie

Published in Macquarie Engineering Blog

May 13, 2019

How you can make them work together.

by Jeehad Jebeile, Agile Coach at Macquarie Group.

DevOps and Segregation of Duties (SoD, also known as Separation of Duties) are not usually mentioned in the same sentence. DevOps is all about removing barriers and minimizing hand-offs, while segregation of duties is about adding gates to minimize risk.

When working in a highly regulated industry, as we do here at Macquarie, moving teams towards a DevOps way of working can be quite challenging. Regulators want assurance that only requested, approved and fully tested changes make it to production, and in these situations the main control used to provide that assurance is, in fact, Segregation of Duties.

Segregation of Duties

Image Credit — https://www.flickr.com/photos/riskexposed/6533848529

So what is Segregation of Duties?

Separation of duties (SoD; also known as Segregation of Duties) is the concept of having more than one person required to complete a task. In business the separation by sharing of more than one individual in one single task is an internal control intended to prevent fraud and error. — Wikipedia

In the software engineering world, this basically means that the person (or team) who developed the code cannot approve or deploy it. Again, the aim is to prevent the accidental or malicious release of unauthorized code into production.

In contrast, DevOps is about bringing together (or merging) the two once-discrete functions of Development and Operations into one, so that the same team that develops and tests the code also supports its operation in production. Segregation is, by definition, the complete opposite of merging. Yet it remains one of the most common controls for deciding what can or cannot be promoted to production.

What are the drawbacks of Segregation of Duties in DevOps teams?

SoD can slow teams down by adding unnecessary hand-offs, and it has the potential to introduce errors. Every hand-off requires a transfer of information, which not only slows things down but can also introduce a Chinese Whispers effect. Hand-offs affect not only the deployment of changes but also responses to incidents in production. In that scenario, who better to respond to an incident than the person (or team) responsible for the change?
SoD shows a lack of trust in teams, which nurtures a culture of fear. DevOps teams need to be autonomous in order to get the full value and speed benefits that the practice promises. To achieve this autonomy, the team needs to be trusted to do the right thing at all times, and when it does the wrong thing, it is expected to take responsibility for rectifying the error. Remember, with great power comes great responsibility (thanks Uncle Ben, R.I.P.).
SoD cannot address concerns relating to collusion, that is, deliberately pushing unauthorized changes into production, whether with malicious intent or to bypass the process because of urgency. For example: “we didn’t have time to do full regression testing because we need to release tomorrow, can you please sign this off anyway?” No matter how many controls are put in place, you cannot escape this issue.

What to do if you have no choice but to implement SoD?

As mentioned in the introduction, there are some highly regulated industries that will not allow DevOps teams to work completely autonomously. Here are some of the things your team should focus on if you find yourself in this situation.

Minimize the number of hand-offs

One of the primary principles of DevOps is to minimize the number of hand-offs needed to perform a task. In DevOps the focus is generally on the SDLC (i.e. CI/CD), but minimizing hand-offs can be applied to most processes in your team. Start by evaluating the existing process and confirming that it is still relevant. Next, streamline it so that it has the minimum number of steps required to achieve the outcome. Only then look at automation. Even though automation is important and generally a good idea, there’s no point in automating a poor or redundant process. Once you confirm that a process is required: optimize it, then automate.
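The following minimal Python sketch makes the hand-off count concrete. It models a process as an ordered list of steps, each with an owner, and it counts a hand-off every time ownership changes between consecutive steps. The steps, owners and the `relevant` flag are hypothetical. The sketch follows the order of operations described above: it first drops steps that are no longer relevant, and only then looks at what remains.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    name: str
    owner: str             # team or person responsible (hypothetical)
    relevant: bool = True  # False if the step no longer serves a purpose


def count_handoffs(steps: List[Step]) -> int:
    """Count ownership changes between consecutive steps."""
    return sum(1 for prev, cur in zip(steps, steps[1:]) if prev.owner != cur.owner)


def streamline(steps: List[Step]) -> List[Step]:
    """Confirm relevance first: drop steps that are no longer required."""
    return [s for s in steps if s.relevant]


# Hypothetical release process before streamlining.
process = [
    Step("write code", "dev team"),
    Step("unit test", "dev team"),
    Step("regression test", "test team"),
    Step("legacy sign-off", "architecture", relevant=False),
    Step("technical approval", "operations"),
    Step("deploy", "release team"),
]

print(count_handoffs(process))              # 4
print(count_handoffs(streamline(process)))  # 3
```

Dropping the obsolete sign-off removes one hand-off before anything has been automated. The hand-offs that remain are the candidates for optimizing, or for bringing into the team.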

Automation

Invest in automation for your builds, testing and deployments. By automating your delivery pipeline, you minimize the risk of human error.

Continuous Integration vs Continuous Delivery vs Continuous Deployment

Continuous Integration is a development practice that requires developers to integrate code into a shared repository several times a day. Each check-in is then verified by an automated build and unit testing process, allowing teams to detect problems early.

Continuous Delivery is the natural extension of Continuous Integration: an approach in which teams ensure that every change to the system is releasable, and that we can release any version at the push of a button. Continuous Delivery aims to make releases boring, so we can deliver frequently and get fast feedback on what users care about.

Continuous Deployment further extends Continuous Delivery by automatically performing the deployment of a release of code to Production as soon as it is ready.

Even if the final production deployment has to be performed by another person or team (meaning you can only achieve Continuous Delivery, not Continuous Deployment), you can at least reach a production-ready state in the fastest possible time.
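The difference between the three practices can be sketched as a pipeline in which every automated stage must pass before a change is considered production-ready. This is a minimal Python illustration, not the API of any particular CI/CD tool. The stage names, the `deploy` callback and the `approve` callback are all hypothetical; `approve` stands in for a separate person or team under SoD.

```python
from __future__ import annotations

from typing import Callable, List, Optional, Tuple

# A stage takes a change identifier and reports whether it passed.
Stage = Callable[[str], bool]


def run_pipeline(
    change: str,
    stages: List[Tuple[str, Stage]],
    deploy: Callable[[str], None],
    continuous_deployment: bool,
    approve: Optional[Callable[[str], bool]] = None,
) -> str:
    """Run automated stages, then deploy automatically or wait for approval."""
    # Continuous Integration: every change is built and tested automatically.
    for name, stage in stages:
        if not stage(change):
            return f"failed at {name}"

    # Continuous Delivery: having passed every stage, the change is releasable.
    if continuous_deployment:
        # Continuous Deployment: no human in the loop, release straight away.
        deploy(change)
        return "deployed"

    # Under SoD, someone other than the author presses the button.
    if approve is not None and approve(change):
        deploy(change)
        return "deployed after approval"
    return "ready for release"


# Hypothetical usage: stages are stubbed out as simple predicates.
released: List[str] = []
stages = [
    ("build", lambda c: True),
    ("unit tests", lambda c: True),
    ("integration tests", lambda c: "broken" not in c),
]

print(run_pipeline("feature-1", stages, released.append, continuous_deployment=True))   # deployed
print(run_pipeline("feature-2", stages, released.append, continuous_deployment=False))  # ready for release
print(run_pipeline("broken-3", stages, released.append, continuous_deployment=False))   # failed at integration tests
```

The automated stages are identical in both modes. SoD only changes who triggers the final `deploy`, so the change still reaches a production-ready state as quickly as the pipeline allows.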

Remove External Dependencies

Remove the dependency on external teams and try to keep all sign-offs within the team. External dependencies, such as needing approval from a particular stakeholder or having a separate team perform the deployment, slow the process down. This is not to say that these actions should be skipped or ignored, but members of the team should be able to perform them. Again, going back to DevOps principles, teams should consist of full-stack, or T-shaped, developers who can wear multiple hats to perform all the tasks required of the team.

The main reason to remove these external ties is that it is generally easier to get access to people within the team than to those outside it. Those within the team also know what’s going on and what’s required without needing additional explanation or context. Together, these two factors should improve your team’s velocity.

If you cannot remove a dependency, see if you can bring it into your team instead. For example, in our Leasing business, all production releases required documented business and technical approval prior to deployment. Business approval was performed by someone from “the business”, and technical approval was performed by our Operations Lead. We changed this so that our Product Owner approves on behalf of the business and our Tech Lead provides technical approval, both of whom sit within the delivery team. This small change keeps SoD intact (since the PO is generally not directly involved in developing a feature), and the team no longer relies on someone outside the team to deliver value.
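The in-team approval rule described above could be expressed as an automated check, for example as a gate in the delivery pipeline. The sketch below is a hypothetical Python illustration. The team roster, the role names and the rule that an approver must not have authored the change are assumptions drawn from the Leasing example. They do not describe Macquarie's actual tooling.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Hypothetical delivery team: person -> role.
TEAM_ROLES: Dict[str, str] = {
    "priya": "product owner",
    "sam": "tech lead",
    "alex": "developer",
    "jo": "developer",
}


@dataclass
class Release:
    authors: Set[str] = field(default_factory=set)  # who wrote the change
    business_approver: Optional[str] = None
    technical_approver: Optional[str] = None


def sod_violations(release: Release, roles: Dict[str, str]) -> List[str]:
    """Return every reason the release fails the (assumed) in-team SoD rules."""
    problems: List[str] = []
    checks = [
        ("business", release.business_approver, "product owner"),
        ("technical", release.technical_approver, "tech lead"),
    ]
    for kind, approver, required_role in checks:
        if approver is None:
            problems.append(f"missing {kind} approval")
        elif roles.get(approver) != required_role:
            problems.append(f"{kind} approval must come from the {required_role}")
        elif approver in release.authors:
            # Segregation still holds: nobody approves their own work.
            problems.append(f"{kind} approver {approver} also authored the change")
    return problems


ok = Release(authors={"alex", "jo"}, business_approver="priya", technical_approver="sam")
print(sod_violations(ok, TEAM_ROLES))  # []

self_approved = Release(authors={"sam"}, business_approver="priya", technical_approver="sam")
print(sod_violations(self_approved, TEAM_ROLES))  # ['technical approver sam also authored the change']
```

Both approvers are team members, so the check never has to wait on someone outside the team. A change still cannot be approved by the person who wrote it.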

Safety Nets over Controls

One of the best ways to maintain a high level of risk aversion while still minimizing hand-offs and delivering quickly is to prefer safety nets over controls. What does this mean?

In the circus, trapeze artists generally perform their acrobatic feats over a safety net. The alternative would be to hook them up to safety lines, which would obviously restrict their movement. A safety net, on the other hand, allows them to perform as fluidly and efficiently as they can, with the reassurance that if something does go wrong and they fall, they land safely in the net and avoid serious injury.

Don’t get me wrong. There is definitely a time and a place where controls are required, for example, the flight checks performed prior to take-off. These are (and should be) mandatory before every flight, because we cannot afford a major incident in this type of scenario: the result of failure would be devastating. Not only would the monetary cost be high (planes aren’t cheap, you know), but more importantly, human life is priceless.

Most of us, however, build software, and for the most part, if our software fails it isn’t a matter of life or death. So it’s OK to fail in these scenarios, as long as you fail safely, recover quickly and learn from your failures so they’re not repeated.

In software development, safety nets include high-quality monitoring of production systems. This allows the team to understand the current state of the system, respond to an incident before users notice it, and potentially prevent the incident in the first place. In the event of a major incident, the ability to restore service by rolling back quickly (e.g. via blue/green deployments) is another safety net that can be applied.
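As a small illustration of monitoring as a safety net, the sketch below tracks the outcome of the most recent requests. It flags when the error rate crosses a threshold, which is the kind of signal that could trigger an alert or a rollback. The window size and the 5% default threshold are arbitrary assumptions. In practice this signal would usually come from a monitoring platform rather than from hand-rolled code like this.

```python
from collections import deque
from typing import Deque


class ErrorRateMonitor:
    """Flag when the error rate over the last `window` requests exceeds a threshold."""

    def __init__(self, window: int = 100, threshold: float = 0.05) -> None:
        self.threshold = threshold
        # Fixed-size window: the oldest outcome drops off as new ones arrive.
        self.outcomes: Deque[bool] = deque(maxlen=window)  # True means the request failed

    def record(self, failed: bool) -> None:
        self.outcomes.append(failed)

    def error_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def breached(self) -> bool:
        # Only judge a full window, so a couple of early failures don't raise an alarm.
        return len(self.outcomes) == self.outcomes.maxlen and self.error_rate() > self.threshold


monitor = ErrorRateMonitor(window=10, threshold=0.2)
for failed in [False] * 8 + [True] * 2:
    monitor.record(failed)
print(monitor.breached())  # False: 2 failures in 10 is exactly 20%, not above it

monitor.record(True)  # oldest success drops out: now 3 failures in 10
print(monitor.breached())  # True
```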

Blue/Green Deployment

This technique is a well-known (but under-utilized) cloud pattern used to minimize downtime during a release, as well as to provide a rapid way to roll back (i.e. a safety net) if something goes wrong.

Martin Fowler (one of the original signatories of the Agile Manifesto) explains it simply:

One of the challenges with automating deployment is the cut-over itself, taking software from the final stage of testing to live production. You usually need to do this quickly in order to minimize downtime. The blue-green deployment approach does this by ensuring you have two production environments, as identical as possible. At any time one of them, let’s say blue for the example, is live. As you prepare a new release of your software you do your final stage of testing in the green environment. Once the software is working in the green environment, you switch the router so that all incoming requests go to the green environment — the blue one is now idle.
Blue-green deployment also gives you a rapid way to rollback — if anything goes wrong you switch the router back to your blue environment.

The main point here is that instead of trying to block releases until they are perceived to be perfect, we allow for all releases to go into production and have a kill-switch mechanism to restore service rapidly if required. Again, this is about failing safe and quickly, rather than trying to avoid the (inevitable) failure in the first place.
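The blue/green cut-over and kill switch described above can be modeled in a few lines. This is a minimal Python sketch that assumes a single router sending all traffic to one environment at a time. The `healthy` callback is a hypothetical stand-in for whatever production monitoring reports that the release is misbehaving.

```python
from __future__ import annotations

from typing import Callable, Dict, Optional


class BlueGreenRouter:
    """Route all traffic to one of two identical environments, "blue" or "green"."""

    def __init__(self, current_version: str) -> None:
        # The current release is live in blue; green starts idle and empty.
        self.versions: Dict[str, Optional[str]] = {"blue": current_version, "green": None}
        self.live = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def serving(self) -> Optional[str]:
        return self.versions[self.live]

    def switch(self) -> None:
        # Flip the router; the previously live environment is left untouched.
        self.live = self.idle


def release(router: BlueGreenRouter, version: str, healthy: Callable[[str], bool]) -> str:
    router.versions[router.idle] = version  # deploy to the idle environment
    router.switch()                         # cut over: idle becomes live
    if not healthy(version):                # kill switch: problem seen in production
        router.switch()                     # roll back to the previous environment
        return "rolled back"
    return "released"


router = BlueGreenRouter("v1")
print(release(router, "v2", healthy=lambda v: True))   # released
print(router.live, router.serving())                   # green v2
print(release(router, "v3", healthy=lambda v: False))  # rolled back
print(router.live, router.serving())                   # green v2
```

Rolling back is just another switch. The previous version is never torn down, and that is what makes this safety net fast.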

To sum up,

Segregation of Duties should be a last resort when it comes to controls for software delivery. Unless failure could result in loss of life, or SoD is specifically mandated by a regulator, it should be avoided if you wish to gain the greatest benefits from DevOps principles and practices. That being said, if you are forced to use SoD in your team, implementing the following techniques will allow you to have a successful DevOps implementation, if not a perfect one.

Minimize the number of hand-offs: Optimize your process by minimizing the number of steps and people required.
Automation: Automate everything, but only after ensuring that what you’re automating is still relevant. And remember, Continuous Integration, Delivery and Deployment…these are your friends.
Remove external dependencies: It’s easier to gain access to those within your team, so remove or at least minimize your dependency on those outside.
Safety Nets over Controls: Fail safe and fail fast, rather than trying to avoid failure altogether.