The Risk Management Paradox: Enable Your Teams By Giving Up Control
By Brett Luckabaugh, Account Principal, Contino
Everyone is familiar with the phrase "if you love something, let it go. If it comes back to you, it's yours forever. If it doesn't, then it was never meant to be." IT risk can be a lot like that. It sounds crazy, but hear me out for a second. Organizations today are desperate to get to a better place in their IT shops by adopting Agile and DevOps and by utilizing Big Data and ML/AI. Most of these companies barely dip a toe into the depths of any of these concepts, while leadership touts their "modern IT practices" to anyone who will listen. The good news is that most professionals understand the general direction to take an organization; the bad news is that almost no one knows the details of how. This post is going to dig into one aspect of that how: managing risk and enabling teams.
Adopting modern IT industry best practices is an exercise in enabling teams. To do this, you must maximize the ownership and control of the things that are within that team's purview. Many teams today have very little ownership and control over their day-to-day work.
Here is an exhaustive list of the things many development teams fully own and control:
- When to take coffee breaks
Things development teams do not own and control:
- What to develop
- The timeline
- The budget
- The tech stack
- Architecture
- The development tool set
- Infrastructure
- Dev/test/QA/prod environments
- Code pipelines/Continuous Delivery
- Data layer/log management/reporting
- Security controls
- Operations/maintenance
It seems fairly simple to define the narrative of how we got to this state. In an effort to grow their IT capabilities, companies had to hire a lot of people. Some of those people will naturally not be as competent as they should be, and as a result, bad things happen. Unwitting executives, in an effort to save budget, traditionally throw development tasks to outsourced third parties that crank out code with just enough quality to run, for now. Months or years down the line, something inevitably causes an emergency: a critical security vulnerability, an extremely antiquated environment (just how many Java 1.4 environments are out there, exactly?), a demand for new features on a tech stack that no one knows how to deal with, and many more!

Part of this problem I blame on finance, which treats everything in the IT space as a project and not a product (I recommend the book Project to Product by Mik Kersten). Under the project mindset, the main objective is to produce a working deliverable; nobody worries about the maintenance or operations of that thing, because by the time it breaks, everyone will be long gone on a different project, making the same mistakes over and over again. Then, naturally, IT gets a bad rap, since the optics make it seem as though everything in their wheelhouse is poor quality and breaks all the time. So, obviously, we have to put controls in place to ensure these emergencies stop happening!
Solving Quality with Policy
Here comes the policy parade. We'll spend months creating sound, foolproof documents declaring that "vendor A must meet requirement X," "developers can never have any access to any production environment," and "the tech stack will be standard across every team so our maintenance team can be cohesive and deal with only one type of problem." Before you know it, you have silos ingrained across the entire org, and employee fulfillment and happiness break through the floor and into the sub-basement. At least everyone can show the CIO that we all have a handle on the situation, right? Except, whoops, it never works out quite that way, for multiple reasons:
- Vendors will either lie about their ability to meet requirements or do the absolute bare minimum in the areas visible to their clients to prove compliance, so the end deliverable may or may not actually be compliant. After an initial review and sign-off by the compliance teams, they are free to do whatever they want.
- There is usually no procedure in place to perform any auditing — nothing is revisited.
- Tollgate processes aren’t staffed with people who are competent with what they’re being asked to review (policy people reviewing highly technical deliverables).
- Standard, shared environments have multiple teams with different objectives all hitting them at the same time, causing environment drift, fracturing and outright destruction.
- Developers are less efficient with the standard tool sets rather than using what they are familiar with.
- Project lead time shoots through the roof. Everything must be passed off to the silo owner. Change of a database field? Lead time. Passing off to QA for validation? Lead time. Bringing changes up to a change control board and getting signoff? Lead time.
- Code rollouts are all bundled up and shared across multiple development teams, ensuring that there will be merge conflicts, massive regression test suites and long nights of pushing code to production and sitting there for hours to ensure nothing goes sideways.
At this point, I've hopefully gotten at least a few head nods out of my readers. Much of this is a standard cancer that infests these "robust, secure" organizations. So robust, so secure, in fact, that little gets done. Risk teams are often not risk averse; they are risk nullifying. I often joke that we should be so risk averse that we go take an axe to the datacenter power lines. At least that way we can't be hacked.
How To Get Started
To get out of this hole, something has to give, and realistically the only possible thing that can give is risk. That means some measure of control has to be given up by centralized IT and handed back to the developers. Yes, that means developers could potentially break something. Yes, that means a developer could technically install something the company doesn't have a license for, creating legal risk. Empowering your developers to this degree means accepting that they can potentially cause harm. If that scares you so far out of your wits that you can't imagine such a future, then maybe you need to take a hard look at the quality of developers you're contracting with or employing. Strong developer empowerment cannot happen when developers are not worthy of that empowerment, and for the longest time developers have been treated as throw-away products: code monkeys who take the requirements of the actual smart people (your architects, security professionals, etc.) and churn out the "stuff that makes the app go." We come to a bit of a dilemma here, because I don't believe it is even remotely possible for an IT org stuck in this type of reality to pull itself out by removing all poor performers and then finding the budget to hire absolute top talent, thereby mitigating the risk of developer empowerment. To practically address this problem, we first need to ask: what is it possible to lose control of?
It is not wise to completely remove all semblance of centralized control overnight; your development teams (and even the management layer on top of them) would have no idea how to handle that much responsibility all at once. Therefore, a nuanced approach is required. The pace will also depend on your org's capability to perform the auditing necessary to ensure that enablement doesn't run away from any semblance of process and control. Here's an example of a good way to handle enabling a team:
Let’s take a competent development team that has no control over the dev suite they’re dealing with. Perhaps the organization has specified Eclipse as the mandated IDE, even though a substantial portion of the team is more familiar with IntelliJ. While the licenses for IntelliJ cost money, it’s a massive sign of goodwill for an organization to reach out and offer to pay for a tool that the developers really enjoy using. This would measurably increase their productivity and work happiness. The risk here is extremely low because the code output of Eclipse and IntelliJ is not demonstrably different.
Another example, one that actually does increase risk, is allowing a development team to own and control their own development environments, and even their test environments. While you can simply put down a new mandate that teams are now in control of their own environments, doing that and nothing else increases your risk factor more than is acceptable. For instance, you now run the real risk of such significant environment drift across teams that the practice becomes untenable. To mitigate this risk, it is important not only to have policy in place ("anyone who opens port 22 to the public internet will be fired"), but also to have some form of automated (or, at the very least, manual) auditing in place to check that these policies are actually adhered to. There are tons of ways to automate environmental policy; one is sketched below. The reason tollgate owners are so risk averse is that it's their butts on the line if they let something through that shouldn't be, so the default behavior is to say "no" to everything. This incentivizes the wrong behavior. Policy admins should not be permitted to say no to proposals; rather, they should be empowered to determine the best way to ensure adherence to policy via automation and tools.
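To make that concrete, here is a minimal sketch of what such an automated audit might look like, assuming an AWS shop using the boto3 SDK (the region and the scheduling mechanism are up to you). It hunts down any security group that exposes SSH to the entire internet:

```python
# A minimal sketch of automated policy auditing, assuming AWS and boto3.
# Flags every security group that allows inbound SSH from the public internet.
import boto3

PUBLIC = {"0.0.0.0/0", "::/0"}

def find_open_ssh(region="us-east-1"):
    """Return (group id, group name) for each offending security group."""
    ec2 = boto3.client("ec2", region_name=region)
    offenders = []
    for page in ec2.get_paginator("describe_security_groups").paginate():
        for sg in page["SecurityGroups"]:
            for rule in sg["IpPermissions"]:
                # A missing FromPort/ToPort means "all ports", which covers 22.
                low = rule.get("FromPort", 0)
                high = rule.get("ToPort", 65535)
                if not low <= 22 <= high:
                    continue
                cidrs = {r["CidrIp"] for r in rule.get("IpRanges", [])}
                cidrs |= {r["CidrIpv6"] for r in rule.get("Ipv6Ranges", [])}
                if cidrs & PUBLIC:
                    offenders.append((sg["GroupId"], sg["GroupName"]))
    return offenders

if __name__ == "__main__":
    for group_id, name in find_open_ssh():
        print(f"POLICY VIOLATION: {group_id} ({name}) allows SSH from anywhere")
```

Run on a schedule or on every infrastructure change, a check like this turns the policy from a binder on a shelf into a continuously enforced control, and nobody has to sit in a tollgate saying "no" to proposals.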
Some Examples
Let’s take that list of things developers don’t control and go through some quick examples of how an organization could potentially address each item:
What to develop
- Bring developers into the ideation phase, give them a voice. Change the way the “what to develop” comes about. Get ideas from developers, and put them in charge of getting data about their applications that will help guide these decisions.
The timeline
- Trust that developers are telling you the truth when they tell you how long something is going to take; if everyone is saying it’s going to be 9 months to crank something out, and leadership has given you a two-week challenge, it’s time to have a chat about expectations.
The budget
- Give developers a financial "andon cord" that allows them to voice concerns about the budget, and let them suggest other ways to handle the problem. Do your developers have a say in potential ways to save money? Could challenging the current direction on infrastructure and architecture reduce costs?
The tech stack
- If a team just really, really likes Go, determine if there is a way to actually allow for this. Is this team going to own the maintenance of the project/product going forward? (hint: they should) If everyone got hit by a truck tomorrow, could you hire more Go experts?
The development tool set
- Give devs the ability to pick their own development tools. If the team wants to utilize Docker on MacBooks, then get them MacBooks and let them install Docker. New tools can absolutely go through a vetting process where things like tool licensing and risk can be managed. Doing this also has the added benefit of encouraging your developers to learn about new technologies, which could potentially be adopted by the rest of the organization.
Infrastructure and architecture
- Modern DevOps is not highly paid senior architects dictating how an application should function; it requires collaboration at all levels. This isn't to say every developer should be involved in every strategy session, but rather that there must be a method to collect feedback, a path to address concerns, and empowerment of the development teams to provide input into architectural decisions. As your organization gradually adopts a DevOps mindset, certain teams will show themselves to be masters of their own domain and can therefore be enabled to make key infrastructure and platform decisions. For example: "Instead of EC2 instances, we'll be using Lambda, SQS and DynamoDB for everything. Thanks, infrastructure team, now you have no ops to worry about."
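As a tongue-in-cheek illustration of that last jab, here is a minimal sketch of what such a team's code might look like, assuming an SQS trigger and a hypothetical DynamoDB table named "orders". Notice there is no server anywhere for an ops team to patch:

```python
# A minimal sketch of the serverless pattern above: a Lambda function fed by
# SQS, persisting into a hypothetical DynamoDB table named "orders".
# There is no instance here for an infrastructure team to patch or babysit.
import json

import boto3

table = boto3.resource("dynamodb").Table("orders")  # hypothetical table name

def handler(event, context):
    """Entry point invoked by the SQS event source mapping."""
    for record in event["Records"]:
        table.put_item(Item=json.loads(record["body"]))
    return {"processed": len(event["Records"])}
```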
Dev/test/QA/prod environments
- Standardizing environments does not mean what most people think it means. To standardize an environment, you should be able to completely destroy/terminate/delete the entire environment, then use an automated job to create a new one and configure it with everything that is needed, all within minutes. Once you have this in place, you can turn the development teams loose on their own environments, knowing that they all mirror each other. This also allows for timed, automated destruction of environments, ensuring that no server hugging is going on and potentially saving costs by spinning everything down after hours (see the sketch below). Teams should also have the ability to muck around in their non-prod environments, with the expectation that any critical changes go back to whatever team manages the environment templates for discussion, approval and adoption.
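Here is a minimal sketch of that timed destruction, assuming AWS, boto3 and a hypothetical "environment" tag that marks instances as disposable. It is only safe because standardized environments can be recreated from automation in minutes:

```python
# A minimal sketch of timed environment destruction, assuming AWS, boto3 and
# a hypothetical "environment" tag marking instances as disposable. Safe only
# because standardized environments can be rebuilt automatically in minutes.
import boto3

DISPOSABLE = ["dev", "test", "qa"]  # hypothetical tag values

def reap_nonprod(region="us-east-1", terminate=False):
    """List (and optionally terminate) all running non-prod instances."""
    ec2 = boto3.client("ec2", region_name=region)
    pages = ec2.get_paginator("describe_instances").paginate(
        Filters=[
            {"Name": "tag:environment", "Values": DISPOSABLE},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    ids = [
        inst["InstanceId"]
        for page in pages
        for reservation in page["Reservations"]
        for inst in reservation["Instances"]
    ]
    if ids and terminate:
        ec2.terminate_instances(InstanceIds=ids)
    return ids

if __name__ == "__main__":
    # Dry run by default; a scheduled after-hours job would pass terminate=True.
    print("Would terminate:", reap_nonprod())
```

Wire something like this to an after-hours scheduler and server hugging becomes structurally impossible.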
Code pipelines/Continuous Delivery
- If a team owns their own environments, it only makes sense that they would own their own pipelines to those environments, right?
Data layer/log management/reporting
- Teams should have agreed-upon SLAs for communicating with other systems. By loosely coupling the various systems and treating each team as a black box (I give you this request, you send me a certain response), we allow each team to use whatever crazy middleware they want. Where possible, data should not be siloed: it is generated within a team, that team controls it, and if it is needed elsewhere it can be shipped to a centralized location.
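One lightweight way to keep that black-box agreement honest is a consumer-side contract check. Here is a minimal sketch in which the endpoint URL, the 500 ms latency SLA and the required response fields are all hypothetical placeholders:

```python
# A minimal sketch of a consumer-side contract check that treats another
# team's service as a black box. The endpoint URL, the 500 ms latency SLA
# and the required response fields are all hypothetical placeholders.
import time

import requests

ENDPOINT = "https://orders.internal.example.com/v1/orders/42"  # hypothetical
SLA_SECONDS = 0.5
REQUIRED_FIELDS = {"order_id", "status", "updated_at"}  # hypothetical contract

def check_contract():
    start = time.monotonic()
    resp = requests.get(ENDPOINT, timeout=SLA_SECONDS)
    elapsed = time.monotonic() - start

    # The consumer cares about the agreed contract, not the middleware behind
    # it: the right status, the right shape, answered within the SLA.
    assert resp.status_code == 200, f"unexpected status {resp.status_code}"
    assert elapsed <= SLA_SECONDS, f"SLA breach: {elapsed:.3f}s"
    missing = REQUIRED_FIELDS - resp.json().keys()
    assert not missing, f"response missing fields: {missing}"

if __name__ == "__main__":
    check_contract()
    print("contract holds")
```

Run in each consuming team's pipeline, checks like this let every producer team swap out its internals freely, as long as the contract keeps holding.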
Security controls
- As mentioned earlier, while teams should be enabled to make bone-headed moves, those bone-headed moves must be immediately caught and remedied by automated monitoring and alerting platforms. This is a paradigm shift in the way your security/DevSecOps organization works. The security org must not only make policy; it must also have mechanisms in place to automatically detect anything that falls outside that policy, along with procedures for remediation.
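Pairing detection with remediation might look something like this minimal sketch, again assuming AWS and boto3, with a hypothetical SNS topic for alerting. It revokes a world-open SSH rule the moment the monitoring platform flags it:

```python
# A minimal sketch of automated remediation, assuming AWS and boto3. Instead
# of a tollgate saying "no" up front, the violating rule is revoked and an
# alert is published the moment it appears. The SNS topic ARN is hypothetical.
import boto3

TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:security-alerts"  # hypothetical

def remediate_open_ssh(group_id, region="us-east-1"):
    """Revoke a world-open SSH ingress rule on the group, then alert."""
    ec2 = boto3.client("ec2", region_name=region)
    sns = boto3.client("sns", region_name=region)
    # Assumes the offending rule is exactly tcp/22 from 0.0.0.0/0; a real
    # remediator would revoke whatever rule shape the audit actually found.
    ec2.revoke_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": 22,
            "ToPort": 22,
            "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
        }],
    )
    sns.publish(
        TopicArn=TOPIC_ARN,
        Subject="Policy violation auto-remediated",
        Message=f"Revoked public SSH ingress on security group {group_id}",
    )
```

Note that nobody had to say "no" in advance: the bone-headed move happened, was caught, and was undone in seconds.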
Operations/maintenance
- Ultimately, an enabled team owns their own deliverables. This forces your teams to eat their own dog food and encourages them to produce clean, defect-free code, since ultimately they’ll be the ones called to fix an issue at 1AM.
Conclusion
Every time you give ownership of something to a team, you lose traditional centralized control. Most organizations are so caught up in the fear of losing control that they don't realize that if they changed the way they look at control, they would actually have more of it, not less. By enabling your development teams in an iterative fashion, coupled with changing the way your compliance teams operate, you can provide significant autonomy to your entire org and realize an absolutely unfathomable amount of increased productivity from your teams.
Originally published at www.contino.io on February 12, 2019.