Submerging a DevOps Team into an Agile Landscape

Kuba Jasko
Published in Version 1
12 min read · Oct 20, 2021

I’m not going to cover the meaning of DevOps, nor dive into the debate over whether DevOps should be a role or team at all rather than a concept embedded in the organisation. Instead, I will recount my experience of introducing an agile approach into a “DevOps” team: what worked, what hurdles we overcame, and some side observations.

Many of you are probably thinking that DevOps should not be a team, but rather a model that an organisation should embrace:

DevOps is the combination of cultural philosophies, practices, and tools…
- Taken from AWS’s explanation of DevOps

However, many organisations do have these so-called DevOps teams. They are likely to be a support team injected with new tooling, or possibly a platform team that has taken on the new branding. Either way, they exist and serve a vital purpose. My team was very much a Platform, Operations, and Tooling team bundled into one: anything that was not pure development was thrown our way. As the rest of the organisation became more agile, anything that didn’t fit was absorbed by the “DevOps” team. Even DBAs managed to find their way into our team! Yes, we had all the shiny tools, but in essence, we were a general development and application support team. To understand the challenges and hurdles we faced, you first need to understand some of the team’s responsibilities and tasks…

Duties of a “DevOps” team

If you have a DevOps team, I assume it is similar to the one I worked in, covering four main areas: Platform, Operations, Tooling and possibly Build Management. The sources of work are just as varied, ranging from issues raised on a service desk to big chunks of work as new applications are onboarded. I will quickly go over the types of tasks we faced.

Platform

A Platform team looks after the infrastructure of an organisation or application. They will likely use modern infrastructure-as-code and configuration management tools such as Terraform and Ansible to manage the estate, and may also look after supporting components such as VPNs, bastion hosts, CDNs and load balancers. They are tasked with maintaining, upgrading and provisioning infrastructure.

Operations

An Operations team looks after monitoring solutions to ensure things are running smoothly and that relevant alerts are being raised. They may use and configure tools such as ELK, Grafana and Prometheus, or cloud services such as CloudWatch, and are tasked with maintaining and configuring these solutions.

Tooling

This involves looking after the various tools needed for the software development lifecycle, such as source code repositories and build, deployment, wiki and issue-tracking tools. For all of these tools, the team is tasked with user management, upgrades and ongoing maintenance.

Build Management

Within our team, we also looked after the applications to some extent, ensuring they were built and available to deploy to all environments. In a modern development world, you would have decent build tools in place, so the whole process is automated and easily managed by developers. However, mainly for historic reasons, and because some of the archaic applications had overly complex and outdated build processes, our team also handled much of the application versioning and build work.


As you can see, work in the “DevOps” team was very varied and there was a lot of it. Everyone had a dozen things on their plate (including some critical tasks that only they knew about) and people constantly pinging them, while the service desk was either ignored or picked over by half the team at once. It felt like we were a team of Brents (for those of you who have read The Phoenix Project) and a bottleneck for most of the processes we were involved in. That’s not to say it was terrible. I still enjoyed my job, and we still delivered work and kept the lights on. In fact, I doubt many people saw any problem at all.

Organisational changes

The introduction of new projects and new management brought many changes. Not all of them were agile; some were merely organisational. Some were more welcome than others, but most proved beneficial in the end.


The first changes were around work visibility, flow and planning. We started by setting up Kanban boards, first physical and then virtual in Jira. This helped shed light on all the tasks that were piling up on people. The visibility alone made a difference: everyone tried to get their tasks through to the end, and work was pushed back or given to someone else if a person already had too much on the board. Eventually, this got down to the level of not doing anything without a ticket and logging every minute spent on those tickets. This showed us where our effort was going and which processes we could improve, and it created a far more transparent team. That mission-critical process that Joe Bloggs does every other day that no one else knows about? It was now documented and taken into account in planning and workload decisions.

The next big change was being assigned a Project Manager. The initial annoyance of having to explain your work and give updates to one more person was quickly outweighed by the benefit of having someone act as a barrier between you and the onslaught of new work. In fact, every conversation with the PM saved three others, whether with projects handing you work directly or with projects asking for status updates. The PM knew who was working on what, when they were likely to finish, and which backlog tasks projects were waiting on. They also pushed the struggle over what to work on next back to the projects themselves, so the team could focus on delivery.

With the extra visibility, the breadth of work was thrown into the limelight. I’ve already covered the numerous areas we managed, but seeing them clearly for the first time raised concerns. We had a team of just over 10, and there were lots of tasks that only one or two people could do. No one knew what to focus their upskilling on, as there was too much to learn, and skills matrices always looked daunting and bare. Some decisions were made to alleviate these issues. Firstly, we farmed out any work where we were the blocker and people could do it themselves. This involved some upskilling around the account but made lots of processes smoother and quicker. The team was then split in two: one team to manage the operational and platform requirements of the applications, and another to manage everything else. In essence, we split Dev|Ops down the middle, with one half looking after live operations and the other looking after the dev estate. The split was not perfect, but it meant that instead of having to learn 20 different areas, each team had only 10 or so. It was at this point that I became the lead of one of these teams and decided to go one step further by introducing a more scrum-style agile approach…

The road to Agile

All the operational changes were great for visibility into the work, and they gave external teams transparency into when work might be completed. However, there was still not much structure to the work. Development teams around us were delivering bite-sized chunks of work in fortnightly sprints, while our work seemed to stretch on endlessly: someone could be working on something for 6 months with no end in sight. And although project work was now better organised, the service desk tickets were still a mess. People dipped in and out of them when the mood struck, so some very trivial tasks would sit there for weeks while others were looked at by several people at once. To solve these problems, we took a further step into the agile world.


The introduction of fortnightly sprints added further structure and prevented those runaway issues. We already had daily stand-ups (the usual: what everyone is working on, what their blockers are, etc.), but it was hard to track things at a higher level, and the sprint cycle helped with that. At the start of every sprint, everyone was given tasks to complete, and at the end of each sprint there was more accountability. This was not to foster a blame culture, but to better understand what had happened: if someone was supposed to finish a task but didn’t, was it because other work drowned them out, did they need more training in a certain area, or did they need help from someone else on the team? With daily stand-ups alone, this was easily missed, as people would quietly make slow progress on their own. Now, if someone was struggling with a task, it was easier to give them time to upskill or to pair them with someone else on it.

Another change was to have a designated Service Desk resource. As it turned out, there were usually more than enough tickets raised to keep someone busy full-time. We rotated this role every sprint, and even those who did not enjoy their time on the rota appreciated the process and the extra uninterrupted concentration they got for the rest of their work. Tickets were also better controlled, picked up more promptly, and none were missed. As a side effect, the rota was a great training asset too. Before, if someone did not know how to solve a ticket, they ignored it. Now, as it was one person’s job to handle them all, they sought help, completed every ticket, and could handle similar ones themselves the next time.
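A per-sprint rotation like this is simple enough to generate mechanically. The sketch below is a minimal illustration, not the team’s actual tooling: it assumes exactly one person on the service desk per fortnightly sprint, assigned round-robin, and the function name and team members are hypothetical.

```python
from __future__ import annotations

from itertools import cycle


def service_desk_rota(team: list[str], sprints: int) -> list[tuple[int, str]]:
    """Assign one person to the service desk per sprint, rotating round-robin."""
    if not team:
        raise ValueError("team must not be empty")
    people = cycle(team)  # endlessly repeats the team in order
    return [(sprint, next(people)) for sprint in range(1, sprints + 1)]


if __name__ == "__main__":
    # Hypothetical team members; sprints are numbered from 1.
    for sprint, person in service_desk_rota(["Asha", "Ben", "Chloe"], sprints=5):
        print(f"Sprint {sprint}: {person} is on the service desk")
```

Round-robin keeps the load even across the team; a real rota would also need to skip people on leave, which this sketch ignores.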

Finally, we took on full retrospectives, sprint planning meetings and backlog management. The Scrum Master role was split between the team lead and the PM, but we got by. With all the time tracking we were doing, we had a better understanding of how long tasks should take and how much we could get done in the next 2 weeks, which also gave us time to plan internal improvements.

It was not a seamless process, but by the end I think everyone agreed that things were better, even those most dubious of the changes at the start. That’s not to say we didn’t introduce a whole new set of issues…

New Problems

The new rigour and process flow that came with a more agile structure was bliss. Everyone knew what they were working on, projects were held at bay by a PM, and the control of WIP made things feel far less chaotic. However, new issues arose because our team did not fit perfectly into the agile world.


One of our main issues was task estimation in sprint planning meetings. As many agile teams do, we used planning poker to decide how long each task would take, and used those estimates to decide how many tasks we could pull into the next sprint. Our first issue was that we estimated in hours rather than story points. This worked well from a management point of view, as it was easy to see how long something would take; and because we logged time in Jira, it was easy to see how close a task should be to completion and by how much we had over- or under-estimated. However, a task that might take a seasoned engineer 1 hour might take a more junior one all day or even longer. So we had to start deciding who would do the work before deciding how long it would take.

Secondly, even though we had split the team, the work was still very broad and hard to size. Most of the time the planning meetings seemed futile, as only one person had ever done the task before or, worse still, the task was a complete unknown. For example: “Investigate why server X dies at least once a week”. A task like that could be sorted in a few hours if the problem was simply missing swap space, but it could just as easily take weeks to eventually discover that a kernel patch was needed. We did our best with these. For uncommon tasks, time permitting, we had someone explain all the steps; for complete unknowns, we put in some investigation time and raised new issues off the back of that initial work. This seemed to work well, and our planned and actual work usually matched up (once we also accounted for everyone needing 20% of their time for non-task items such as managing emails).
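The estimation approach in the last two paragraphs (pick the assignee first, estimate in hours, then reserve 20% for non-task work) comes down to simple capacity arithmetic. This is a minimal sketch under assumed values: the 10-working-day sprint and 7.5-hour day are assumptions for illustration, and only the 20% share comes from the text. The function names and people are hypothetical.

```python
from __future__ import annotations

SPRINT_DAYS = 10          # assumption: a fortnightly sprint of 10 working days
HOURS_PER_DAY = 7.5       # assumption: a 7.5-hour working day
NON_TASK_PERCENT = 20     # ~20% of time goes on emails and other non-task items


def task_capacity(days_available: float = SPRINT_DAYS) -> float:
    """Hours one person can realistically spend on sprint tasks."""
    return days_available * HOURS_PER_DAY * (100 - NON_TASK_PERCENT) / 100


def overcommitted(plan: dict[str, list[float]],
                  days_off: dict[str, float] | None = None) -> dict[str, float]:
    """Return everyone whose planned hours exceed their capacity, with the overrun.

    Estimates are keyed by assignee because the same task can take a senior
    engineer an hour and a junior engineer a day, so the assignee comes first.
    """
    days_off = days_off or {}
    over: dict[str, float] = {}
    for person, estimates in plan.items():
        capacity = task_capacity(SPRINT_DAYS - days_off.get(person, 0))
        planned = sum(estimates)
        if planned > capacity:
            over[person] = planned - capacity
    return over


if __name__ == "__main__":
    # Hypothetical plan: hour estimates per assignee for the coming sprint.
    plan = {"Asha": [16, 24, 12], "Ben": [40, 30]}
    print(task_capacity())                            # 60.0
    print(overcommitted(plan))                        # {'Ben': 10.0}
    print(overcommitted(plan, days_off={"Asha": 4}))  # {'Asha': 16.0, 'Ben': 10.0}
```

Keying estimates by person rather than by task is the point of the sketch: the same backlog can fit one sprint or overflow it depending on who picks the work up.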

Another issue was the size of the tasks. Before we moved to agile, we had tasks such as “Upgrade Tool Y”, which could take a month if the upgrade was complex and touched a lot of areas. Tasks like that don’t work well in an agile world: we were finishing sprints having closed only around 50% of the tasks we had planned. We had to break the larger tasks down into smaller chunks and estimate each one individually. This took more time but eventually led, once again, to a clearer flow of work that both the team and management were happy with.
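Spotting the tickets that need breaking down, and tracking whether sprint completion improves afterwards, can be automated once estimates are in hours. A minimal sketch follows; the 15-hour cap (two 7.5-hour days) is a hypothetical threshold rather than one the team used, and the sub-task titles are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical rule of thumb: anything estimated above two 7.5-hour days
# is too big to track within a sprint and should be split.
MAX_TASK_HOURS = 15.0


@dataclass
class Task:
    title: str
    estimate_hours: float


def needs_breakdown(backlog: list[Task], cap: float = MAX_TASK_HOURS) -> list[str]:
    """Titles of tasks whose estimate is too large to plan as a single ticket."""
    return [task.title for task in backlog if task.estimate_hours > cap]


def completion_rate(planned: int, closed: int) -> float:
    """Share of planned tasks actually closed by the end of the sprint."""
    if planned <= 0:
        raise ValueError("planned must be positive")
    return closed / planned


if __name__ == "__main__":
    backlog = [
        Task("Upgrade Tool Y", 120),                  # the monolithic ticket
        Task("Upgrade Tool Y: test on staging", 8),   # hypothetical sub-task
        Task("Upgrade Tool Y: migrate plugins", 12),  # hypothetical sub-task
    ]
    print(needs_breakdown(backlog))                # ['Upgrade Tool Y']
    print(completion_rate(planned=10, closed=5))   # 0.5
```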

My Takeaways

  • “DevOps is undefined and can therefore be a dumping ground”
    I have found that due to the very woolly definitions of DevOps, and the fact that any DevOps team already has very broad responsibilities, it can easily become the place where all work without a defined team can end up. Obviously, the work has to be done, and if you can’t create a new team or find a better fit, just be sure to document the work and ensure whoever pays the bills is aware of the extra responsibility. It may also be worth trying to divide up any “DevOps” teams if they are large enough.
  • “Agile can work great for a part-Service-Desk team”
    If you find yourself in a team that also runs a service desk, I found that scrum can work really well with a rotating service desk person. If your team does nothing but service desk work, scrum probably wouldn’t work well, but Kanban could still help if you have some larger pieces of work.
  • “Like the term DevOps itself, DevOps tasks can be quite vague”
    It might have more to do with the project I was on than the team I was in, but quite often when tasks were raised no one knew how to solve them until they got stuck in. This may partly reflect the team’s experience, but even an expert team will still face tasks that require a lot of investigation and could easily blow up in time and complexity. Don’t hide from the issue; take it into consideration in planning meetings and allow investigation time before committing to a more specific estimate of the work required.
  • “The breadth of tasks in a DevOps team makes Agile harder”
    I have also been part of a scrum development team and I found that scrum is made harder in a “DevOps” team due to the breadth of tasks and knowledge required. That’s not to say that being a good developer is any easier than being a good DevOps engineer, it’s just a lot more specialised. However, I do think it is easier to be a junior developer than a junior DevOps engineer, simply due to the breadth of knowledge required. So wherever you can, try to reduce this breadth while still making sure you have at least 2 people that can do any given task.
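The “at least 2 people for any given task” rule is easy to check against a skills matrix. Below is a minimal sketch assuming the matrix is kept as a mapping from each person to the set of areas they can cover; the people, areas and function name are hypothetical.

```python
from __future__ import annotations

MIN_PEOPLE = 2  # at least two people should be able to do any given task


def under_covered(skills: dict[str, set[str]],
                  areas: list[str],
                  minimum: int = MIN_PEOPLE) -> dict[str, set[str]]:
    """Map each area with fewer than `minimum` capable people to who can do it.

    Every area the team owns is listed explicitly so that areas nobody
    can cover show up with an empty set instead of silently disappearing.
    """
    coverage: dict[str, set[str]] = {area: set() for area in areas}
    for person, person_areas in skills.items():
        for area in person_areas:
            coverage.setdefault(area, set()).add(person)
    return {area: people for area, people in coverage.items() if len(people) < minimum}


if __name__ == "__main__":
    # Hypothetical skills matrix for a small team.
    skills = {
        "Asha": {"Terraform", "Ansible"},
        "Ben": {"Terraform", "Prometheus"},
        "Chloe": {"Ansible"},
    }
    areas = ["Terraform", "Ansible", "Prometheus", "Database administration"]
    for area, people in under_covered(skills, areas).items():
        print(f"{area}: {sorted(people) or 'nobody'}")
```

Here the check would flag Prometheus (only Ben) and database administration (nobody) as the areas to target with upskilling first.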

All in all, whether it’s a DevOps team, a Platform team, a Tooling team or some sort of developer support team, I found that our move to agile was greatly beneficial. I think it would have been a lot harder without the initial team split, which gave each team fewer people and fewer responsibilities. So if you’re in a similar team and struggling with the flow and control of work, give agile a try!

About The Author
Kuba Jasko is an AWS DevOps Engineer at Version 1.
