By Corey Gale, Engineering Manager, DevOps
For the DevOps team at GumGum, process has been constantly evolving. Two years ago we shifted towards an Agile methodology for managing all of our work, including interruptive support tasks. Our implementation of Agile borrowed some elements from Scrum (like daily stand-ups and sprint demos) and combined them with the flexibility of Kanban (like continuous delivery).
This bump in process rigor fixed a lot of problems: it clarified project priorities and encouraged early discussions about scope. But it also introduced a lot of new overhead, and it created expectations of our DevOps Engineers that were counterproductive for a team that sometimes spends all of its time on unplanned work.
In this post, I will discuss how we adapted our process along the way to fit the needs of our DevOps team, and the benefits we gained from doing so.
Our DevOps team
The DevOps team at GumGum (we’re hiring!) is currently composed of three engineers, a combination of remote and on-site personnel. We support three internal engineering teams at GumGum: backend, frontend and big data, which together account for 40 engineers and over two dozen different systems. Some of these systems serve over 25M RPM and our main AWS account regularly runs more than 1000 EC2 instances. So as you can guess, we have our work cut out for us and anything we can do to be more efficient can make a huge difference.
Of course, maintaining this many systems can lead to a lot of unexpected surges in support work, which our workflow methodology needed to accommodate. This was our first issue with our Agile process: unplanned work. So much of DevOps’ work is unplanned, and Scrum methodology suggests time-consuming exercises like ticket replacement and ticket-splitting for scope changes.
Hack #1: expect the unexpected
After trying out these methods (like “one ticket in, one ticket out”), I realized we were spending too much time in Jira, which was a sign we needed to tweak something. At this point we had completed four two-week sprints, and all of our tasks were tracked in Jira. I analyzed the data for these completed sprints and categorized each ticket as either “support” (interruptive) or “sprint” (planned) work. I then added up the story points for each category and saw, for the first time, the true load of support on the DevOps team: 5 story points per engineer per sprint.
From here, I took this number into consideration during sprint planning, in particular when calculating expected capacity per engineer for planned work. Here’s the formula I used:
Capacity_next_sprint = Capacity_avg_last_3_sprints - Support_avg
Protip: categorize your team’s tickets according to the type of work requested. If you pick your categories right, it should be easy to calculate the average time/effort spent on support.
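As a rough illustration of the formula and the protip above, here is a minimal Python sketch. It is not our actual tooling: the `Ticket` fields, the function names and the example numbers are all hypothetical, and it assumes that "capacity" means total completed story points per engineer (planned plus support) and that the support average is taken over every sprint on record.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Ticket:
    sprint: int          # sprint number the ticket was completed in
    category: str        # "support" (interruptive) or "sprint" (planned)
    story_points: float


def points_per_sprint(tickets, category=None):
    """Sum story points per sprint, optionally restricted to one category."""
    totals = defaultdict(float)
    for ticket in tickets:
        if category is None or ticket.category == category:
            totals[ticket.sprint] += ticket.story_points
    return dict(totals)


def average(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def planned_capacity(tickets, engineers, window=3):
    """Return (capacity_next_sprint, support_avg), both per engineer.

    Capacity_next_sprint = Capacity_avg_last_3_sprints - Support_avg
    Assumption: capacity is all completed points (planned + support).
    """
    all_points = points_per_sprint(tickets)
    support_points = points_per_sprint(tickets, "support")
    sprints = sorted(all_points)
    recent = sprints[-window:]  # the last `window` sprints
    capacity_avg = average(all_points[s] for s in recent) / engineers
    support_avg = average(support_points.get(s, 0.0) for s in sprints) / engineers
    return capacity_avg - support_avg, support_avg


# Hypothetical data: four sprints, three engineers, one aggregated
# ticket per category per sprint to keep the example short.
EXAMPLE_TICKETS = [
    Ticket(1, "sprint", 30), Ticket(1, "support", 15),
    Ticket(2, "sprint", 33), Ticket(2, "support", 15),
    Ticket(3, "sprint", 27), Ticket(3, "support", 15),
    Ticket(4, "sprint", 30), Ticket(4, "support", 15),
]

capacity, support = planned_capacity(EXAMPLE_TICKETS, engineers=3)
print(f"Support load per engineer per sprint: {support}")    # 5.0
print(f"Planned capacity per engineer next sprint: {capacity}")  # 10.0
```

With these made-up numbers, each engineer averages 15 completed points over the last three sprints, 5 of which go to support, leaving 10 points to commit to planned work. In practice the ticket list would come from an export of your tracker, with the category stored in whatever field or label your team uses.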
Hack #2: adjust expectations
At this point we were planning for the unexpected and, for the most part, everyone was meeting their sprint commitments on time. That’s when I learned about a new problem: process pressure. During a routine 1:1, one of my senior contributors told me that he felt increased stress towards the end of sprints due to looming incomplete tickets. This engineer happened to be spending a lot of time unblocking other engineers, a task that has great ROI for the company. But because he wasn’t working on the sprint tickets he had previously committed to, he felt pressured to work longer or harder to meet those commitments.
This wasn’t fair! To make sure this never happened again, I got his permission to discuss the issue as a team in the next sprint retrospective meeting. There, I set the expectation that incomplete sprint tickets can slide from one sprint to the next. Up until this point, we had used our sprint retrospectives to scrutinize every ticket that didn’t get completed. I realized that this practice was doing more harm than good and decided to drop it from our retrospectives. I also made it clear that what matters is not what didn’t get done, but rather what did get done.
These small changes in expectations had a very positive impact on team morale. In fact, two of our three engineers said this tweak significantly reduced their stress levels. And I sincerely believe that less stressed engineers are more productive and write fewer RCAs.
Hack #3: cut the meetings
Now that expectations were clarified, the next consistent complaint I received was about excessive meeting overhead. For every two-week sprint, the entire DevOps team spent over 5 hours in meetings:
- Sync with internal customers (30 minutes)
- Backlog grooming (2 hours)
- Sprint start (1 hour)
- Sprint retrospective (1 hour)
- Sprint demos (1 hour)
On top of a daily stand-up, this added up to a lot of interruptions. To fix this, I tweaked a few things:
- I made our sync with internal customers an asynchronous process. New work is now requested entirely via tickets or Slack conversations.
- My team lead and I took on the backlog grooming duties, releasing the remainder of the team from the bi-weekly meeting.
- Sprint start and sprint retrospective meetings were combined into a single one-hour meeting. This was accomplished by requiring that all new tickets be fully scoped in terms of a “definition of done” prior to the meeting (completed asynchronously), which meant the meeting could be spent discussing scope and level of effort rather than writing ticket descriptions.
These small tweaks saved DevOps team members as much as 6 hours of meetings a month!
At this point, things are running smoothly! My team is happy with the overhead, and because we’re using the same tools (Jira, Slack) and process as other teams, there have been some added benefits:
- Reporting: I can now speak the same language when communicating DevOps efforts to upper management. Sprint metrics can also make things like asking for new hires much easier!
- Familiar ticketing process for your customers: engineers from other teams simply create new Jira tickets for work requests, like they do for every other team. To optimize this workflow and ensure the right information is collected, new mandatory ticket fields can be added.
- All the perks of Scrum: flexible process, backlog grooming, planning poker sessions, retrospectives, demos and familiar tools like Jira.
Summary & key takeaways
- Agile processes like Scrum can be adapted for DevOps teams.
- Measure your unplanned work, and expect at least that much unplanned work every sprint.
- Measure all the things, which is often easier because Scrum provides the tools (story points, velocity, etc.).
- Get constant feedback. Trying new things is great but sometimes they can have unintended consequences like pressure to work overtime.
- Cut the meetings that don’t use the team dynamic. If you spend most of the meeting typing ticket descriptions, maybe that sort of thing can be completed asynchronously beforehand.