Photo by Dzenina Lukac from Pexels

Reducing Engineering Burnout with Support Rotations

Jim Shields
YipitData Engineering
Feb 25, 2021


Key Takeaway

Even for small engineering teams with lower uptime requirements, a business hours support rotation can:

  1. Reduce burnout and context switching
  2. Reduce overall support time
  3. Increase knowledge sharing

(Note: In this post, I use “on-call rotation” to mean 24/7 support, and “support rotation” to mean business hours support.)

Small Platform Team: Lots of Support

At YipitData, our small (~20 person) engineering team supports dozens of services and technologies, but very few have high uptime requirements (i.e., above 99%). Because of the lower uptime requirements, we haven’t used on-call rotations. Instead, we handled incidents and incoming requests in an ad hoc way, which worked well for us until last year.

On our 3-person Infrastructure team, we are responsible for our core (non-data) AWS infrastructure and our internal platform (YAWS) for building apps. As a platform team with a good amount of infrastructure experience, we tend to receive many incoming questions, requests, and problems; we are also often involved in incidents.

2020: Even More Support

In 2020, incoming (support) work for the Infrastructure team increased significantly, driven both by our entire company going all-remote (smoothly!), and by problems from new business challenges. We had a few major pain points:

  1. Our stakeholders contacted us through too many channels, often reaching out to us individually. This made incoming requests difficult to track and manage.
  2. We didn’t have well-set expectations with our stakeholders, so we felt obligated to drop what we were doing to solve their problems quickly.
  3. All 3 of us felt responsible for answering incoming requests and for fixing the root cause of issues, leading to support burnout and, often, crossed wires.

All of this led to an increase in aggregate support time and an increase in burnout, with no increase in value delivered to our stakeholders.

To address these problems, the Infrastructure team began using a business hours support rotation in summer 2020. While the 24/7 on-call rotation is a fairly common industry practice where uptime requirements are high, we wanted a solution that worked for our lower uptime requirements.

The goals of the support rotation were to reduce aggregate time spent on support across the Infrastructure team and to properly set expectations with our stakeholders, with the hope that achieving these goals would reduce support burnout.

Setting the Right Expectations

Here are the expectations that have worked for us. They may not work for your team, but may be a helpful start:

  1. Every team member (including the team lead) is on the rotation.
  2. All requests come through one channel, a centralized email for our team.
  3. Starting from our Monday team meeting, the next person takes over. The previous person uses their best judgment about any remaining tasks to transfer. The goal here is to reduce ongoing support tasks per person, so they’re not continuing to do support beyond their week.
  4. The expectation is business hours only (roughly 9–5 EST, depending on the person), with no weekends. We have a separate escalation procedure for critical issues outside business hours, which have happened only rarely in our experience (once or twice a year, knock on wood). This is not a 24/7 on-call rotation (again, this may not work for your team).
  5. If someone is on support, but will be OOO for whatever reason, they should find another person to cover.
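The weekly handoff and cover rules above can be sketched in a few lines of Python. This is only an illustration, not how we actually schedule support: the names `on_support`, `anchor_monday`, and `out_of_office` are hypothetical, and automatically picking the next person in the order is a simplifying assumption (in practice, the person who is OOO arranges their own cover).

```python
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def on_support(day, members, anchor_monday, out_of_office=None):
    """Return who is on support for the week containing `day`.

    members:        rotation order; everyone is included, team lead too.
    anchor_monday:  a Monday on which members[0] started a support week.
    out_of_office:  hypothetical mapping of member -> set of Mondays
                    (week starts) for which that member is unavailable.
    """
    out_of_office = out_of_office or {}
    monday = week_start(day)
    weeks_elapsed = (monday - anchor_monday).days // 7
    # Assumption: if the scheduled person is out, the next person in the
    # rotation order covers for that week.
    for offset in range(len(members)):
        candidate = members[(weeks_elapsed + offset) % len(members)]
        if monday not in out_of_office.get(candidate, set()):
            return candidate
    raise ValueError("no team member is available this week")


team = ["alice", "bob", "carol"]   # placeholder names
start = date(2020, 7, 6)           # a Monday
print(on_support(date(2020, 7, 15), team, start))
print(on_support(date(2020, 7, 15), team, start,
                 out_of_office={"bob": {date(2020, 7, 13)}}))
```

Keying everything to the Monday of the week mirrors the handoff at the Monday team meeting: whoever is returned for a given Monday holds support for that whole week.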

Support Responsibilities

And the responsibilities of the support person are clearly defined:

  1. Most incoming work and requests are non-urgent. You don’t have to address them urgently, but you should respond within a reasonable timeframe (<1 hour on business days). Always ask “Is this urgent?” or “How urgent is this?”
  2. When you do see larger problems or patterns, open a ticket in our backlog or bring it up in our weekly meeting.
  3. You don’t have to solve the root cause or fully answer the question yourself, especially if it’s very involved, unless you are interested. You can provide resources (links, etc.) when asked a question, or redirect to a person with better context if you’re unsure.
  4. Incident response will sometimes involve Infrastructure. Your role is to answer questions / escalate access, not solve the root cause.
  5. Automated emails (e.g., AWS instance termination or version deprecation): handle if necessary, or forward to the appropriate team with context & actions.
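As a rough illustration of the response-time expectation in item 1, here is one way to compute a "respond by" time. The one-hour target and the 9–5 window come from this post. Everything else is an assumption made for the sketch: the function names are hypothetical, the clock for a request received outside business hours starts at the next opening, and time zones and holidays are ignored.

```python
from datetime import datetime, time, timedelta

OPEN, CLOSE = time(9, 0), time(17, 0)   # rough business hours from the post
TARGET = timedelta(hours=1)             # "<1 hour on business days"


def next_opening(moment: datetime) -> datetime:
    """Return the earliest business-hours moment at or after `moment`."""
    is_weekday = moment.weekday() < 5   # Monday=0 ... Friday=4
    if is_weekday and moment.time() < OPEN:
        return datetime.combine(moment.date(), OPEN)
    if is_weekday and moment.time() < CLOSE:
        return moment
    # After closing time or on a weekend: roll forward to the next weekday.
    day = moment.date() + timedelta(days=1)
    while day.weekday() >= 5:
        day += timedelta(days=1)
    return datetime.combine(day, OPEN)


def respond_by(received: datetime) -> datetime:
    """Latest time for a first response (assumed rule, not a team policy)."""
    return next_opening(received) + TARGET


print(respond_by(datetime(2020, 7, 7, 10, 30)))   # Tuesday morning
print(respond_by(datetime(2020, 7, 10, 18, 0)))   # Friday evening
```

Note that a request arriving late in the day (say 4:30 pm) gets a deadline just past closing time under this rule; a team could just as reasonably carry the remainder over to the next morning.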

Results of our Support Rotation (So Far)

The results so far have been very positive:

  1. More predictable support time: team members expect to get less long-term work done in a support week. That’s ok, because it means the other weeks are much better for deeper, longer-term work than they had been.
  2. Less support burnout: we are no longer all constantly watching for every incoming question and context switching too frequently.
  3. Better knowledge sharing: before, the team member with the most context would often jump in to solve every problem. Now, whoever is on support will naturally learn more about each problem by answering incoming questions or finding the internal expert.
  4. Reduction of overall support: The engineer on support is encouraged to find common patterns and fix them, experiment with long-term solutions, and draft proposals for the team to reduce overall support.

We’re optimistic this will continue to pay dividends for our team, and hope it can be a useful resource for other teams!
