Designing and Managing Support Processes for Internal Platform Users

Tales from Platform Engineering Program Management

Guidewire Engineering Team
Guidewire Engineering Blog
May 21, 2024

By: Umang Jain (Director, Program Management) and Yoganand Ghati (Senior Program Manager, Engineering)

This is the second in a series of blog posts about the on-prem to cloud journey that Guidewire is making and how program management in our cloud platform engineering team has been instrumental in enabling our teams and stakeholders to deliver consistently and predictably. If you haven’t read our first blog, Organizing Work for Simplicity and Improved Collaboration, we encourage you to read it so you can understand the role of the program manager within platform engineering here at Guidewire.

We do not intend to claim that “we have figured it all out” or “this is the way to make the journey and therefore every organization should subscribe to it.” Instead, we intend to share some of the problems we’ve faced and how we’ve solved them so that other program managers facing similar challenges don’t have to start from scratch. Additionally, we are approaching this blog from the perspective of wanting to learn from others as well. While reading this, if you think of alternative suggestions we could explore, we would love to hear about them in your comments. We are a team of adaptive individuals who invest in experimenting with new approaches and are open to new ideas.

Designing and Managing Support Processes

Context

Platform engineering teams exist to create and support internal developer platforms that enable application developers to deploy their applications to cloud infrastructure with the least cognitive load. Platform engineering teams achieve this via self-service APIs and supporting documentation designed to enhance developer experience. Since platform engineering serves internal users, our teams are also responsible for any support our users need. Balancing support for our services with the ability to maintain consistent delivery of new features/capabilities can be challenging to sustain.

To achieve this balance, we considered the following:

  • What would it take to create a support model that reduces the cognitive load on the users?
  • How could we plan for interruptions while maintaining the periods of deep focus required for new feature delivery?
  • How can we optimize our processes and identify improvement opportunities in our product to lead to a reduction in user support instances?

Solution

Collaboration across teams within Guidewire mostly happens on Slack, so our support model had to meet users where they already are, in ways they find valuable. Every platform product/service has a dedicated #<product>-users Slack channel where we provide support. To simplify the process further, each channel has a dedicated handle, @<product>-interrupt, that notifies the developer pair on support duty that day. When that pair starts work, they update the interrupt handle so that they are the only ones notified and interrupted by each ping for help on the channel. Handing off support duty this way gives every team member periods of uninterrupted deep focus while users still receive continuous support.

The only time @here is used in these #<product>-users channels is when the platform product team that owns the channel wants to broadcast an important release or urgent action item. Limiting the use of @here makes it easier for users to tell important announcements apart from individual support requests. It also reduces interruptions for the team members working on new feature delivery that day.
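The daily handoff of the @<product>-interrupt handle can be pictured as a simple rotation. The post does not say how the pair on duty is chosen, so the sketch below assumes a fixed round-robin roster; the names, the product name, and both helper functions are hypothetical, and the actual handle update (done by the pair themselves in the post) is left out.

```python
from datetime import date

# Hypothetical roster of developer pairs; the names are placeholders.
PAIRS = [("ana", "ben"), ("chen", "dee"), ("eli", "fay")]


def pair_on_duty(day: date, pairs=PAIRS):
    """Return the pair that owns the interrupt handle on a given day.

    Assumes a plain round-robin over calendar days. Weekends, holidays,
    and swaps are ignored to keep the sketch short.
    """
    # toordinal() grows by one per calendar day, so consecutive days map
    # to consecutive pairs and the roster wraps around at the end.
    return pairs[day.toordinal() % len(pairs)]


def handle_members(product: str, day: date, pairs=PAIRS) -> dict:
    """Describe who the @<product>-interrupt handle should notify that day."""
    return {
        "handle": f"@{product}-interrupt",
        "members": list(pair_on_duty(day, pairs)),
    }


if __name__ == "__main__":
    print(handle_members("atlas", date(2024, 5, 21)))
```

Everyone outside the returned pair stays off the handle for the day, which is what protects their focus time.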

To ensure the interrupt pair can focus on solving user problems, they are not expected to work on the sprint backlog. As a result, when planning capacity within platform engineering for a quarter, release, or sprint, we subtract interrupt capacity from total capacity to determine the capacity available for the product backlog. Throughout the day, the top priority for the pair on support duty is to unblock teams who submit support requests in the dedicated Slack channel. Their remaining time is dedicated to improving product documentation, addressing the source of the defect that caused the need for support, or fixing technical debt. Their ultimate goal is to find ways to systematically reduce interrupt capacity.
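As a rough illustration of separating interrupt capacity from total capacity, the sketch below works in person-days. The function name, the unit, and the example team size are assumptions made for illustration; the post does not state how capacity is measured.

```python
def backlog_capacity(team_size: int, sprint_days: int, interrupt_size: int = 2) -> dict:
    """Split a sprint's person-days into interrupt and backlog capacity.

    interrupt_size defaults to 2 because the post describes a developer
    pair on support duty each day.
    """
    if interrupt_size > team_size:
        raise ValueError("interrupt_size cannot exceed team_size")
    total = team_size * sprint_days
    interrupt = interrupt_size * sprint_days
    return {"total": total, "interrupt": interrupt, "backlog": total - interrupt}


if __name__ == "__main__":
    # Hypothetical team of 8 on a 10-day sprint.
    print(backlog_capacity(team_size=8, sprint_days=10))
    # {'total': 80, 'interrupt': 20, 'backlog': 60}
```

Only the "backlog" figure is used when committing to sprint or quarterly work; the "interrupt" figure is the number the team tries to shrink over time.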

To capture each support request so it can be tracked and analyzed, we created a Slack Workflow:

  1. As soon as the interrupt pair acknowledges the ask, a Jira ticket is automatically created in their project.
  2. The label "interrupt" is added to that ticket so the team can identify and easily query these requests for analysis.
  3. The requestor in the Slack thread is added as the reporter so we can identify the driver of each request and work with them directly.
  4. Links to the Slack thread are added to the Jira ticket so we can access the Slack conversation from Jira at any point.
  5. The Jira ticket is assigned to the appropriate team and added to their backlog. With the use of labels, we can easily filter out these issues from the other product backlog items.
  6. The Jira issue is assigned to the current sprint so it can be reviewed and discussed in daily stand-ups and sprint reviews, ensuring we close the loop on each ticket.
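Steps 1 through 4 above can be sketched as a function that turns an acknowledged Slack request into a ticket payload. This is not the actual workflow, which the post does not show. The `SupportRequest` type, the `build_issue_payload` function, the "Task" issue type, and the project key are hypothetical, and the payload is only shaped like a Jira create-issue request body, so field names should be checked against your own Jira instance. Team assignment and sprint placement (steps 5 and 6) depend on board configuration and are left out.

```python
from dataclasses import dataclass


@dataclass
class SupportRequest:
    """An acknowledged request from a #<product>-users thread (hypothetical type)."""
    requester_account_id: str  # Jira account id of the person who asked
    summary: str               # short description of the ask
    thread_url: str            # permalink to the Slack thread


def build_issue_payload(request: SupportRequest, project_key: str) -> dict:
    """Build a create-issue style payload for one support request."""
    return {
        "fields": {
            "project": {"key": project_key},
            "issuetype": {"name": "Task"},  # assumed issue type
            "summary": request.summary,
            # Step 2: label the ticket so interrupts can be queried later.
            "labels": ["interrupt"],
            # Step 3: the requester in Slack becomes the reporter.
            "reporter": {"accountId": request.requester_account_id},
            # Step 4: keep a link back to the Slack conversation.
            "description": f"Slack thread: {request.thread_url}",
        }
    }


if __name__ == "__main__":
    req = SupportRequest(
        requester_account_id="abc123",
        summary="Deployment fails with timeout",
        thread_url="https://example.slack.com/archives/C000/p000",
    )
    print(build_issue_payload(req, project_key="PLAT"))
```

With the label in place, a JQL filter such as `labels = interrupt` separates these tickets from the rest of the backlog.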

Creating this process enabled us to present insights on platform engineering’s interrupt usage and have further conversations with our teams around their support activities. Insights include:

  • Identifying the specific component that is driving interrupt usage
  • Identifying the product gaps that trigger the most interrupts, such as:
    — unclear/incomplete documentation
    — developer experience
    — defects on existing features
    — missing features
  • Identifying teams that submit the largest number of support tickets. This allows our product managers to have targeted conversations with those teams to understand their pain points.
  • Measuring the time it takes to resolve issues, which informs SLA discussions.
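The insights listed above amount to a few group-and-count queries over the labeled tickets. The sketch below assumes each interrupt ticket has been exported as a record with a component, a gap category, a requesting team, and hours to resolution. Those field names and the sample rows are made up for illustration and are not Guidewire data.

```python
from collections import Counter
from statistics import median

# Made-up sample of exported interrupt tickets (illustrative only).
SAMPLE_TICKETS = [
    {"component": "deploy-api", "gap": "documentation", "team": "claims", "hours_to_resolve": 2},
    {"component": "deploy-api", "gap": "defect", "team": "billing", "hours_to_resolve": 8},
    {"component": "secrets", "gap": "documentation", "team": "claims", "hours_to_resolve": 4},
    {"component": "deploy-api", "gap": "missing feature", "team": "claims", "hours_to_resolve": 6},
]


def summarize(tickets: list) -> dict:
    """Count interrupts by component, gap, and team, plus median resolution time."""
    return {
        "by_component": Counter(t["component"] for t in tickets).most_common(),
        "by_gap": Counter(t["gap"] for t in tickets).most_common(),
        "by_team": Counter(t["team"] for t in tickets).most_common(),
        "median_hours_to_resolve": median(t["hours_to_resolve"] for t in tickets),
    }


if __name__ == "__main__":
    for name, value in summarize(SAMPLE_TICKETS).items():
        print(name, value)
```

The top entry in each list points to where a product or documentation fix would remove the most interrupts, and the median gives a starting number for SLA discussions.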

Benefits

  • Traceability of every support request the team handles.
  • Improved user experience by meeting users where they are and simplifying access to support.
  • Data collection on interrupt demand patterns.
  • Standardized process across teams, which streamlines reporting on metrics.
  • Backlog prioritization that is informed by user data.

Key takeaways

  • Keep the user experience in mind when designing support processes.
  • Allocate separate capacity for interruptions within the sprint/quarterly planning.
  • Prevent users from reaching out to individuals directly.
  • Reduce the number of interrupts over time by using interrupt usage data to guide product improvements.

We hope you enjoyed reading this post. If you have questions or feedback, please leave them as a comment. We are constantly experimenting and learning new things, so be on the lookout for more stories like this.

If you want to work on our Engineering teams building cutting-edge cloud technologies that make Guidewire the cloud leader in P&C insurance, please apply at https://careers.guidewire.com.

Guidewire Engineers regularly write about how they are building a range of technologies to fuel P&C industry innovation.