Organizing Work for Simplicity and Improved Collaboration

Guidewire Engineering Team

Published in Guidewire Engineering Blog · 7 min read · May 13, 2024

Tales from Platform Engineering Program Management

By: Umang Jain (Director, Program Management) and Yoganand Ghati (Senior Program Manager, Engineering)

[Image: Team brainstorming around a whiteboard.]

Moving to a cloud product is advantageous because it allows for the continuous release of innovation and improved security. But even for organizations that have been immensely successful in developing on-premises (aka on-prem) products, the transition to developing cloud products is a difficult undertaking. The challenges of this shift are many. In addition to technical changes, it requires comprehensive change management. This often includes navigating an organizational structure built and solidified over time to optimize value in an on-prem business model, and breaking down the organizational silos that were created to support efficiency gains in an on-prem world. Transitioning to the cloud also requires a shift in mindset, organizational structure, and team behaviors. Various business units need to come together in new ways to plan and operate as a single unit. Every release or change has the potential to disrupt business and therefore needs to be well orchestrated and coordinated. But despite the best intentions, incidents do happen.

Successfully, reliably, and securely running and operating a cloud infrastructure requires strong operational control at each step of the development process.

Program management plays a critical role in reducing these incidents by enabling teams (both development and operations) to run predictably and improve processes with every learning moment. As an example, every release needs to ensure that the Definition of Done is met to deliver consistency and predictability for a variety of internal stakeholders. The ideal state of operations is when internal systems are automated to the extent that process control needs no manual supervision. But getting to that state is a journey that entails balancing manual process controls while simultaneously investing in automation that can detect variations and guide teams toward corrective actions. The Definition of Done constantly evolves as teams learn from previous incidents and commit to strengthening the process.

This blog is about the on-prem to cloud journey that Guidewire is making and how program management in our cloud platform engineering team has been instrumental in enabling teams and stakeholders to deliver consistently and predictably. We do not intend to claim that “we have figured it all out” or “this is the way to make the journey and therefore every organization should subscribe to it.” Instead, we intend to share some of the problems we’ve faced and how we’ve solved them so that other program managers facing similar challenges don’t have to start from scratch. Additionally, we are approaching this blog from the perspective of wanting to learn from others as well. While reading this, if you think of alternative suggestions we could explore, we would love to hear about them in your comments. We are a team of adaptive individuals who invest in experimenting with new approaches and are open to new ideas.

If you find yourself curious about Guidewire and our cloud platform, check out additional posts on the Guidewire Engineering Blog on Medium.

Program Manager Role

First, let’s take a moment to define the role, responsibilities, and expectations of a program manager in Guidewire’s cloud platform engineering team.

Broadly speaking, our program managers are responsible for:

Planning and delivery of larger efforts: Program managers work with the engineering and product management leaders to influence and make appropriate prioritization decisions, especially when features and projects span multiple teams.
Engineering operations: Program managers identify tools and create processes to build a high-performing platform engineering organization that efficiently solves important problems.

We engage with our teams to deliver the projects across our portfolio and also work to break down barriers. We facilitate biweekly meetings with our team leads to review the challenges and bottlenecks our teams face, discuss options, and drive solutions. Solutions could include creating documentation, rolling out process changes across teams, or coordinating with internal stakeholders to remove barriers.

Problem

We needed to organize work for simplicity and improve collaboration.

Context: Engineering teams at Guidewire use Jira to track, manage, and organize their work. Teams are organized around a product or component (in cases where a component is itself a large and significant piece of our architecture), with each team owning the product/component’s lifecycle. To keep the work on each product/component isolated, each team owns its own Jira project. The entire product/component backlog (bugs, features, and internal requests) is tracked within that Jira project. This approach has significant benefits for the individual teams and for the organization as a whole:

If your team is the owner, it is easy to query issues for your product/component with minimal search variables.
If your team is dependent on a different team, it’s easy to know the Jira project in which to file the “defect” or the “feature request” so that the team that owns the project can take action.
Each team can organize their release (or fix version/s), components, and workflows according to their needs within their own Jira project.

As we started to grow and organize our platform engineering teams around our platform components like authentication, networking, persistence, build tooling, environment properties, secrets management, etc., it was only natural for us to have a Jira project created for each component team. This gave each team the ability to create and manage their own Jira Components, Fix Versions, etc., to oversee and organize their work.

Despite these benefits, we collectively struggled for three main reasons.

First, when one of the application teams or support teams at Guidewire needed to report a platform issue, they didn’t always know which component caused it or which Jira project to file it under.

Second, instead of being “moved” to the correct Jira project, incorrectly filed issues were often closed as “not a problem” for that component. This frustrated requesters, who then had to file the issue again in a different Jira project and hope they got it right the second time.

Third, since our platform increment is a combination of multiple components and their increments, we didn’t always know how to leverage Jira Query Language (JQL) to determine the payload for a given platform release in a repeatable way. We tried querying for issues where the fix version was in (component-1 vX, component-2 vY, …, component-N vZ), but that required:

Every project to set a fix version consistently when closing issues. For the query to work, this assumption always had to hold. Our option was to add rules to each project’s workflow so that an issue could not be marked complete until a fix version was added. But that also meant updating every project each time new needs were identified.
Capturing the fix version in a specific location for each component for the given increment so we could use it to cross-reference.
Updating the query in the template to deliver the platform increment each time a new component was added.
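To make the maintenance burden above concrete, here is a minimal sketch of what a per-component payload query might look like when it is generated from a mapping of projects to fix versions. The project keys and version names are hypothetical (the original setup is not described at that level of detail), and the OR-of-clauses shape is just one plausible way to express the query.

```python
# Hypothetical Jira project keys and fix versions, one per component project.
# None of these names come from the actual Guidewire setup.
component_versions = {
    "AUTH": "auth-2.3.0",
    "NET": "net-1.8.1",
    "SECRETS": "secrets-0.9.4",
}


def per_project_payload_jql(versions: dict) -> str:
    """Build one clause per component project, pairing each project key
    with that component's own fix version, and join the clauses with OR."""
    clauses = [
        f'(project = {key} AND fixVersion = "{version}")'
        for key, version in sorted(versions.items())
    ]
    return " OR ".join(clauses)


query = per_project_payload_jql(component_versions)
print(query)
```

Every new component adds another entry to the mapping and another clause to the query, and every release requires someone to look up the right version for each entry. That is the upkeep the three requirements above describe.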

Solution

Our solution to this problem was to bring all platform components together and manage them under a single Jira project. As we made this decision, we traded the isolation benefit of having a dedicated project per component for the ease of collaboration with the rest of the organization. Here is how our platform is managed in Jira today.

We started by ensuring all platform engineering work is managed under a single Jira project going forward.

We also decided that Guidewire platform components would be managed as components of the single Jira project.

Each component has a component lead identified in Jira.
All issues filed in our Jira project must have a component identified at the time of issue creation, which is mandated by the issue creation workflow.
All issues created in our project are assignable to the component lead based on the component selected at the time of issue creation.
Components are also mapped to our teams. A team (identified via a field called “assigned pod”) could manage one or more components in our Jira project.
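The routing rules above can be modeled in a few lines of plain Python. This is only an illustrative sketch: it does not call Jira, the component names, leads, and pod names are invented, and it assumes the component lead is used as the default assignee for new issues.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Component:
    name: str
    lead: str          # assumed to be the default assignee for new issues
    assigned_pod: str  # the team that owns this component


# Hypothetical components, leads, and pods. One pod may own several components.
COMPONENTS = {
    "authentication": Component("authentication", "alice", "identity-pod"),
    "secrets-management": Component("secrets-management", "alice", "identity-pod"),
    "networking": Component("networking", "bob", "network-pod"),
}


def route_issue(summary: str, component: Optional[str]) -> dict:
    """Mimic the creation rule: a known component is mandatory, and it
    determines both the initial assignee and the owning pod."""
    if not component:
        raise ValueError("a component is required at issue creation")
    if component not in COMPONENTS:
        raise ValueError(f"unknown component: {component}")
    entry = COMPONENTS[component]
    return {
        "summary": summary,
        "component": entry.name,
        "assignee": entry.lead,
        "assigned_pod": entry.assigned_pod,
    }


issue = route_issue("Login token expires early", "authentication")
print(issue["assignee"], issue["assigned_pod"])
```

Because the component is the only thing a requester has to choose, a wrong guess is corrected by changing a field on the issue instead of refiling it in another project.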

Individual component teams now create their component fix version, which is unique to their component increment. But we also have one platform release fix version which all teams must use on their Jira issues.

A fix version is also required when an issue is closed. This is to ensure completeness at the point in time when the engineers have the most reliable information about the release.
Every two weeks, we review closed issues within the project that do not have the platform release fix version attached. These are now easy to find and fix. Once that is done, the issues carrying the platform release fix version become the payload for the release.
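With a single project and a shared platform release fix version, both the payload query and the biweekly cleanup query become short. The sketch below builds the two JQL strings. The project key and version name are hypothetical, and the exact clauses (in particular how `!=` behaves on issues with several fix versions) are an assumption that should be verified against your own Jira instance.

```python
PROJECT = "PLAT"                       # hypothetical single platform project key
PLATFORM_VERSION = "platform-2024.05"  # hypothetical platform release fix version


def release_payload_jql(project: str, platform_version: str) -> str:
    """Everything tagged with the platform release fix version is the payload."""
    return f'project = {project} AND fixVersion = "{platform_version}"'


def missing_platform_version_jql(project: str, platform_version: str, days: int = 14) -> str:
    """Recently closed issues that lack the platform release fix version.
    Intended to catch issues with no fix version at all as well as issues
    that only carry a component-level fix version."""
    return (
        f"project = {project} AND statusCategory = Done "
        f"AND resolved >= -{days}d "
        f'AND (fixVersion is EMPTY OR fixVersion != "{platform_version}")'
    )


payload_query = release_payload_jql(PROJECT, PLATFORM_VERSION)
cleanup_query = missing_platform_version_jql(PROJECT, PLATFORM_VERSION)
print(payload_query)
print(cleanup_query)
```

Neither query needs to change when a new component is added, since components live inside the one project.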

Measures of Success

Significantly reduced ambiguity around where platform engineering work happens and how and where to engage the team.
Significantly increased internal user satisfaction by reducing the number of tickets marked as “not a problem” because they were filed against the wrong component (and not because there was nothing to fix).
Increased collaboration across platform engineering teams by removing the obstacle to reassigning an issue to a different team. Now, they simply change the “assigned pod” and component fields on the Jira issue, and the ticket moves into the appropriate team’s queue.
Improved consistency of processes and results across teams.
Reduced complexity of querying status and payload information for a given platform release.
Minimized administrative overhead when team members rotate across teams. (Previously, team members had to be manually added to other Jira projects to get the access required for stories to be assigned to them.)
Reduced time to onboard a new team within platform engineering.

Key Takeaway

Program managers need to establish a forum that encourages frequent conversations with the broader stakeholders to uncover the problems those stakeholders are facing. Stakeholders don’t always think of their experiences as problems to be fixed. But when a program manager hears the same concerns repeated by different stakeholders, they are better positioned to recognize patterns and find solutions.

We hope you enjoyed reading this post. If you have questions or feedback, please leave us a comment.

We are constantly experimenting and learning new things and will share more such stories soon.

If you want to work on our Engineering teams building cutting-edge cloud technologies that make Guidewire the cloud leader in P&C insurance, please apply at https://careers.guidewire.com.