Scaling Delivery Using Vertical Engineering Teams — Part 2: Process Challenges
By: Brendan Cone, Director of Engineering, Flipp
In Part 1 of this article, I discussed the Human Challenges of migrating an engineering organization from system-based teams to vertical teams. In Part 2, I will break down some of the process struggles that we encountered and had to overcome during the transition, along with our solutions to them.
Part 2: Process Challenges
1. The number of distinct channels of work ingested by the team increases greatly on a vertical team, and priority can sometimes be hard to judge.
When our team at Flipp was verticalized around coupons, we realized that we had eight different teams depending on our engineering team to satisfy their needs, including our operations team, technical partnerships, insights, account teams, and product development teams. We found ourselves in a situation where we were a one-stop shop for anything from data pulls, to bug resolution, to operational support, on top of our project work.
Solution: Double down on a disciplined development process.
First, and most importantly, the team needed to find a way to turn those eight channels into a single channel. We began standardizing the way that tickets were filed from each external party so that they would all go down the exact same path-to-resolution. This helped to avoid overwhelming the team with ad-hoc requests.
Tickets requiring visibility from our account teams were tagged with one label, operational service tickets another label, investigations a third. We then built JIRA boards for those teams to track the status of their tickets so they could see their movement through the workflow. Once the labeling process was in place, we worked to make sure that we had a quick and efficient triage process for more urgent items to help build trust with the requesting teams.
In our experience, working with the outside parties to determine the urgency of incoming work is one of the most important steps. Often, answering the question of “when can this be done?” and sticking to that commitment (even if it’s a few weeks in the future) is all that the external team is looking for; they don’t need us to solve the ticket right then and there. This helped all of the stakeholders involved to feel confident that their items were being prioritized, while keeping the volume of tickets from overwhelming the team.
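The single-channel intake described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling we actually used: the label names, the urgency levels, and the `IntakeQueue` class are all hypothetical. Every request must carry one of the agreed labels, everything lands in one queue ordered by urgency and then by arrival, and each requesting team can view its own tickets by label.

```python
import heapq
from dataclasses import dataclass
from itertools import count

# Hypothetical label names; the article only says each category gets its own label.
LABELS = {"account-visibility", "ops-service", "investigation"}

# Assumed urgency levels; a lower rank is triaged first.
URGENCY_RANK = {"urgent": 0, "normal": 1, "low": 2}


@dataclass
class Ticket:
    title: str
    label: str
    urgency: str


class IntakeQueue:
    """One queue for every requesting team, ordered by urgency, then arrival."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker so equal urgency stays first-in, first-out

    def file(self, title, label, urgency="normal"):
        # Reject anything that does not follow the standard filing path.
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        if urgency not in URGENCY_RANK:
            raise ValueError(f"unknown urgency: {urgency}")
        ticket = Ticket(title, label, urgency)
        heapq.heappush(self._heap, (URGENCY_RANK[urgency], next(self._arrival), ticket))
        return ticket

    def next_ticket(self):
        """Pop the most urgent, oldest ticket."""
        return heapq.heappop(self._heap)[2]

    def board(self, label):
        """Titles of open tickets for one label, in the order they will be worked."""
        ordered = sorted(self._heap, key=lambda entry: entry[:2])
        return [ticket.title for _, _, ticket in ordered if ticket.label == label]
```

A requesting team would call `file(...)` with its label, and `board(label)` plays the role of the per-team status board: it shows where that team's tickets sit in the shared queue without giving the team a separate channel.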
2. As teams begin to work more and more in each other’s codebases, quality control can become difficult.
When our teams were oriented around systems, it was easier to ensure that our team’s code was of high quality, since our dedicated SEiTs were within the same team. As we shifted to teams oriented around business objectives, the process became much more distributed: we were writing code in other teams’ systems, and they were writing code in ours, and we needed to discover a new way to interact.
Solution: Static code analysis, high unit test coverage and continuous integration become prerequisites for success.
Here’s a real-life example where this came into play:
Our vertical team built a service that solves the problem of collaborating with third-party data entry vendors on simple tasks. This was important because many of these tasks were necessary for our business objective. The service solves many problems inherent to the space, such as user login, concurrent access of tasks from multiple vendors, vendor roles and access control, and metrics and reporting on throughput.
Our team’s responsibility then became the implementation of tools that helped to maintain the quality of the service over time. We typically reviewed the code that other teams submitted to our codebase, but we did not develop any feature code in the system unless it related to our business objective.
The only way we were able to ensure that incoming code met our standards without stretching the developers on our team was through the extensive use of static code analysis tools and quality assurance automation. We also worked towards continuous integration to catch issues sooner when they did manage to slip through. Without these tools in place, it would have been our team’s responsibility to run regression and smoke tests on every piece of code that came in, which would have become a significant burden.
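As a rough sketch of what such an automated gate might look like, the Python below combines the three signals mentioned above (static analysis, unit test coverage, and the test run) into one merge decision. The `CheckResults` fields, the `gate` function, and the 80% coverage threshold are illustrative assumptions, not the rules we actually ran.

```python
from dataclasses import dataclass

# Assumed threshold for illustration; the article does not state a number.
MIN_COVERAGE = 0.80


@dataclass
class CheckResults:
    lint_errors: int    # issues reported by static analysis
    coverage: float     # unit test coverage as a fraction between 0.0 and 1.0
    tests_passed: bool  # outcome of the automated test run in CI


def gate(results):
    """Return the reasons a change is blocked; an empty list means it may merge."""
    failures = []
    if results.lint_errors > 0:
        failures.append(f"static analysis reported {results.lint_errors} issue(s)")
    if results.coverage < MIN_COVERAGE:
        failures.append(
            f"coverage {results.coverage:.0%} is below the required {MIN_COVERAGE:.0%}"
        )
    if not results.tests_passed:
        failures.append("automated tests failed")
    return failures
```

The point of a gate like this is that it applies the same standard to every contributor, so the reviewing team only spends its time on changes that have already cleared the mechanical checks.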
3. When nobody “owns” the system, who owns the system?
Imagine ten separate teams all modifying the same service to achieve their business objective. How does the team decide when their feature is ready if one of its components is an amorphous blob of changing code? How does each developer avoid the risk of breaking another team’s features which use the same component?
Furthermore, who coordinates deployment? Who solves code conflicts? Who ensures there is a vision for the future of the service? Who makes larger architectural decisions?
Solution: Designated teams become stewards of systems.
This may seem counterintuitive because this whole article is about moving away from system ownership. Seemingly, our solution to one of the problems we experienced with vertical teams was to essentially designate an owner of the system.
However, I would like to highlight a few key differences between typical system owners and how we see stewards here at Flipp:
- A team’s stewardship of a system has no bearing on how often they will be writing code in that system, whereas owners typically receive requirements and perform the work themselves.
- A steward will provide input and insight into the growth and direction of the system, but they do not have to be the source of all growth and direction decisions. Ideas can come from anywhere, and the steward will help to assess whether those ideas are a good fit for the service.
And here are a few things that the steward is not:
- Quality assurance for any business objectives that are not their own. It will be the responsibility of the SEiTs and quality assurance folks on each of their respective verticals to ensure that the code written in the steward’s service is performing its intended function correctly.
- Merge conflict resolvers. When tickets are worked on independently, merge conflicts (or adverse interactions between two separate code changes) sometimes arise, and it is up to the developers who committed the conflicting changes to resolve them through collaboration.
Over time, our company has started to resemble a self-contained open source ecosystem, with stewards taking the role of repository owner, and everyone else in the company serving as contributors. It is the responsibility of that owner to reject code that does not meet the quality standards of the service, but not to write that code themselves.
Stay Tuned for Part 3 — Coding Challenges!