Scaling Delivery Using Vertical Engineering Teams — Part 2: Process Challenges

Brendan Cone
5 min read · Sep 11, 2019


This article was originally published on the Flipp Engineering Medium Account back in January of this year.

This article is the second part of a 3-part series discussing Flipp’s ongoing journey to migrate our engineering organization from system-based teams to vertical (or “mission-based”) teams. In Part 1 of this series, I discussed some of the Human Challenges we have experienced during the migration. In Part 2, I will break down some of the process struggles we needed to overcome, and how we ended up solving them.

Part 2: Process Challenges

The number of distinct channels of work ingested by a vertical team increases greatly, and priority can sometimes be hard to judge.

When our team at Flipp was verticalized around coupons, we realized that we had eight different internal teams depending on our engineering team to satisfy their needs, including our operations team, technical partnerships, insights, account teams, and product development teams. We found ourselves in a situation where we were a one-stop shop for anything from data pulls to bug resolution to operational support to integration advice, on top of our project work.

Solution: Double-down on a disciplined development process.

First, and most importantly, the team needed to find a way to turn those eight channels into a single channel. We began standardizing the way tickets were filed by each external party so that they would all follow the exact same path to resolution. This helped us avoid overwhelming the team with ad-hoc requests, and made it easier to prioritize all asks globally.

Tickets requiring visibility from our account teams were tagged with one label, operational service tickets with another, and investigations with a third. We then built JIRA boards for the dependent teams to track the status of their tickets and see them move through our workflow. Once the labeling process was in place, we worked to make sure we had a quick and efficient triage process for more urgent items to help build trust with the requesting teams.
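To make that concrete, here is a minimal sketch of the kind of label-based filter that can back those boards, written against the jira Python client. The project key, label names, and credentials are hypothetical placeholders rather than our actual setup; in practice the boards were configured directly in JIRA, so this only illustrates the filter-per-channel idea.

```python
# Minimal sketch: one JQL filter per channel of work, pulled via the
# `jira` Python client (pip install jira). Server URL, project key,
# credentials, and label names are hypothetical placeholders.
from jira import JIRA

jira = JIRA(
    server="https://example.atlassian.net",
    basic_auth=("triage-bot@example.com", "api-token"),
)

# Each filter backs a board that the requesting team can watch.
FILTERS = {
    "account-visibility": 'project = COUP AND labels = "account-visibility" '
                          'AND resolution = Unresolved ORDER BY priority DESC',
    "ops-service":        'project = COUP AND labels = "ops-service" '
                          'AND resolution = Unresolved ORDER BY priority DESC',
    "investigation":      'project = COUP AND labels = "investigation" '
                          'AND resolution = Unresolved ORDER BY priority DESC',
}

for channel, jql in FILTERS.items():
    issues = jira.search_issues(jql, maxResults=20)
    print(f"--- {channel}: {len(issues)} open tickets ---")
    for issue in issues:
        print(f"{issue.key}: {issue.fields.summary} [{issue.fields.status.name}]")
```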

In our experience, working with the outside parties to determine the urgency of incoming work is one of the most important steps. Often, answering the question of “when can this be done?” and sticking to that commitment (even if it’s a few weeks in the future) is all the external team is looking for; they don’t need us to solve the ticket right then and there. This helped all stakeholders involved feel confident that their items were being prioritized while minimizing the load on the team from the huge ticket volume.

As teams begin to work more and more in each other’s codebases, quality control can become difficult.

When our teams were oriented around systems, it was easier to ensure that our code was of high quality, since we had dedicated SEiTs (Software Engineers in Test) within the same team who specialized in testing that system. As we shifted to teams oriented around business objectives, the process became much more distributed: we were writing code in other teams’ systems, and they were writing code in ours, and we quickly needed to discover a new way to interact.

Solution: Static code analysis, high unit test coverage and continuous integration become prerequisites for success.

Here’s a real-world example where this came into play:

Our vertical team built a service that solves the problem of collaborating with third-party data entry vendors on simple tasks. This was important because many of these tasks were necessary to achieve our business objective. The service solves many problems inherent to the space, such as user login, concurrent access to tasks from multiple vendors, vendor roles and access control, and metrics and reporting on throughput.

Our team’s responsibility then became implementing the tools that helped maintain the quality of the service over time. We tended to perform code reviews on incoming changes from other teams to our codebase, but we did not develop any feature code in the system if it did not relate to our business objective.

The only way we were able to ensure that incoming code fit our standards without stretching the developers on our team too thin was through the extensive use of static code analysis tools and quality assurance automation. We also worked towards continuous integration to catch issues sooner when they did manage to slip through. If we did not have these tools in place, it would have been our team’s responsibility to run regression and smoke tests on every piece of code that came in, which would have been a significant burden.
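As an illustration, here is a minimal sketch of the kind of quality gate a CI job can run on every incoming change. The specific tools (flake8 for static analysis, pytest with a coverage floor) and the threshold are assumptions for the example, not a description of Flipp’s actual pipeline.

```python
# Minimal sketch of a CI quality gate: static analysis plus unit tests with a
# coverage floor, run on every incoming change. Tool choices (flake8, pytest,
# pytest-cov) and the 90% threshold are illustrative assumptions.
import subprocess
import sys

CHECKS = [
    # Static code analysis: fail the build on lint violations.
    ["flake8", "src/", "tests/"],
    # Unit tests with a minimum coverage threshold enforced by pytest-cov.
    ["pytest", "--cov=src", "--cov-fail-under=90", "tests/"],
]

def main() -> int:
    for check in CHECKS:
        print(f"Running: {' '.join(check)}")
        result = subprocess.run(check)
        if result.returncode != 0:
            print(f"Quality gate failed on: {' '.join(check)}")
            return result.returncode
    print("All quality gates passed.")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

The point of a gate like this is that it runs automatically on every change, so the stewarding team is not manually regression-testing every contribution from other teams.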

When nobody “owns” the system, who owns the system?

Imagine ten separate teams all modifying the same service to achieve their business objective. How does the team decide when their feature is production-ready if one of its components is an amorphous blob of changing code? How does each developer avoid the risk of breaking another team’s features which use the same component?

Furthermore, who coordinates deployment? Who solves code conflicts? Who ensures there is a vision for the future of the service? Who makes larger architectural decisions?

Solution: Designated teams become stewards of systems.

This may seem counterintuitive, because this whole article is about moving away from system ownership. On the surface, our solution to one of the problems we experienced with vertical teams was essentially to designate an owner of the system.

However, I would like to highlight a few key differences between typical system owners and the concept of stewardship within our team:

  • A team’s stewardship of a system has no bearing on how often they will be writing code in that system, whereas owners typically receive requirements and perform the work themselves.
  • A steward will provide input and insight into the growth and direction of the system, but they do not have to be the source of all growth and direction decisions: ideas can come from anywhere, and the steward will help assess whether those ideas are a good fit for the service.

And here are a few things that the steward is not:

  • Quality assurance for any business objectives that are not their own. It is the responsibility of the SEiTs and quality assurance folks on each vertical to ensure that the code written in the steward’s service performs its intended function correctly.
  • Merge conflict resolvers. When tickets are worked on independently, merge conflicts (or adverse interactions between two separate code changes) will sometimes arise, and it is up to the developers who committed them to resolve them through collaboration.

Over time, our company has started to resemble a self-contained open source ecosystem, with stewards taking the role of repository owner, and everyone else in the company serving as contributors. It is the responsibility of that owner to reject code that does not meet the code quality standards of the service, but not to write that code themselves.

Check out Part 1 on Human Challenges if you missed it, or continue to Part 3 on Coding Challenges.

Brendan Cone

I’m a generally optimistic engineering manager who loves talking about engineering, management, music of all sorts, and a whole lot of other stuff.