Scaling Scrum: When multiple teams are working on one single product.
A lot has been written about scaling agile, mainly using SAFe, LeSS, Nexus (or any other "scaled" framework). But we don't have the same amount of literature in terms of cases and real implementation experience. So here are my two cents!
My experience at Modulo was great in this matter. When I joined the company to lead the engineering group, we had multiple Scrum teams (covering both the web and the mobile versions of the product). Every team had one SM (at the time, the role rotated among the team members, switching sprint by sprint) and one PO. Since we were short on POs (having only 5), most of the POs worked with 2 (or even 3) Scrum teams.
These teams accounted for approximately 60 people (counting testing/QA, front-end, back-end, UX, designers, interns and DevOps-like personnel). Each team had 4 to 8 people. Each functional group had one coordinator (mobile, support/DevOps, integration, architecture, design, QA, and so on). The POs were part of the Product team (not the engineering group).
We had 2 main products, but under the hood they shared about 70% of the same code base. So, internally we didn't differentiate much between them. The core of the product was (and still is) very tightly coupled, mirroring the rules and concepts of an almost "full-fledged" IT-GRC product (Risk, Compliance, Governance, Integrated Workflow, Policies, Knowledge, Business Continuity, etc.). No microservices. Everything was built using the old monolithic approach that was common back in 2008 and 2009 (when the core of the product was built).
As usual, the product had (and still has) some technical debt, a complex architecture, a lot of bugs, a very large code base (approx. 3M LOC), and no automation, which in turn made the bug-fixing process difficult and time-consuming (to say the least).
Some of my learnings
Individuals and interactions over processes and tools (Agile Manifesto)
It's all about people. Even scaling is (also) about people. So if you don't have the right skills in the right places and roles, or if you are short of them, things won't work as they should.
People are the key part of any agile implementation. They are key even in understanding why and when to break the rules and make exceptions. If you have a very tightly coupled codebase, it's not easy to isolate the modules by function. So, how did we deal with it?
We did something "not recommended" for a Scrum team. We assembled 1 large team (12 people) to handle the core of the product. This group, although using Scrum, had a very good "project manager" type as a dedicated Scrum Master. So, besides facilitating the team and helping in the conversations with the POs, he added a lot in terms of project management.
For this large team we had people with previous experience in every single module the team was responsible for. So, the team was in some sense self-sustained. We also had one of the best testers and a very good architect in the team.
Another change: we shifted the focus of the Dev Support Team to a DevOps approach, mainly focusing on fixing some of the problems we had with our dev infrastructure.
Learning points: If you don't have the right people, (you can hire them or) you have to adapt and work with the people you do have, providing coaching and early guidance in the missing knowledge areas. You also need to understand the trade-offs you are working with, so you can help the team manage them properly. In our case, for example, I was sure communication would be an issue, but I was confident the SM would be able to handle it well (as he did).
Each team must have one (and only one) backlog.
At first, each team had at least 3 different backlogs (sometimes 4). The PO managed only new stories, the support group managed the customer-ticket backlog, and the QA coordinator and I demanded that the team fix all high-priority bugs. The chief architect requested compliance with the product's technical architecture and demanded that the teams pay down the tech debt.
So, let's go one step at a time. You might be thinking: the PO is not supposed to manage the bug fixing. You are right. Well, at least partly right. If the PO seeks excellence, allowing the debts to be paid and letting the team work, you are right. But… What if the POs push the team hard, monitoring only the velocity? What happens to the product when you are required to ship it right away (anyway), without paying any tech debt? Well, the product as a whole suffers. The customer suffers. So, the PO certainly had her share in the issue. Remember that the PO is ultimately responsible for the product. If it has bad quality, well, the PO is accountable as well.
How to change this chaos?
Continuous attention to technical excellence and good design enhances agility. (Agile Manifesto)
One of the first things we did was to have one engineer per team dedicated to customer tickets. This engineer could be rotated after each sprint, but every team had to have at least one engineer dedicated to fixing customer tickets. This simple change was very powerful. It resolved a lot of issues and conflicts, and ended the constant struggle between the product and support groups over what developers should do.
What about the bugs and the stories (still 2 backlogs)? Well, in this case I managed to reach an agreement with senior management to put in place a very strict bug-fixing policy. All critical bugs had to be prioritised over the stories. Period. So, if the team had 1 critical bug, someone on the team had to halt what was being done (any story being developed) to fix the bug. At first, this meant a lot of failed sprints, since there were critical bugs. (It was just like the "stop the line" policy in the Toyota Production System; and as happened there, in the beginning, the line was stopped very, very often.) I know it seems weird, and I do recognise all the problems with this approach. But we started to change the culture of the POs (since they were being directly affected) and of the teams as well.
Learning points: Each team must manage one (and only one) prioritised backlog. And make sure you have the right policies in place, so everyone knows the priorities at any single time, in a very explicit way.
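To make "explicit policy" a bit more concrete, here is a minimal sketch of how stories and bugs could be merged into one ordered backlog. It is only an illustration, not the tooling we actually used: the `Item` fields, the `rank` value (a hypothetical PO-assigned order, 1 being the most important) and the `single_backlog` function are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    title: str
    critical_bug: bool  # True only for bugs classified as critical
    rank: int           # hypothetical PO-assigned rank, 1 = most important


def single_backlog(items):
    """Merge all work items into one prioritised list.

    Policy assumed here: critical bugs always come first; everything
    else (stories and non-critical bugs) follows the PO's rank.
    """
    # False sorts before True, so "not critical_bug" puts critical bugs on top.
    return sorted(items, key=lambda item: (not item.critical_bug, item.rank))


items = [
    Item("Export report story", critical_bug=False, rank=1),
    Item("Login crash", critical_bug=True, rank=3),
    Item("Typo on settings page", critical_bug=False, rank=2),
]

for item in single_backlog(items):
    print(item.title)
```

The point of the sketch is that the rule lives in one place (the sort key), so nobody has to negotiate priorities item by item.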
The mindset beats the process. Every single time.
The agile mindset is one of the most important aspects of the agile movement. Delivering often is important, communication and collaboration are important as well. Besides, when you are building a complex and large product, slicing and prioritising the user stories is crucial. Understanding what is "value" is key. Learning and improving are two other keystones of the agile movement.
In our case, sometimes we had to slice a story and assign the slices to different teams (since some of the stories spanned more than one team). And we also needed to take into account the complexity of the codebase. Sometimes the PO asked for a 2-sprint story, refusing to slice it further to fit in 1 sprint, since “it would deliver no value unless delivered completely”… So, it was a constant struggle with the POs to properly slice the stories so they could be delivered within the sprint and with quality. (And it only changed when we got a new Product Manager, one with the proper mindset.)
Learning points: The agile mindset is critical for having multiple teams coordinating the development of one single product. Much more than it is when you are working "stand-alone" in one team. With the right mindset, your team can learn and grow, can identify and fix its problems and issues, can learn to work as a team. When in doubt, go back to the manifesto and check out its fundamentals.
People have to work together. Real collaboration is a must.
When you have multiple teams building one cohesive product, you have to have coordination meetings in place to get the inter-team coordination flowing.
In our case, we used a "Scrum of Scrums"-like model: every single day we ran a meeting with all Scrum Masters, one support representative, some functional coordinators and myself, to walk through dependencies and inter-team issues. This was also a place where people could collaborate on other issues, like the potential impacts of changes/refactoring, or any problems one team was facing that the other teams might face as well.
The main point here was to have a set place and time where the teams could talk to each other, building bridges for any future discussion.
Another thing we used to do was to hold one single sprint review meeting for all teams. At the end of the sprint, all teams reviewed their deliverables, demoing their product increments. The whole engineering group participated in these reviews, as well as the product group, the support group and anyone else who wanted to. It was an open meeting, with an open discussion.
Learning points: When scaling, you need to have real collaboration. People must put in place specific meetings to discuss issues, share experiences and communicate whatever they want.
Someone has to be looking at the overall product.
When you have one product per team, or just stand-alone products and teams, the PO, most of the time, has the autonomy and the power to decide on her own. But if you have multiple teams with multiple POs working on a single product, someone has to be the "master" PO. This individual needs to look at all teams in a more holistic way, understanding the "puzzle" as a whole. The "master" PO should have the "global" product roadmap in mind, guaranteeing that the team backlogs don't collide with one another and ultimately converge on the desired goal.
And this doesn't only matter for features in the backlog. Sometimes different teams implement similar behaviours in different, inconsistent ways, and you only find out later that you have 4 different ways to upload a file, or that your panels and grids work in 3 different ways across the product. In our case, this role was played by the Product Manager and co-played by the Product Director.
Learning point: Have someone overseeing the whole product, so you can have a unique/cohesive product.
Having a single cadence makes things easier.
When you have multiple teams, the first thing you think of is to have all teams working somehow independently, so they can manage themselves loosely. But while it’s easily done when working on isolated products, it’s not that easy to manage when you have multiple teams contributing to a single product.
In our case, it would have been a nightmare to let the teams decide on their own delivery cadence. So we fixed the sprint length (2 weeks). Some more mature teams were allowed to have shorter cycles, but they still participated in the common sprint review.
I tried to push the teams towards a shorter sprint cycle of 1 week, since the shorter the sprint, the easier it is to manage team interdependencies, but both the teams and the POs kept rejecting the idea, wrongly arguing it would make their work harder and require many more meetings (twice as many).
For several reasons we used the concept of a release. So after 6–8 sprints we released the product to the market. All stories developed by all teams were properly integrated and extensively tested for the product release. At this point, we were able to check for major and minor issues and inconsistencies.
Learning point: Unless you have a very mature group, with no (or little) coupling, it is better to stick to a single cadence governing all teams. The shorter the cadence, the easier it is to manage the whole process.
In this context of multiple teams and one product, you have to add some ceremonies, revive some you have forgotten, or scale the ones you already have in place. I already discussed the daily "Scrum of Scrums"-like meeting. But this is not the only ceremony we used in our process. So, what were the other ceremonies?
:: Release Roadmap and Objectives: Just as you have the sprint backlog and the sprint objective, now we had a release roadmap and release objective covering ALL teams. It was analogous to the release roadmap for a single team, but the main difference was that in a multiple-team-one-product environment, this roadmap (covering all teams) covers a lot of interconnections and interdependencies among and between the teams.
:: Release Planning: At the beginning of each release we set aside some time for release planning. The idea here was just to level everyone's understanding of the release objectives, trying to assess how risky and bold the release goals were. In this meeting, the teams ran some what-if scenarios in preparation for the planning. Also, as a planning-only exercise, the teams prepared a high-level backlog and a high-level technical plan. This backlog was presented to the other teams for discussion. This was a great and useful ceremony.
:: Release Review: At the end of the release, before the release planning, we performed a release review. This was not an ordinary review, since we also reviewed the team/group metrics, the achievement of the goals and objectives, and any other high-level issues that needed to be reviewed.
:: Release Retrospective: At the end of the review and before the planning we had a "2-tier" retrospective meeting, first at the team level, and then covering the whole group.
The 3 ceremonies above (Release Planning/Review/Retrospective) were conducted at an off-site 2-day meeting.
:: Customer Tickets Follow-Up: A bi-weekly meeting with all engineers dedicated to customer tickets and their respective support staff. This allowed us to manage the response to our customer tickets and customer-related issues properly and in a timely manner.
Learning point: When you are scaling Scrum, you also have to scale the ceremonies. If you are using Nexus, LeSS or SAFe, the framework will take care of those for you. If you are not, you need to pay attention to them yourself.
Cross-function coordination is needed.
When you have one single product, the UI, the user flow, some common component behaviours and some sets of features need to be standardised. So you need to have some kind of cross-functional coordination. In our case, we had specific coordinators looking after different aspects of the product (QA for functional standards, design for UX and look & feel standards, mobile for our apps, and so on).
On a weekly basis, all coordinators sat together to track and manage their respective areas of concern and to follow up with their peers.
Learning point: Cross-function coordination increases the chances of success. If you don't have it, you might be putting the overall quality of your product at risk and you might be wasting a lot of energy.
A high-level goal-setting framework helps everyone.
After I joined the company, we started to use the OKR framework together with Scrum in the engineering group, initially with the coordinators. After 3 iterations, both the Objectives and the Key Results were already converging and helping the teams a lot. In our case, the most important aspect was the overall alignment. The next step would be to extend the adoption to the teams and to the product group as well. (For details, please check this presentation, available in Portuguese only.)
Learning point: If you don’t have a clear and well-known goal, it’s pretty easy to get lost on your way. OKR is simple, lightweight and straightforward, but you can use any goal-setting framework you choose. Foursquare had a very good approach (they used it for roadmap management, but you can use it for anything), called #Now, #Next, #Later.
Metrics are a must.
"You cannot manage what you do not measure" (commonly attributed to Deming)
So you have to have your key metrics in place. Actually, every team must measure its own KPIs (the ones agreed on at the group level, the ones the team believes make sense for it, and the ones agreed upon with the PO). In our case, we fell short on this subject. We had some basic metrics regarding the support indicators, and the QA coordinator had some metrics regarding bugs. The teams? They had none… So we started to collect, consolidate and analyse a lot of KPIs. And the KPIs were really used to manage the teams (both at the team level and in the engineering group as a whole). If something could not be understood through the existing metrics, new ones were added (permanently or ad hoc). But they were process (engineering) metrics.
Learning point: If you don’t have metrics in place, you are flying blind. You have to build your metrics and KPIs, identifying the ones that make sense in your particular case. And then you have to use them. Really use them.
Scaling agile from one single team to multiple independent teams is not that difficult, since most of the time you can manage the teams in isolation with no interdependencies with each other.
Scaling agile in a multiple-teams/single-product arrangement may pose additional challenges for the organisation. But adding new and simple procedures, paying close attention to agile principles and establishing a common goal-setting framework (like OKR, for example) turns the challenges into a fun thing to work on.
Remember, you can use SAFe, LeSS, Nexus; but be prepared to change, to try new approaches until you find the one that is right for you.