Software Architecture: Making Decisions at Scale

Learnings from co-leading an architecture committee at scale

Raphaël Tahar
Published in Decathlon Digital
10 min read · Jul 3, 2024


Decision opportunities can be chaotic

Through 4 posts, this series dissects the complex machinery behind architectural decision-making.

This inaugural post will address the structure of a choice opportunity, and how our organization effectively manages them on a large scale with the aid of a new kind of architecture committee.

📖 Series table of contents

  1. 👉 Making Decisions
  2. Recognizing Scopes and Boundaries
  3. Architecture Decision Record & C4
  4. Social and Organizational Dynamics

One of my missions at Decathlon is to support 120+ engineers across 23 feature teams in delivering value, and to co-influence global guidelines impacting 1500+ engineers. This involves tasks such as designing new systems, optimizing existing ones, capturing best practices, and overseeing the architectural decision-making process (spoiler alert: not the decisions themselves).

To achieve this goal, we have established an architecture committee with four peer Staff+ Engineers: Alexandre Faria, Laurent Ducamp, Sebastien Nahelou, and Vincent Treve.

Before bootstrapping the committee, we asked a few engineers to describe their decision-making process by drawing a flowchart.

Here is what came out 👇

Expectations — The Dreamt Flow 💭

The decision-making process in theory — The dreamt flow

It went like this:

  1. The “what” is identified and provided as input by either the product teams or themselves (the engineering teams), creating a choice opportunity.
  2. They evaluate the existing options to provide relevant and adapted solutions.
  3. The next step is analyzing the consequences of each option…
  4. … before selecting the most cost-effective and straightforward one for the team to reach its objectives.
  5. Finally, the decision is captured through a document.

This flowchart looks correct but doesn’t feel right. Even if we’d like it to be, this is a poor representation of what actually happens.

Let’s find out why, thanks to the Garbage Can Model.

Garbage Can Model 🗑️

This model, created in the 1970s, aims to depict the chaos generated by decision opportunities. As always, the famous aphorism applies: "All models are wrong but some are useful," and this one is particularly so.

It starts with the following statement:

The garbage can model describes the chaotic reality of organizational decision-making in an organized anarchy.

[…] the garbage can model symbolizes the choice-opportunity/decision-situation as a “garbage can” that participants are chaotically dumping problems and solutions into, as they are being generated.

- Wikipedia

After reading this, you might already be trying to assess how much your organization is an organized anarchy.

Let me end the suspense: all organizations are. Some more than others, but ultimately every company is composed of many employees who spend their time trying to fix problems and competing with each other for attention.

Problems-Solutions-Stakeholders streams 🚰

Some propose problems, and others propose solutions. They are usually loosely coupled and evolve as the discussion progresses. Indeed, problems are refined, and solutions come and go during that process.

Besides, not every participant gets the chance to fully participate in the discussion (which might take more or less time), because their priorities, available time, resources, and energy are unequal and fluctuate over time.

The model’s core concept is to think of problems, solutions, and stakeholders as evolving material answering different agendas and objectives.

The “garbage can” term’s significance is best understood by considering the manner in which items in a trash can are organized, which is a messy, chaotic mix. The model portrays problems, solutions, and participants/decision-makers as three independent “streams” that are each generated separately, and flow disconnected from each other.

Let’s illustrate this through a pair of diagrams.

Every decision opportunity is a mix of problems, solutions, and decision-makers. The garbage can represents the choice opportunity in which they are thrown.

The decision-making process in practice — The Garbage Can flow

Give it some time, and a decision might eventually come out of the process. But before it does, stakeholders must find their way into that chaos and simultaneously:

  • navigate the complexity of intertwined problems
    (Should the problem be split or not? Are they linked to other issues?)
  • connect solutions and problems
  • identify and align stakeholders’ objectives
  • and spot the optimal solution(s)

The intrinsic chaos of a choice opportunity

Decisions are hidden at the crossroads of different streams whose flows fluctuate over time. Once merged, they form a unified and cohesive unit combining all their characteristics (aka the decision). A fluctuation in one of the upper streams may impact the downstream decision.

Changing a single variable can cause cascading effects or have no impact at all. Being able to distinguish which parameter matters and which doesn’t is at the heart of efficient decision-making.
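To make the "crossroads of streams" idea more tangible, here is a deliberately naive sketch. It is my own toy illustration, not part of the Garbage Can Model literature: each item (problem, solution, or participant) is assumed to sit in the can for a range of time ticks, and a decision is only considered possible at ticks where all three streams overlap. The names `Item` and `decision_windows` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A problem, solution, or participant sitting in the can for a while."""
    name: str
    arrives: int  # first tick at which the item is in the can
    leaves: int   # last tick (inclusive) at which the item is in the can

    def present(self, tick: int) -> bool:
        return self.arrives <= tick <= self.leaves


def decision_windows(problems, solutions, participants, horizon):
    """Return the ticks at which every stream has at least one item present.

    Assumption: a decision can only emerge when a problem, a solution,
    and an available participant are in the can at the same time.
    """
    streams = (problems, solutions, participants)
    return [
        tick
        for tick in range(horizon)
        if all(any(item.present(tick) for item in stream) for stream in streams)
    ]


problems = [Item("slow checkout", arrives=0, leaves=5)]
solutions = [Item("add a cache", arrives=3, leaves=8)]
participants = [
    Item("staff engineer", arrives=4, leaves=4),
    Item("product owner", arrives=6, leaves=9),
]
windows = decision_windows(problems, solutions, participants, horizon=10)
```

In this made-up example the only window is tick 4: shift the staff engineer's availability by a single tick and the window disappears, while extending the solution's lifetime changes nothing. That is the "some parameters matter, others don't" effect in miniature.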

Connecting reality with expectations 🔗

After reading about the Garbage Can principles, you might be reminded of complex past experiences where the five steps illustrated by the “dreamt flow” fell short.

Let’s find a way to meet halfway and address the complex decision-making process while still delivering what most people naturally expect from theory.

Those who like to get general ideas by focusing on the details first will have noticed that the “dreamt flow” described in the first part of this post and the “Garbage Can flow” do not share the exact same steps.

The Garbage Can decision-making missing items

Indeed, the “garbage can flow” is missing 3 items:

  • Decision alternatives
  • Consequences
  • Consequences vs Objectives

Organizations must ensure that those 3 points are covered. If they aren’t, the chances of late projects, late no-gos, and backfires (like unreliable services causing penalties) increase significantly, resulting in huge budget waste.
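As a rough sketch of what covering these three items can look like, the snippet below lists alternatives, attaches their estimated consequences, and checks them against objectives. Everything here is hypothetical (the `Alternative` type, the metric names, the numbers), and objectives are simplified to "maximum acceptable value" thresholds, which real projects rarely reduce to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alternative:
    name: str
    consequences: dict[str, float]  # estimated value per metric


def compare(alternatives, objectives):
    """Map each alternative to the list of objectives it violates.

    `objectives` maps a metric name to its maximum acceptable value.
    A consequence that was never estimated counts as a violation:
    an unknown consequence is exactly the gap we want to surface.
    """
    report = {}
    for alt in alternatives:
        report[alt.name] = [
            metric
            for metric, limit in objectives.items()
            if alt.consequences.get(metric, float("inf")) > limit
        ]
    return report


objectives = {"monthly_cost_eur": 1000, "p99_latency_ms": 200}
alternatives = [
    Alternative("managed queue", {"monthly_cost_eur": 1200, "p99_latency_ms": 80}),
    Alternative("self-hosted queue", {"monthly_cost_eur": 600, "p99_latency_ms": 150}),
    Alternative("do nothing", {"monthly_cost_eur": 0}),
]
report = compare(alternatives, objectives)
```

Here `report` flags the managed queue on cost, the "do nothing" option on latency (because nobody estimated it), and leaves the self-hosted queue with an empty list of violations.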

Therefore, my peer Staff Engineers and I set up an architecture committee within our Domain: Customer Growth.

The Architecture Committee

Its mission is NOT to make architectural decisions for Software Engineers but rather to help them:

  • take the time to define and refine their problem and context properly
  • involve the right stakeholders
  • spot solutions and search for the most relevant alternatives
  • know the consequences of each solution
  • compare solutions’ consequences with the project’s objectives
  • ensure the alignment with Decathlon’s technical strategy and guidelines
  • capture all of the above in Architecture Decision Records and Diagrams as Code through C4 representations

The surface for error, conflict, and inaccuracy is vast, since comparing and weighing items of different natures is difficult. Indeed, problems, solutions, and decision-makers aren’t easy to balance.

Consider the committee an enabling team that fosters the right mindset and influences the engineering culture rather than making the decisions. When accompanying a team, the committee’s main focus is to help it avoid two kinds of mistakes:

  1. Confusing problems with solutions (and vice versa) or getting lost in ill-defined problems (more in the next post).
  2. Confusing roles and responsibilities, as participants try to make sense of their role in the organization (more in the last post).

The next posts of this series will elaborate in greater detail on the committee’s attention points.

However, the “ensuring alignment” responsibility deserves further explanation as it’ll help to better frame decision-making at scale. So, if you’re interested in making decisions in large organizations, keep reading 👇. Otherwise, feel free to jump to the next post of this series (Recognizing Scopes and Boundaries).

So, back to the Committee.

It is responsible for creating coherence at the Domain level by assisting teams in making decisions, but it also plays a role in the global tech strategy.

Let’s find out how by breaking the organization down and pointing out its choice arenas (🏟️).

Domain Architecture Committee: a two-way interface

The diagram above highlights the Decathlon Digital organization structure, divided into several business units called Domains. Each Domain is responsible for a specific business capability such as E-Commerce, Customer Growth, Value Chain, Platform Engineering, etc.

Domains are also responsible for influencing and building cross-domain guidelines (Tech Standards, Tech Radar, and Maturity Matrices) providing golden paths and limiting engineering chaos.

To sustain this dual responsibility, we employ the Domain Architecture Committee to both enforce the general guidelines at a local level (1️⃣ Enforces) and influence them by bubbling up new local use cases to higher authorities (2️⃣ Participates Influences).

The first cross-domain authority consists of the Special Interest Groups (SIGs). SIGs are cross-domain communities of practice composed of Staff+ Engineers and Engineering Managers (the very same involved in Domain Architecture Committees) that build or sponsor standards. For a standard proposal to be adopted, it must be validated by a majority of the SIG’s members.

So, to avoid the ivory tower syndrome, Domain Architecture Committees’ leaders participate in SIGs to ensure that any relevant emerging local use cases are raised to higher decision arenas (SIGs or ToC), potentially leading to updates of the global Tech Standards, Tech Radar, or Maturity Matrices.

Last but not least, proposals with a high strategic impact must be approved by a final authority called the Technical Oversight Committee (ToC), composed of CTOs and senior technical leadership, to ensure company-wide technical coherence and business viability.

To sum up, decisions are dispatched across three decision arenas 🏟️, following the subsidiarity principle: each decision is made at the lowest level able to handle it.

  • Any decision not affecting parallel teams or domains can and should be made at the team level (as long as it follows global guidelines & standards). The Domain Architecture Committee plays an advising role here.
  • The committee will provide strong recommendations for decisions involving a cross-team scope and may escalate to the next authority if the impact exceeds the domain’s scope.
  • The committee will forward decision opportunities with a medium or high impact to the appropriate SIG, which will then decide whether to update or create guidelines.
  • Finally, decisions with a higher impact require approval from the ToC instance.
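The dispatching rules above can be sketched as a small routing function. Treat it as one possible reading rather than our actual process: the `Scope` and `Impact` levels, the `STRATEGIC` label for "higher impact", and the `covered_by_standard` flag are all assumptions made for the example, and real impact-level definitions have to be tailored to each organization.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    TEAM = 1          # affects a single team
    CROSS_TEAM = 2    # affects several teams within one domain
    CROSS_DOMAIN = 3  # impact exceeds the domain


class Impact(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    STRATEGIC = 4  # hypothetical label for "higher impact" decisions


class Arena(Enum):
    TEAM = "team decides, committee advises"
    DOMAIN_COMMITTEE = "committee gives a strong recommendation"
    SIG = "Special Interest Group"
    TOC = "Technical Oversight Committee"


@dataclass(frozen=True)
class DecisionOpportunity:
    title: str
    scope: Scope
    impact: Impact
    covered_by_standard: bool = False  # an existing guideline already answers it


def route(decision: DecisionOpportunity) -> Arena:
    """Return the lowest arena able to handle the decision (subsidiarity)."""
    if decision.scope is Scope.TEAM:
        return Arena.TEAM
    # Escalation only concerns cross-domain issues not covered by a standard.
    escalates = decision.scope is Scope.CROSS_DOMAIN and not decision.covered_by_standard
    if not escalates:
        return Arena.DOMAIN_COMMITTEE
    if decision.impact is Impact.STRATEGIC:
        return Arena.TOC
    if decision.impact in (Impact.MEDIUM, Impact.HIGH):
        return Arena.SIG
    return Arena.DOMAIN_COMMITTEE
```

For instance, `route(DecisionOpportunity("new event bus", Scope.CROSS_DOMAIN, Impact.MEDIUM))` lands in the SIG arena, while the same proposal with `covered_by_standard=True` stays at the domain level.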

This flow is not advisable for every organization, as it creates delays. However, it ensures that the right stakeholders are included, which, in turn, allows problems and solutions to be thoroughly addressed.

Establishing global standards has a significant impact on engineering culture and facilitates decision-making in closer proximity to the issues at hand. Besides, it’s worth noting that escalation rarely occurs and only concerns issues having a high impact not already addressed by existing standards.

Having said that, precise impact-level definitions significantly reduce bottlenecks by shaping the escalation processes. Each organization is unique, so these definitions must be tailored to fit their specific contexts.

The Domain Committee’s KPIs

To assess the committee’s performance, we must regularly monitor several facets.

The first is engineers’ satisfaction, measured through a Net Promoter Score (NPS) survey. Teams should feel more confident and notice that their:

  • experts can share their knowledge and spot blind spots
  • engineers of any seniority can learn from the decisions made
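For readers unfamiliar with NPS, the score is the percentage of promoters (answers of 9 or 10 on a 0 to 10 scale) minus the percentage of detractors (answers of 0 to 6). A minimal sketch follows; the function name and the sample answers are made up for illustration.

```python
def net_promoter_score(scores: list[int]) -> float:
    """Compute the NPS (from -100 to 100) from answers on a 0-10 scale."""
    if not scores:
        raise ValueError("at least one survey answer is required")
    if any(score < 0 or score > 10 for score in scores):
        raise ValueError("answers must be between 0 and 10")
    promoters = sum(1 for score in scores if score >= 9)
    detractors = sum(1 for score in scores if score <= 6)
    # Passives (7 and 8) only count in the denominator.
    return 100 * (promoters - detractors) / len(scores)


# Hypothetical answers to "Would you recommend working with the committee?"
answers = [10, 10, 9, 8]
nps = net_promoter_score(answers)
```

With these four answers, three promoters and no detractors give an NPS of 75.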

The second is the impact on SLOs (Service Level Objectives). Closely monitor the metrics your organization has chosen as its top priorities. For instance, if reliability is the quarter’s priority because Error Budgets have been fully consumed, keep an eye on the related KPIs.
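As a reminder of how an Error Budget relates to an SLO, here is a small sketch assuming a request-based availability SLO. The function name and the figures are illustrative, not taken from our dashboards.

```python
def error_budget_consumed(
    slo_target: float, total_requests: int, failed_requests: int
) -> float:
    """Return the fraction of the error budget used (1.0 = fully consumed).

    Assumes a request-based SLO: with a target of 0.999, up to 0.1% of
    the requests in the window are allowed to fail.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed_failures = (1 - slo_target) * total_requests
    return failed_requests / allowed_failures


# Hypothetical quarter: 1M requests, 500 failures, 99.9% availability target.
consumed = error_budget_consumed(0.999, 1_000_000, 500)
```

In this example roughly half of the budget is gone; a value at or above 1.0 is the "fully consumed" situation mentioned above.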

Note that the effects of an initiative such as a committee show up in an organization with a delay that depends on cross-functional roadmaps and priorities, generally counted in months or quarters.

A last set of KPIs worth mentioning is the DORA metrics.

This acronym stands for DevOps Research and Assessment. These metrics evaluate a team’s capacity to deploy frequently (Deployment Frequency), have short Lead Times (the amount of time between a commit and its release in production), have a low rate of failure on production releases (Change Failure Rate), and have a short Time To Resolve when a failure does occur in production.
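To show how these four numbers can be derived, here is a sketch working from a list of deployment records. The `Deployment` record and its fields are hypothetical; in practice the data would come from your CI/CD and incident tooling, and using the median is just one reasonable aggregation choice.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    resolved_at: Optional[datetime] = None  # set once a failed release is fixed


def hours_between(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def dora_metrics(deployments: list[Deployment], period_days: int) -> dict:
    """Summarize the four metrics over a period (durations in hours)."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    lead_times = [hours_between(d.committed_at, d.deployed_at) for d in deployments]
    failures = [d for d in deployments if d.failed]
    resolutions = [
        hours_between(d.deployed_at, d.resolved_at)
        for d in failures
        if d.resolved_at is not None
    ]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "median_lead_time_h": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "median_time_to_resolve_h": median(resolutions) if resolutions else None,
    }


history = [
    Deployment(datetime(2024, 7, 1, 9), datetime(2024, 7, 1, 11)),
    Deployment(datetime(2024, 7, 2, 9), datetime(2024, 7, 2, 13),
               failed=True, resolved_at=datetime(2024, 7, 2, 14)),
    Deployment(datetime(2024, 7, 3, 9), datetime(2024, 7, 3, 15)),
]
metrics = dora_metrics(history, period_days=3)
```

Tracking the same dictionary quarter after quarter is enough to see a trend, which matters more here than the absolute values.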

If the Architecture Committee fulfills the abovementioned responsibilities, you should quickly see an impact on these metrics.

Another interesting side effect of the committee lies in the outputs it creates (C4 and Architecture Decision Records; more on those in the third post). These initiate a shift toward a written culture, which is key in scaling any organization.

Takeaways 🌯

To synthesize, bear in mind that any organization is a collection of:

  • choices seeking problems
  • issues and feelings seeking decision situations
  • solutions seeking issues
  • and decision-makers seeking out work

Some will try to propose their solution to every problem, others will try to justify their paycheck or title by pushing solutions or problems, and some will adapt their solutions based on the problems they face.

Ensure your actions and standpoints push the decision opportunity toward the relevant problems, solutions, and stakeholders combo. Discard any irrelevant proposals, whether from others or yourself.

Architecture committees are a great way to achieve this if implemented with the right cultural shift mindset. A committee should be responsible for the decision-making process, the technical coherence of a Domain, and influencing global standards, while teams should be responsible for the decisions impacting their products.

In this series’ next post, we’ll describe the perception shift needed to better grasp different problem archetypes and dive into the feedback loop between solutions and problems.

Thanks for reading! 🙏🏼
👏🏻👏🏻👏🏻 Give a few claps and “follow” if you enjoyed this series.

💌 Follow Decathlon’s latest posts on Twitter and LinkedIn and discover our latest stories on Medium 🚀

Acknowledgments
And a big thank you to Sebastien Nahelou, Laurent Ducamp Partner, Vincent Treve, Damien Raude-Morvan, and Matthias Castelain for their thorough reviews and feedback. 💪
