Platform as code: Reference architectures to simplify developer platforms

McKinsey Digital
McKinsey Digital Insights
Mar 20, 2024

Modern developer platforms can redefine software development and enable DevOps at scale. Reference architectures provide a starting blueprint for their design.

By Stephan Schneider — Expert Associate Partner, Navin Agarwal — Principal Cloud Engineer, Thomas Delaet — Partner, and Remy Paternoster — Partner, McKinsey Digital

How can tech leaders unleash the full potential of their software development talent? By empowering their developers, creating an environment ripe for innovation, and improving their productivity with the right tools and processes. These factors contribute to what we call “developer velocity,” a key driver of business success. Previous McKinsey research has found that companies with top-quartile developer velocity outperform bottom-quartile companies on revenue growth by four to five times (measured as revenue CAGR).

Reducing the cognitive and operational load on developers is a critical step toward increasing product innovation and velocity. The idea of “you build it, you run it” has empowered software development teams to take ownership of their products and release them more frequently and with higher quality. However, it has also placed tremendous strain on software developers. They need to stay current on the latest software-engineering practices and tooling, maintain in-depth knowledge of running and integrating a broad range of cloud services, and keep pace with the ever-increasing product output that many organizations expect.

A modern developer platform can go a long way toward boosting developer velocity and elevating the developer experience. By acting as a self-serve “one-stop shop” for developers, such a platform reduces manual work, increases process efficiency, and propels standardization, security, and compliance by design. In other words, it allows developers to focus on what they do best — writing code to solve a customer problem.

While every platform is different, certain best practices have emerged across companies and industries. Drawing from our experience, we have developed a set of design principles and a reference architecture that can serve as a useful blueprint for a best-in-class developer platform and enable DevOps at scale.

Eight design principles for developer platforms

Platform-engineering teams should follow a set of core design principles when building the developer platform:

  1. Focus on the user. Because developers are the most important “customers” of a developer platform, they need to be heavily involved in a platform’s design, prioritization of features, and testing to make sure it is fit for purpose, user-friendly, and fully self-service. A user-centric design that addresses the needs of developers can make all the difference and drive adoption.
  2. Run the platform team like a start-up. IT leaders should establish a small, central team that not only owns the platform but is also responsible for marketing it (and has the resources to do so). The team also needs sufficient capacity to ensure the successful implementation of the platform through community forums, technical support, and clear and concise documentation.
  3. Decentralize the contribution model. While ownership falls to a central team, the contribution model should allow for decentralized, continuous development. As in an open-source model, contributors — for example, application development (app dev) teams — should be able to share plug-ins and propose changes, but the platform team should retain the final say over what is approved and accepted in the platform. It is important to acknowledge that app dev teams across the organization are at different levels of maturity, capability, and engagement — meaning not all are ready or willing to contribute to the platform from day one. To establish an inner-sourcing culture, organizations will need to support an operating model in which teams with the right skills and knowledge can contribute. They also need to maintain an upskilling mechanism for teams to grow over time into contributing roles.
  4. Build golden paths, with some cages. Developers should be free to choose the abstraction level that suits their needs. While a developer platform should provide a set of predefined patterns or “golden paths” for developers to follow — in addition to some “cages” they are not allowed to leave — it also needs to provide flexibility for developers to implement more-complex use cases or edge cases.
  5. Drive standardization by design. Enabling self-service requires platform engineers to establish secure foundations and define preconfigured application patterns so that every resource created through the platform is secure, compliant, and well architected. However, application patterns only standardize the first deployment, after which teams can edit them and drift away from the standard. To keep resources compliant and aligned with the architected application patterns, the platform needs a mechanism that enforces these standards on every deployment: for instance, policies at the cloud-provider level, automated checks in the pipeline or source code, or centralized components such as a platform orchestrator that regenerates app and infrastructure configurations from the pattern each time a workload is deployed.
  6. Get the most from existing capabilities. Every organization has at least some individual components of a developer platform in place (such as base and isolation zones in the cloud), and standard components such as backlog management tools, continuous integration and continuous deployment (CI/CD) toolchains, and test suites are widely available in the market. A modern developer platform serves as the glue between these individual components and offers an opinionated set of design choices on how to use and configure them (baked as code into the platform) to fulfill developer needs.
  7. Let developers choose their platform interface. The developer platform should not break developer workflows or force developers to use any specific interface. A code-based workflow should be the default, with options to use a user interface (UI), command line interface (CLI), or application programming interface (API).
  8. Define success as usage. We find that the true marker of success is an organization’s ability to drive traffic to the platform and entice developers to use it in their day-to-day work. A platform that enhances the developer experience and addresses developers’ pain points can create pull from the organization. Ultimately, developers should use the platform because they want to, not because they have to. Organizations can encourage this through gamification and rewards (for example, by rewarding app dev teams and individuals for faster adoption) or by enforcing certain standards that come ready out of the box in the platform.
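Principle 5 asks for standards to be enforced on every deployment, not only the first. A minimal sketch of the “automated checks in the pipeline” option, in Python: every resource in a rendered deployment is run through a list of rules, and any non-empty result would block the deployment. The `Resource` shape, the required labels, and the bucket rule are hypothetical placeholders for whatever an organization actually mandates; a real pipeline might use a dedicated policy engine instead.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Resource:
    """A resource as it appears in a rendered deployment (hypothetical shape)."""
    kind: str
    name: str
    labels: dict[str, str] = field(default_factory=dict)
    encrypted: bool = False
    public: bool = False


# Hypothetical organization-wide standard: every resource carries these labels.
REQUIRED_LABELS = {"team", "cost-center"}


def check_labels(r: Resource) -> Optional[str]:
    missing = REQUIRED_LABELS - set(r.labels)
    return f"{r.name}: missing labels {sorted(missing)}" if missing else None


def check_storage(r: Resource) -> Optional[str]:
    # Hypothetical rule: object storage must be private and encrypted.
    if r.kind == "bucket" and (r.public or not r.encrypted):
        return f"{r.name}: buckets must be private and encrypted"
    return None


RULES: list[Callable[[Resource], Optional[str]]] = [check_labels, check_storage]


def validate(resources: list[Resource]) -> list[str]:
    """Apply every rule to every resource; an empty list means the deployment may proceed."""
    return [msg for r in resources for rule in RULES if (msg := rule(r)) is not None]


if __name__ == "__main__":
    # A bucket that drifted after its first deployment: one label removed, made public.
    drifted = Resource("bucket", "uploads", labels={"team": "payments"}, public=True)
    for problem in validate([drifted]):
        print(problem)
```

Because the check runs on each deployment rather than once at creation time, drift introduced by later edits is caught the next time the workload ships.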

Developer platform reference architecture

Modern developer platforms require interchangeable components and provide flexibility to suit both simple and complex (multicloud or hybrid) architectures. The exhibit below illustrates a commonly deployed AWS architecture that can provide a starting point for a modern developer platform. However, IT leaders should begin by assessing their existing architectures and developer pain points, and then incorporate the components that they already have in place. The components and tools referenced below exemplify an AWS-based setup. All components are interchangeable, and similar reference architectures can be implemented for GCP, Azure, OpenShift, or any hybrid setup. Individual components can be replaced with any homegrown, on-premises, or other cloud-native solutions.

Commonly deployed AWS reference architecture

The developer platform reference architecture includes five main components:

  • The developer control plane provides the interfaces through which developers can interact with the platform (IDE, API, CLI, GUI, GitOps, and so on). It is where developers can access the structured catalog with the components and documentation they need, and where they can implement, review, and deliver changes.
  • The integration and delivery plane has the tools that build, store, configure, and deploy all requests coming from the developer control plane — including CI/CD, registries, and platform orchestration.
  • The monitoring and logging plane provides real-time metrics and logs for applications and infrastructure, allowing developers to observe, monitor, and make data-driven decisions.
  • The security plane manages confidential information and identities to protect sensitive data — for example, by storing, managing, and securely retrieving API keys and passwords.
  • The resource plane includes all the components necessary to run applications delivered through the platform (as predefined in application patterns and workload specifications).

To assemble these planes into an end-to-end architecture, the platform engineering team must connect the components within each plane, connect the planes to one another, and test the raw end-to-end flows to ensure that everything works together smoothly.
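Part of that testing can be automated before any real traffic flows. As a minimal Python sketch, the platform team could declare which plane hands requests to which, then verify with a breadth-first search that every required end-to-end flow has a path. The plane names, the wiring map, and the list of required flows are hypothetical; a static check like this complements, rather than replaces, pushing real requests through the planes.

```python
from __future__ import annotations

from collections import deque
from typing import Optional

# Hypothetical wiring: which plane hands requests to which other planes.
WIRING: dict[str, list[str]] = {
    "developer_control": ["integration_delivery"],
    "integration_delivery": ["security", "resource"],
    "security": ["resource"],
    "resource": ["monitoring_logging"],
    "monitoring_logging": ["developer_control"],  # metrics and logs surface back to developers
}

# Hypothetical end-to-end flows the platform must support.
REQUIRED_FLOWS = [
    ("developer_control", "resource"),            # a change gets deployed
    ("developer_control", "monitoring_logging"),  # the deployment becomes observable
    ("security", "resource"),                     # secrets reach running workloads
]


def route(wiring: dict[str, list[str]], start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search; returns the shortest chain of planes, or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in wiring.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def broken_flows(wiring: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Required flows that have no path through the current wiring."""
    return [(a, b) for a, b in REQUIRED_FLOWS if route(wiring, a, b) is None]


if __name__ == "__main__":
    print(broken_flows(WIRING))
    print(route(WIRING, "developer_control", "monitoring_logging"))
```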

To shortcut this process, we have expressed this reference architecture as code for multicloud setups (AWS, Azure, GCP, and OpenShift), which enables an organization to set up a basic developer platform within a few hours. The platform setups are codified as Terraform scripts, and the workload specifications for common application patterns (such as three-tier application, microservice, and message bus) are implemented both as Terraform scripts and with the open-source workload specification Score. At deployment time, a platform orchestrator (a rule engine that matches resource requests to predefined default configurations provided by the platform team) dynamically creates the final configurations for each deployment. Several kinds of tools can serve as the orchestrator: cloud-native (such as AWS CloudFormation), homegrown (such as a combination of Terraform and Argo CD), or off-the-shelf (such as Humanitec).
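The orchestrator’s core job, matching each abstract resource request to a platform-team default, can be sketched as a small rule engine. This is an illustrative Python model of the idea, not how any of the tools named above works internally. It assumes definitions are keyed by resource type and, optionally, by environment, and that a definition scoped to the exact environment wins over an unscoped one. The driver names and configuration values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ResourceDefinition:
    """A platform-team default for one resource type (hypothetical shape)."""
    resource_type: str
    driver: str                          # which resource driver / IaC module provisions it
    config: dict = field(default_factory=dict)
    environment: Optional[str] = None    # None means "applies to any environment"


def match(definitions: list[ResourceDefinition], resource_type: str,
          environment: str) -> ResourceDefinition:
    """Return the most specific definition: an exact environment match beats a wildcard."""
    candidates = [
        d for d in definitions
        if d.resource_type == resource_type and d.environment in (environment, None)
    ]
    if not candidates:
        raise LookupError(f"no definition for {resource_type!r} in {environment!r}")
    return max(candidates, key=lambda d: d.environment is not None)


def render(workload: dict, definitions: list[ResourceDefinition], environment: str) -> dict:
    """Generate the concrete configuration for one deployment of an abstract workload."""
    resources = {}
    for name, resource_type in workload["resources"].items():
        d = match(definitions, resource_type, environment)
        resources[name] = {"driver": d.driver, **d.config}
    return {"workload": workload["name"], "environment": environment, "resources": resources}


# Hypothetical defaults: a small database everywhere, a larger one in production.
DEFINITIONS = [
    ResourceDefinition("postgres", "terraform/aws-rds", {"instance_class": "db.t3.micro"}),
    ResourceDefinition("postgres", "terraform/aws-rds",
                       {"instance_class": "db.r6g.large", "multi_az": True}, "production"),
]

if __name__ == "__main__":
    workload = {"name": "python-service", "resources": {"db": "postgres"}}
    print(render(workload, DEFINITIONS, "staging"))
    print(render(workload, DEFINITIONS, "production"))
```

The workload itself never changes between environments; only the matched definition does, which is what lets the platform team adjust defaults centrally.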

Contribution and governance model

Because developer platforms are designed to enable standardization, security, and compliance, it is critical to clearly define the setup and governance of the repositories from the beginning. We differentiate between repositories owned by development teams and those owned by platform teams: the former are organized per workload, while the latter are shared across the organization.

On the development team level, the workload-based repository should include the source code, workload specification, Dockerfile, and pipeline YAML. The “traditional” app configurations and infrastructure-as-code (IaC) files are replaced with a workload specification. This allows developers to describe their architecture (the relationship to other workloads and dependent resources) in a general, abstract, and environment-agnostic way. For instance, a workload specification could read, “I am a workload named ‘Python service.’ I depend on a PostgreSQL database, an AWS S3 bucket, and a DNS record.”
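To make the environment-agnostic idea concrete, here is the “Python service” example expressed as data, loosely modeled on the structure of a Score file (an API version, containers, and named resources that each declare a type). The image name and the output values are hypothetical, and the `resolve` function is an illustrative stand-in for what a platform orchestrator does at deployment time, not Score’s own tooling. The workload only names its dependencies; environment-specific values are filled in later.

```python
import re

# A workload specification as data, loosely modeled on a Score file.
# Values are illustrative; the image name is hypothetical.
SPEC = {
    "apiVersion": "score.dev/v1b1",
    "metadata": {"name": "python-service"},
    "containers": {
        "main": {
            "image": "registry.example.com/python-service:latest",
            "variables": {
                "DB_HOST": "${resources.db.host}",
                "BUCKET": "${resources.uploads.bucket}",
            },
        }
    },
    "resources": {
        "db": {"type": "postgres"},
        "uploads": {"type": "s3"},
        "dns": {"type": "dns"},
    },
}

# Matches placeholders of the form ${resources.<name>.<key>}.
PLACEHOLDER = re.compile(r"\$\{resources\.(\w+)\.(\w+)\}")


def resolve(spec: dict, outputs: dict) -> dict:
    """Replace resource placeholders with the values provisioned for one environment."""
    def fill(value: str) -> str:
        return PLACEHOLDER.sub(lambda m: str(outputs[m.group(1)][m.group(2)]), value)

    return {
        name: {var: fill(value) for var, value in container.get("variables", {}).items()}
        for name, container in spec["containers"].items()
    }


if __name__ == "__main__":
    # Hypothetical outputs the platform might produce for a staging deployment.
    staging = {"db": {"host": "pg-staging.internal"}, "uploads": {"bucket": "uploads-staging"}}
    print(resolve(SPEC, staging))
```

The same `SPEC` can be resolved against production outputs without any edit, which is the property the paragraph above describes.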

Meanwhile, the platform team should be responsible for developing the following cross-cutting elements:

  • Resource drivers and infrastructure as code: A catalog of all resources made available by the platform team (such as cloud-native and on-premises components) and templates for dynamically created resources
  • Resource definitions: Guidance on which resource drivers to use in a specific context; for example, “use this type of AWS S3 bucket” if the context of the deployment is a “staging” environment; use another one if it is “production”
  • Workload profiles: Default settings for application configurations, such as minimum CPU allocation, labels and annotations, and sidecar containers
  • Automation or compliance: Governance defined as code in the platform — for example, definitions of environment progression, deployment automations, and sign-off rules
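The last bullet, governance defined as code, is easiest to picture as a check that runs before a build is promoted. The Python sketch below assumes a simple three-stage progression and a single sign-off rule for production; the environment names, role names, and the one-stage-at-a-time rule are hypothetical examples of what a platform team might encode.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical environment progression: builds move one stage at a time, in order.
PROGRESSION = ["development", "staging", "production"]

# Hypothetical sign-off rules: roles that must approve before a build enters an environment.
SIGN_OFFS: dict[str, set[str]] = {"production": {"release-manager", "security"}}


@dataclass
class Promotion:
    build: str
    source: str
    target: str
    approvals: set[str] = field(default_factory=set)


def check_promotion(p: Promotion) -> list[str]:
    """Return the rule violations for a promotion; an empty list means it may proceed."""
    if p.source not in PROGRESSION or p.target not in PROGRESSION:
        return [f"{p.build}: unknown environment in {p.source} -> {p.target}"]
    errors = []
    if PROGRESSION.index(p.target) != PROGRESSION.index(p.source) + 1:
        errors.append(f"{p.build}: {p.source} -> {p.target} skips or reverses the progression")
    missing = SIGN_OFFS.get(p.target, set()) - p.approvals
    if missing:
        errors.append(f"{p.build}: missing sign-off from {sorted(missing)}")
    return errors


if __name__ == "__main__":
    print(check_promotion(Promotion("v1.4.2", "development", "production")))
    print(check_promotion(Promotion("v1.4.2", "staging", "production",
                                    {"release-manager", "security"})))
```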

The objective of this governance model is to provide development teams with ownership over their code and freedom to deliver it to production — as long as it stays within the guardrails defined by the platform team.

The goal of this approach is to cover 80 percent of the organization’s requirements with a few app patterns and resources (centrally defined by the platform team) and enable the rest to be defined via a decentralized contribution model (such as an open-source model), with the platform engineering team retaining final approval over contributions.

Say a developer needs an ArangoDB database (and has a valid business reason to stray from the default components defined in the platform), but the platform doesn’t support it yet. Assuming the platform team grants the necessary permissions, the developer should then extend the resource definitions, effectively “teaching” the platform how and when to deploy an ArangoDB database, and send a pull request to the organization-wide repository for the platform team to review and approve. Once the platform team has accepted the definition organization-wide, the next developer in a similar situation gets a fully automated procedure.
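The contribution flow in this example can be modeled in a few lines. In the Python sketch below, `propose` stands in for a developer opening a pull request against the organization-wide repository and `approve` for the platform team merging it; the class, its methods, and the driver names are hypothetical. It illustrates two properties: an unapproved resource type cannot be deployed, and once one contribution is approved, every later request resolves it automatically.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DefinitionRegistry:
    """Organization-wide resource definitions (hypothetical model of the shared repository)."""
    approved: dict[str, dict] = field(default_factory=dict)
    pending: dict[str, dict] = field(default_factory=dict)

    def propose(self, resource_type: str, definition: dict) -> None:
        # Stands in for a developer opening a pull request.
        self.pending[resource_type] = definition

    def approve(self, resource_type: str) -> None:
        # Stands in for the platform team reviewing and merging that pull request.
        self.approved[resource_type] = self.pending.pop(resource_type)

    def resolve(self, resource_type: str) -> dict:
        # What the orchestrator does on every deployment: only approved definitions count.
        if resource_type not in self.approved:
            raise LookupError(f"the platform cannot deploy {resource_type!r} yet")
        return self.approved[resource_type]


if __name__ == "__main__":
    registry = DefinitionRegistry(approved={"postgres": {"driver": "terraform/aws-rds"}})
    registry.propose("arangodb", {"driver": "helm/arangodb"})  # hypothetical driver name
    try:
        registry.resolve("arangodb")
    except LookupError as err:
        print(err)  # still pending review
    registry.approve("arangodb")
    print(registry.resolve("arangodb"))  # later requests resolve without manual work
```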

Why is this approach “better” than the current CI/CD flow? Because it allows continuous standardization of the existing estate through the platform. Imagine updating labels and annotations across all your workloads or updating your PostgreSQL definition from version 14 to version 15. Or say you want to reuse infrastructure definitions without having to go through security review every time. This platform design makes doing so possible while enabling platform teams to drive standardization, security, and compliance by design. With this design, platform teams are also able to optimize the load and infrastructure configuration organization-wide (instead of by team) and drive the automation of FinOps policies (FinOps as code) to reduce cloud spend systematically.
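The PostgreSQL example works because every deployment regenerates configuration from central definitions, so a single change reaches every workload that uses the resource. A minimal Python sketch, with hypothetical workload names, labels, and a deliberately tiny definition format:

```python
from __future__ import annotations

# Hypothetical workloads and the resource types they depend on.
WORKLOADS: dict[str, list[str]] = {
    "python-service": ["postgres", "s3"],
    "billing-api": ["postgres"],
    "search": ["s3"],
}


def render_all(workloads: dict[str, list[str]], definitions: dict[str, dict],
               labels: dict[str, str]) -> dict:
    """Regenerate every workload's configuration from the current central definitions."""
    return {
        name: {
            "labels": dict(labels),
            "resources": {t: dict(definitions[t]) for t in types},
        }
        for name, types in workloads.items()
    }


def changed(before: dict, after: dict) -> list[str]:
    """Workloads whose generated configuration differs between two renders."""
    return sorted(name for name in after if before.get(name) != after[name])


if __name__ == "__main__":
    v14 = {"postgres": {"engine_version": "14"}, "s3": {"versioning": True}}
    v15 = {**v14, "postgres": {"engine_version": "15"}}  # one edit by the platform team
    labels = {"managed-by": "platform"}
    before = render_all(WORKLOADS, v14, labels)
    after = render_all(WORKLOADS, v15, labels)
    print(changed(before, after))  # only the workloads that use PostgreSQL
```

The label-and-annotation case follows the same path: editing `labels` once changes the next render of every workload, with no per-team pull requests.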

***

Developer platforms are a growing trend, but few organizations have fully cracked the code. By starting with a set of design principles and a reference architecture based on best practices and enterprise learnings, IT leaders can more confidently take the first steps toward building a modern developer platform and unlocking the full potential of their software development teams.
