Roles and Responsibilities for DevOps and Agile Teams

This is a relativly done draft of a white paper I’ve been working on. I thought I’d post it here just to see what happens. I mean, who likes PDFs after all? (Well, OK, I do, but you get my point.)

Overview

There are a handful of “standard” roles used in typical Agile and DevOps teams. Any application of Agile and DevOps is highly contextual and crafted to fit your organization and goals. This document goes over the typical roles and their responsibilities and discusses how Pivotal often sees companies staffing teams. It draws on “the Pivotal Way,” our approach to creating software for clients, recommendations for staffing the operations roles for Pivotal Cloud Foundry (our cloud platform), and recent recommendations and studies from agile and DevOps literature and research.

I don’t discuss definitions for “Agile” or “DevOps” except to say: organizations that want to improve the quality (both in lack of bugs and “design” quality as in “the software is useful”) and uptime/resilience of their software look to the product development practices of agile software development and DevOps to achieve such results. I see DevOps as inclusive of “Agile” in this discussion, and so will use the terms interchangeably. For more background, see one of my recent presentations on this topic.

Finally, there is no fully correct and stable answer to the question of what roles and responsibilities are in teams like this. They fluctuate constantly as new practices are discovered and new technologies remove past operational constraints, or create new ones! This is currently my best understanding of all this based both organizations that are using Cloud Foundry and those who are using other cloud platforms. As you discover new, better ways, I’d encourage you to reach out to me so we can update this document.

Core Roles for DevOps Teams

There are generally two types of teams we see: “business capabilities teams” who work on the actual software or services (“the application”) in question and “agile operations teams” as seen below:

While not depicted in the diagram above, the amount of staff in each layer dramatically reduces as you go “down” the stack. The most number of people are in the business capability layer working on the actual applications and services, many less individuals are working on creating and customizing capabilities in Pivotal Cloud Foundry (e.g., a service to interact with a reservation or ERP system that is unique to the organization), while many, many less “operators” work at keeping the cloud platform up and running, updated, and handle the hardware and networking issues.

Making the Applications — Business Capability Roles

These teams work on the application or services being delivered. The composition of these teams changes over time as each team “gels,” learning the necessary operations skills to be “DevOps” oriented and master the domain knowledge needed to make good design choices.

These are the core roles in this layer:

  • Developer/Engineer — those who not only “code,” but gain the operations knowledge needed to support the application in production, with the help of…
  • Operations — those who work with developers on production needs and help support the application in production. This role may lessen over time as developers become more operations aware, or it may remain a dedicated role.
  • Product Owner/Product Manager — the “owner” of the application in development that specifies what should be done each iteration, prioritizing the list of “requirements.”
  • Designer — studies how users interact with the software and systematically designs ways to improve that interaction. For user-facing applications this also includes visual design.”.

These are roles that are not always needed and sometimes be fulfilled partially by shared, but designated to the team staff:

  • Tester — staff that helps with the effort ensure the software does what was intended and functions properly.
  • Architect — in large organization, the role or architect is often used to ensure that individual teams are aligning with the larger organization’s goals and strategy, while also acting as a consultative enabler to help teams be successful and share knowledge.
  • Data scientist — if data analysis is core to the application being developed, getting the right analysis and domain skills will be key.

Developer/Engineer

These are the programmers, or “software developers.” Through the practice of pairing, knowledge is quickly spread amongst developers, ensuring that there are no “empires” built, and addresses the risks of a low “bus factor.” Developers are also encouraged to “rotate” through various roles from front to back-end to get good exposure to all parts of the project. By using a cloud platform, like Pivotal Cloud Foundry, developers are also able to package and deploy code on their own through the continuous integration and continuous delivery tools.

Developers are not expected to be experts at operations concerns, but by relying on the self-service and automation capabilities of cloud platforms do not need to “wait” for operations staff to perform configuration management tasks to deploy applications. Over time, with this reliance on a cloud platform which cleanly specifies how to best build applications so that they’re easily supported in production, developers gain enough operations knowledge to work without dedicated operations support.

The amount of developers on each team is variable, but so far, following the two pizza team rule of thumb, we see anywhere from 1 to 3 pairs, that is 2 to 6, and sometimes more.

Operations

In a cloud native mode, until business capabilities teams have learned the necessary skills to operate applications on their own, they will need operations support. This support will come in the form of understanding (and co-learning!) how the cloud platform works, and assistance troubleshooting applications in production. Early on you should plan to have heavy operations involvement to help collaboration with developers and share knowledge, mostly around getting the best from the cloud platform in place. You may need to “assign” operations staff to the team at the beginning, making them so called designated operations staff instead of dedicated, as explained in Effective DevOps.

Many teams find that the operations role never leaves the team, which is perfectly normal. Indeed, the desired end-state is that the application teams have all the development and operations skills and knowledge needed to be successful.

As two cases to guide your understanding of this role:

  • Etsy has approximately 15 operations engineers to a few hundred other engineers, according to Effective DevOps.
  • As re-told by Diego Lapiduz at 18F, early on when teams were learning how to use and operate cloud.gov, he and a handful of other operations staff spent much of their time with development teams, getting intimately familiar with each application. Now, because the practice of designated operations staff is less needed, he and his operations peers are less involved and have little knowledge of the applications in use…which is good, and as intended!

As a side note, it’s common for operations people to freak out at this point, thinking they’re being eliminated. While it’s true that margin-berzerked management could choose to look at operations staff as “waste,” it’s more likely that following Jevon’s Paradox, operations staff will be needed even more as the amount of applications and services multiply.

Product Owner/Product Manager

This is the role that defines and guides the requirements of the application. It is also one of the roles that varies in responsibilities the most across products. At its core, this role is the “owner” of the software under development. In that respect, they help prioritize, plan, and deliver software that meets your requirements. Someone has to be “the final word” on what happens in high functioning teams like this. The amount of control vs. consensus driven management is the main point of variability in this role, plus the topic areas that the product owner must be knowledge of.

It’s best to approach the product owner as a “breadth first role”: they have to understand the business, the customer, and the technical capabilities. This broad knowledge helps them make sure they’re making the right prioritization decisions.

In Pivotal Labs engagements, this role is often performed by a Pivotal employee, pairing up with one of your staff to train and transfer knowledge. Whether during a project or through workshops, they help you understand the Pivotal, iterative development approach, as well as mentor and train your internal staff to help them learn agile methods and skills, which will enable them to move on with confidence when the engagement is complete.

Designer

One of the major lessons of contemporary software is that design matters, a tremendous amount more than previously believed. The “small batch” mentality of learning and improving software afforded by cloud platforms like Pivotal Cloud Foundry gives you the ability to design more rapidly and with more data-driven precision than ever before. Hence, the role of a designer is core to cloud native teams.

The designer focuses on identifying the feature set for the application and translating that to a user experience for the development team. Activities may include completing the information architecture, user flows, wireframes, visual design, and high-fidelity mock-ups and style guides. Most importantly, designers have to “get out of the building” and not only see what actual users are doing with the software, but get to know those users and their needs intimately.

Testers (partial/optional)

While the product manager, and overall team are charged with testing their software, some organizations either want or need additional testing. Often this is “exploratory testing” where a third party (the tester[s]) are trying to systematically find the edge cases and other “bugs” the development team didn’t think of.

It’s worth questioning the need for separate testers if you find yourself in that situation to make sure you need them. Much routine “QA” is now automated (and can, thus, be done by the team and automated CI/CD pipelines), but you may want exploratory, manual testing in addition to what the team is already doing and verification that the software does as promised and functions under acceptable duress. But even that can be automated in some situations as the Chaos Monkey and Chaos Lemur show.

Architect (partial/optional)

Traditionally, this role is responsible for conducting enterprise analysis, design, planning, and implementation, using a “big picture” approach to ensure a successful development and execution of strategy. Those goals can still exist in many large organizations, but the role of an architect is evolving to be an enabler for more self-sufficient, and decoupled teams. Too often this role has become a “Dr. No” in most large organizations, so care must be taken to ensure that the architect supports the team, not the other way around.

Architects are typically more senior technical staff who are “domain experts.” They may also be more technically astute and in a consultative way help ensure the long-term quality and flexibility of the software that the team creates, share best practices with teams, and otherwise enables the teams to be successful. As such, this role may be a fully dedicated one who, hopefully, still spends much of their time coding so as not to “go soft” and lose not only the trust of developers but an intimate enough knowledge of technology to know what’s possible and not possible in contemporary software.

Data Science (partial/optional)

If your application includes a large amount of data analysis, you should consider including a data scientist role on the team. This role can follow the dedicated/designated pattern as discussed with the operations role above.

Data Science today is where design might have been a few years ago. It is not considered to be a primary role within a product team, but more and more products today are introducing a level of insight not seen before. Google Now surfaces contextual information; SwiftKey offers word predictions based on swipe patterns; Netflix offers recommendations based on what other people are watching; and Uber offers predictive arrival times of their drivers. These features help turn transactional products into smart product.

Other Roles

There are many other roles that can and do exist in IT organizations. These are roles like database administrators (DBAs), security operations, network operations, or storage operations. In general, as with any “tool,” you should use what you need when you need it. However, as with the architect role above, any role must reorient itself to enabling the core teams rather than “governing” them. As the DevOps community has discussed at length for nearly ten years, the more you divide up your staffing by function, the further you move from a small, integrated team, and your goal of consistently and regularly building quality software will become harder.

Agile Operations Roles

Roles here focus on operating, supporting, and extending the cloud platform in use. These roles typically sit “under” the Agile and DevOps teams, so the discussion here is briefer. Each role is described in term of roles and responsibilities typically encountered in Pivotal Cloud Foundry installs. These can vary by organization and deployment (public vs. private cloud, the need for multi-cloud support, types of IaaS used, etc.)

For a brief definition of a cloud platform, see “the use of a cloud platform” section below.

Application Operator

These are typically the “operations” people described above and serve as a supporting and oversight function to the business capabilities teams, whether designated or dedicated. Typical responsibilities are:

  • ŸManages lifecycle and release management processes for apps running in Pivotal Cloud Foundry.
  • ŸResponsible for the continuous delivery process to build, deploy, and promote Pivotal Cloud Foundry applications.
  • ŸEnsures apps have automated functional tests that are used by the continuous delivery process to determine successful deployment and operation of applications.
  • ŸEnsures monitoring of applications is configured and have rules / alerts for routine and exceptional application conditions.
  • ŸActs as second level support for applications, triaging issues, and disseminating them to the platform operator, platform developer or application developer as required.

A highly related, sometimes overlapping, role is that of centralized development tool providers. This role creates, sources, and manages the tools used by developers all the way from commonly used libraries to, version control and project management tools, to maintaining custom written frameworks. Companies like Netflix maintain “tools teams” like this, often open sourcing projects and practices they develop.

Platform Operator

The is the typical “sys admin” for the cloud platform itself:

  • Manages IaaS infrastructure that Pivotal Cloud Foundry is deployed to, or co-ordinates with the team that does.
  • Installs and configures Pivotal Cloud Foundry.
  • ŸPerforms capacity, availability, issue, and change management processes for Pivotal Cloud Foundry.
  • ŸScales Pivotal Cloud Foundry, forecasting, adding, and removing IaaS and physical capacity as required.
  • ŸUpgrades Pivotal Cloud Foundry.
  • ŸEnsures management and monitoring tools are integrated with Pivotal Cloud Foundry and have rules / alerts for routine and exceptional operations conditions.

Platform Engineering

This team and its roles are responsible for extending the capabilities of the cloud platform in use. What this role does per organization can vary, but common tasks of this role for organizations using Pivotal Cloud Foundry are to:

  • ŸMakes enhancements to existing buildpack(s) and builds new buildpack(s) for the platform. ŸBuilds service broker(s) to manage lifecycle of external resources and make them available to Pivotal Cloud Foundry apps.
  • ŸBuild Pivotal Cloud Foundry tiles with associated BOSH releases and service brokers to enable managed services in Pivotal Cloud Foundry.
  • ŸManages release and promotion process for buildpacks, service brokers, and tiles across Pivotal Cloud Foundry deployment topology.
  • ŸIntegrates Pivotal Cloud Foundry APIs with external tool(s) when required.

Physical Infrastructure Operations

While not commonly covered in this type of discussion, someone has to maintain the hardware and data centers. In a cloud native organization this function is typically so highly abstracted and automated — if not outsourced to a service provider or public cloud altogether — that it it does not often play a major role on cloud native operations. However, especially at first as your organization is transforming this new way of operating, you will need to work with physical infrastructure operations staff, whether in-house or with your outsourcer.

Supporting Best Practices

The use of a cloud platform

Much of the above is predicated on the use of a cloud platform. A cloud platform is a system, such as Pivotal Cloud Foundry, that provides the runtime and production needs to support applications on-top of cloud infrastructure (public and private IaaS), and often, as in the case of Pivotal Cloud Foundry, with fully integrated middleware services that are natively supported (such as databases, queues, and application development frameworks).

Two of the key ways of illustrating the efficiency of using Pivotal Cloud Foundry are (a.) the ratios of operators to application running in the platform, and, (b.) reduction in lead time. Here are some relevant metrics for Pivotal Cloud Foundry:

  • One large financial institution is currently supporting 145 applications with two operations staff.
  • 18F was able to reduce Authorization to Operate (ATO) from 9+ months to 3 days once auditors understood the automation in the open source Cloud Foundry instance at cloud.gov.
  • When planning the first product developed on Pivotal Cloud Foundry, CoreLogic allocated a team of 12 developers with four QA staff and a management team. The goal was to deliver the product in 14 months. Instead, the project ended up requiring only a product manager, one user experience designer and six engineers who delivered the desired product in just six months. As the second, third, and next projects roll on, you can expect delivery time to reduce even more.
  • A large retail user of Pivotal Cloud Foundry was running over 250 applications in non-production and nearly 100 applications in production after only 4 months using the platform.
  • Humana was able to launch an Apple Watch application in just five weeks.

Without a cloud platform like Pivotal Cloud Foundry in place, organizations will find it difficult, if not impossible, to achieve the infrastructure efficiencies in both development and at runtime needed to operate at full speed and effectiveness.

For reference, the below diagram based on Pivotal Cloud Foundry, is a “white board” of the functions a cloud platform provides:

For a gentle introduction to this stack, see the introductory discussion video at The New Stack. And, for even more discussion of the nature of cloud platforms, see Brian Gracely’s discussion of “structured cloud platforms” and Casey West’s write-up on the same topic.

Sizing: two pizza teams

While some sizing guidelines have been listed throughout, there are no hard and fast rules. The notion a “two pizza team” is a well accepted theory of software team sizes. As Amazon CEO Jeff Bezos is said to have decreed: if a team couldn’t be fed with two pizzas, it’s too big. Without making too many assumptions about the size of a “large” pizza or how many slices each individual needs to eat, you could estimate this to around six to fifteen people. This may vary, of course, but the point is to keep things as small as possible to minimize communication overhead, context switching, and responsibility evasion due to “that’s not my job,” among other “wastes.”

The assumption is teams this small is that many of these small teams will organize to create big, overall results. Architectural approaches like microservices that emphasize loose coupling and defensive use of services help enable that coordination at a technical level, while making sure that teams are aware of and aligned to overall organizational goals help coordinate the “meatware.” This coordination is difficult, to be sure, but start looking at your organization a set of autonomous units working together, rather than one giant team. As an example, Pivotal Cloud Foundry engineering is composed of around 300 developers spread over 40 loosely coupled teams.

Dedicated, integrated teams lead to the best outcomes

Agile-think and DevOps teams seek to put every role and, thus, person, needed to deliver the software product from inception to running in production on the same team, with their time fully dedicated to that task. These are often called “integrated teams” or “balanced teams.”

IT has long organized itself in the opposite way, separating out roles (and, thus people) into distinct teams like operators, QA, developers, and DBAs in the hopes of optimizing the usage of these roles. This is the so called “silo” approach. When you’re operating in the more exploratory, rapid, agile cloud native fashion, this attempt at “local optimization” creates “global optimization” problems due to hand-offs between teams (leading to wastage from time and communication errors). A silo approach can also poor people interactions across team which damages the software quality and availability. “That’s not my responsibility” often leads it to being no one’s responsibility.

Most organizations have a “project” mindset when it comes to software as well. The functional roles emerge from their silos as needed to work on the project, and then disband once done. Good software development benefits from a “product” mindset where the team, instead, stays dedicated to the product.

Operating in this way, of course, is not always possible, if only at first as organizations switch over their mind-set from “siloed” teams to dedicated teams. So care must be taken when slicing a person up between different teams: intuitively we understand that people cannot truly “multitask,” and numerous studies have shown the high cost of context switching and spreading a person’s attention across many different projects. In discussing roles, keep in mind that the further you are from truly integrated, dedicated teams, the less efficiency and productivity you’ll get from people on those teams, and, thus, in the project as a whole.

More Reading

As described in the introduction, this is a quick overview of the roles and responsibilities for Agile and DevOps teams. The book Effective DevOps contains much discussion of these topics and background on studies and proof-points. For a slightly more “enterprise-y” take, see the concepts in disciplined agile delivery which should be read as slightly “pre-cloud native” but nonetheless helpful.

Additionally, Pivotal Labs, and the Pivotal Customer Success and Transformation teams can discuss these topics further and help transform your organization accordingly. With over twenty years of experience and as the maintainers of Pivotal Cloud Foundry, one of the leading cloud platforms, we have been learning and perfecting these methods for sometime.

Thanks to the many reviewer comments from Pivotal folks, in particular Robbie Clutton, Cornelia Davis, and David McClure. And, as always, excellent post-post-ironic corporate clipart from geralt.