How Work On The Scrum Team Survey Taught Us 10 Lessons About Agile

Christiaan Verwijs
The Liberators

--

You can also listen to a podcast of this article.

If last year centered on anything for us, it was product development for the Scrum Team Survey. Barry Overeem and I consciously made the decision to branch out into new domains and tap into other ambitions. So in recent years, we spent most of our time writing code, providing support, and uncovering new user needs for the Scrum Team Survey. Barry focused more on the community and the content, and I focused on software development and support.

We feel quite proud of the work we’ve done this year. Even though it's still very hard to cover even part of our costs, we’re definitely on to something. It’s also been a very intense year. Since we’re both quite bad at keeping a sustainable pace, we’ve spent way more hours a week on this than are probably good for us. With our global customer base, we’ve also had to provide customer support for breaking issues well outside of local working hours. This has included some frantic firefighting of server issues at 3 AM on one or two occasions. But we love it.

In this post, I want to capture some of our learnings from this year.

A bit of context on the Scrum Team Survey

After Covid-19 hit, Barry Overeem and I decided to turn the ship around. We wanted to find ways to offer actionable and evidence-based help to Scrum teams. So we started turning a former hobby project of mine, the Zombie Scrum Survey, into a proper product. The Scrum Team Survey exists to help Scrum teams diagnose their process and team, and to identify improvements — together. Because we believe in the scientific method as a way to develop more robust and reliable knowledge, we grounded our product and the feedback we offer in an extensive study of 2,000 Scrum teams I undertook with Daniel Russo, Ph.D. To date, over 6,300 teams have participated. We also have quite a few paying customers now — though not nearly enough.

Since I will offer some technical insights in this post, I think it is also helpful to paint a picture here. When our work started, we had a single web service. We now have 21 dedicated services spread out over an equal number of repositories. I would qualify half of these as proper microservices, and the other half as larger services. All our code is written in C# (.NET 5). Each service is hosted in its own isolated Docker container and runs on servers we maintain. Most of the frontend is built with Angular (v11–13). We use AppVeyor as a continuous deployment pipeline and have performed over 1,200 deploys to production in 2021 (~25 a week). Virtually all our code is covered with automated tests, though never enough, as we (and our customers) still encounter bugs now and then. Most of the communication between our services happens through Redis, MariaDB, and RabbitMQ.

1. For us, one goal per Sprint worked best

Early on, Barry and I agreed on a weekly Sprint cadence. Our aim was to get something out the door at least every week, whether that was content or new features. We also stuck to a single Sprint Goal for the majority of the Sprints.

We visualized our “roadmap” in Mural. This roadmap shows the large functional areas of the Scrum Team Survey, written from the perspective of users and originating from the product’s purpose. The smaller stickies are initial Sprint Goals that we often refine into smaller goals before we start on them. Green stickies are done, yellow stickies are coming up. Stickies marked in blue are the ones we’re focusing on now.

We maintain a visual circular roadmap on Mural.

Another finding related to Sprint Goals is that they can also be quite stressful. The cadence for most Sprints tends to be that the first few days are mostly exploratory. We deploy prototypes or try out different strategies to achieve the Sprint Goal. This is where we discover the true complexities involved. Sometimes it is easier than expected, sometimes it is harder. So day two is usually where we have to renegotiate scope and/or narrow the Sprint Goal. More often than not, I overestimate my own ability to deliver the remaining scope and then have to crunch for the rest of the week to get it done. This isn’t so much a matter of releasing — that usually happens every day — but more a matter of how my self-imposed discipline gets in the way of pragmatism. I always have to remind myself that our Sprints are artificial in the sense that deliveries to production happen continuously anyway, so nothing is lost if some of the work is carried into the next Sprint. But I don’t like it. I like the tension arc of each Sprint, and how you start with a clean slate for each new one.

I suppose all of this simply means that our Sprint Goals are too ambitious more often than not. But I also struggle to see how we can make them smaller. We tend to discover the full scope and complexity during the Sprint, and not before it, regardless of how much refinement we do. Perhaps it is mostly a matter of being kinder to ourselves and our commitments :)

2. For us, a single branch in Git works best

Our branching model in Git is based on a single branch, which should always be stable and ready to deploy to production. I strongly believe that a single branch that always goes to production leads to healthier developer habits than multiple active branches, although I recognize that this is easy to say when you’re the only active developer. Along the same lines, we have only a production environment and my local development environment. There is no staging or development environment somewhere in the cloud.

The only exception I make for branching is when I expect the work to be more substantial than — say — a day of work and that work will break existing code. In that case, I create a feature branch and merge it back as soon as all the tests pass. I started doing this more diligently after I ended up with an unstable codebase on our primary branch and was faced with a critical bug on production that I had to address quickly in that same branch. This basically meant I had to weed the bugfix out from the other code I had changed, which led to stupid mistakes and a few issues on production.
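In Git terms, the workflow looks roughly like this (a minimal sketch; the branch name and commit messages are illustrative, not our actual ones):

```bash
# Day-to-day: small, deployable commits go straight to the primary branch
# (called "main" here for illustration). Each push triggers the pipeline
# and, if all tests pass, a deploy to production.
git checkout main
git commit -am "Small, production-ready change"
git push

# The exception: work that takes more than about a day and breaks existing code.
git checkout -b feature/larger-change     # hypothetical branch name
# ...small commits while the code is unstable...
git checkout main
git merge feature/larger-change           # merged back as soon as all tests pass
git push
```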

3. For us, it works best to run integration tests on production-ready Docker Containers

I’ve always been a big fan of automation, especially when you need to deploy to production. Every manual action tends to become an excuse for me to delay a release, and that inevitably leads to large batched releases with a lot of potential for breaking errors. I’ve found that we have far fewer critical bugs when we deploy many small releases during a Sprint instead of one larger one at the end.

So we invested a substantial amount of time in 2021 to incrementally optimize this process. For each of the services that make up the Scrum Team Survey, the following steps now happen automatically for each release:

  1. I trigger the pipeline by committing to our primary branch for that service.
  2. Compile code for the service.
  3. Run unit tests on the service and calculate code coverage.
  4. Build a Docker container for the service.
  5. Deploy a local Docker stack on the build server to start up that Docker container, including all necessary dependencies (Redis, MariaDB, RabbitMQ). We usually mock out all dependencies on other services and close all “outgoing” traffic (e.g. emails, external APIs).
  6. Run the integration and UI tests on the local Docker stack. This effectively allows us to run the tests on the services as compiled into the same Docker containers that will be run on production, just with a different configuration. This is as close to production as we can test.
  7. When all tests pass, push the Docker container to our Docker Hub.
  8. Automatically trigger our Docker hosts to update the container and restart the service.

All in all, this process takes between 10 and 20 minutes, depending on the complexity of the service. UI tests tend to be slower, so those services also take more time. But I’ve come to trust this process to the extent that when all tests pass, I often have to remind myself to even check whether the deployment worked out.
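To make step 6 a bit more concrete, here is a minimal sketch of what such an integration test can look like when it targets the locally running Docker stack. The environment variable, port, and endpoint below are illustrative assumptions, not our actual setup:

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Xunit;

public class ServiceIntegrationTests
{
    // The pipeline exposes the container under test on a known local address.
    private static readonly HttpClient Client = new HttpClient
    {
        BaseAddress = new Uri(
            Environment.GetEnvironmentVariable("SERVICE_UNDER_TEST_URL")
            ?? "http://localhost:8080")
    };

    [Fact]
    public async Task Service_reports_healthy_after_startup()
    {
        // Hits the same compiled container that will later run on production.
        var response = await Client.GetAsync("/health");

        Assert.Equal(HttpStatusCode.OK, response.StatusCode);
        Assert.Contains("HEALTHY", await response.Content.ReadAsStringAsync());
    }
}
```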

Overview of some recent builds in AppVeyor

A good test of this process came when we migrated from .NET Core 3.1 to .NET 5. I had been dreading and postponing this migration because I was worried about breaking dependencies. But it ended up being super simple. All I had to do was change the target framework of each project from .NET Core 3.1 to .NET 5, and update the Dockerfile to start from a .NET 5 (Alpine) base image. Because the automated process took care of all the tests and the actual deployment, I could simply make the changes to each service and commit. This allowed me to update all services in a day.
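For what it’s worth, the project-file side of that change is just a TargetFramework bump from netcoreapp3.1 to net5.0, and the Dockerfile side is essentially a swap of base images. A sketch, not our literal Dockerfile, with the build steps in between omitted:

```dockerfile
# Before: FROM mcr.microsoft.com/dotnet/core/sdk:3.1-alpine AS build
FROM mcr.microsoft.com/dotnet/sdk:5.0-alpine AS build
# ... restore, build, and publish the service ...

# Before: FROM mcr.microsoft.com/dotnet/core/aspnet:3.1-alpine
FROM mcr.microsoft.com/dotnet/aspnet:5.0-alpine
# ... copy the published output and set the entry point ...
```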

4. For us, it works best to monitor everything

Regardless of how good your tests are, your production environment remains subject to unexpected bugs and issues outside of your control. On some occasions, we remained unaware of bugs until a user contacted us. Although it's always great to see that kind of dedication, I really want to know about a bug before a user does.

So we implemented detailed (non-sensitive) logging for all our services. These logs are sent to private Slack channels so that we have everything in one place and can benefit from Slack’s advanced notification features. We log both server-side issues and client-side (JavaScript) issues.
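We haven’t described our exact logging setup here, but as an illustration, forwarding an error to a private Slack channel can be as simple as posting to a Slack incoming webhook. The class and webhook URL below are hypothetical, a sketch of the general approach rather than our implementation:

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public class SlackLogForwarder
{
    private static readonly HttpClient Http = new HttpClient();

    // Incoming webhook URL for a private errors channel (hypothetical).
    private readonly string _webhookUrl;

    public SlackLogForwarder(string webhookUrl) => _webhookUrl = webhookUrl;

    public async Task SendErrorAsync(string service, Exception exception)
    {
        // Slack incoming webhooks accept a simple JSON payload with a "text" field.
        var payload = JsonSerializer.Serialize(new
        {
            text = $":rotating_light: {service}: {exception}"
        });

        await Http.PostAsync(_webhookUrl,
            new StringContent(payload, Encoding.UTF8, "application/json"));
    }
}
```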

To monitor the health of our services, we also use UptimeRobot. What I like about this platform is that it monitors all sorts of things that easily break, like SSL expiration and slow response times. For each of our services, UptimeRobot also checks a custom health endpoint. This endpoint returns UNHEALTHY when something is wrong (with details on what is wrong), and HEALTHY when everything is in order. It is simple, but it works well. We also use UptimeRobot’s push notifications to alert us of critical server issues with a loud alarm. Very alarming indeed, especially when it goes off at 3 AM (no joke) due to a storage failure at our hosting provider.
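A minimal sketch of such a health endpoint, assuming an ASP.NET Core controller and a Redis connection as one example check (the names and checks are illustrative, not our actual implementation):

```csharp
using System.Collections.Generic;
using Microsoft.AspNetCore.Mvc;
using StackExchange.Redis;

[ApiController]
[Route("health")]
public class HealthController : ControllerBase
{
    private readonly IConnectionMultiplexer _redis;

    public HealthController(IConnectionMultiplexer redis) => _redis = redis;

    [HttpGet]
    public IActionResult Get()
    {
        // Each failed check adds a human-readable reason to the response,
        // so the alert immediately tells us what is wrong.
        var problems = new List<string>();

        if (!_redis.IsConnected)
            problems.Add("Redis connection is down");

        return problems.Count == 0
            ? Ok("HEALTHY")
            : StatusCode(503, "UNHEALTHY: " + string.Join("; ", problems));
    }
}
```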

5. For us, it works best to proactively reach out to users who face issues

A benefit of the active monitoring I described before is that it allowed us to proactively reach out to users who seemed to be facing issues. This is always better than waiting for them to contact you. For example, last week a user was trying to change his answers, but a recent deployment had accidentally broken the specific path he was taking. So I resolved the issue and contacted him to let him know he could proceed. This kind of proactive outreach happens frequently now. I’ve found that people respond very positively to it. They also like to interact with a developer directly. More than once, additional useful ideas were offered in the email exchange that followed.

Customers can feel like a disruption to your workflow. And honestly, I sometimes feel the same way when I’m in the middle of coding. But you can learn so much from your users that every such situation is worth following up on.

I always encourage Scrum teams to do this too. Talk to your users, especially when they are facing issues. Not all responses may be friendly, and some people will be highly annoyed by issues, but particularly in this situation, there is a lot to salvage through proactive support. And you can address the issue quickly. That builds trust.

One simple lesson I learned quickly is that I have to state explicitly in the email subject that my email was typed by a human. With the onslaught of marketing automation platforms, people tend to ignore all emails they get from a platform because they expect them to be marketing. More than once, people ignored my first email and responded to my second (and last) attempt with a “Sorry, I didn’t know it was a personal email”.

6. For us, better architecture evolves from continuous refactoring

The initial codebase for the hobby project that turned into the Scrum Team Survey left a lot to be desired. Most of the logic was concentrated in a single monolithic service. Performance was poor and my changes frequently broke code in unexpected — and opaque — ways. This is exactly what Ward Cunningham coined the term “technical debt” to describe. The higher it gets, the more interest you pay. This interest takes the form of more bugs, more time spent fixing weird issues, and more time required to comprehend code. And in that analogy, refactoring is the process by which that debt is repaid.

So we used each Sprint both to deliver on the Sprint Goal and to refactor any code we touched during that Sprint that needed it. Aside from reducing technical debt, I found that this process of continuous refactoring vastly improved our architecture over time. For example, I realized that my refactors effectively started to separate the commands from the queries in the codebase. Where I initially started out with a Repository pattern in which each repository performed all CRUD actions for a single type of entity, I eventually refactored this into a specialized repository for Query (R) operations and a specialized repository for Commands (CUD). These refactors began with a desire to improve performance and make the code more robust, but ended up paving the way towards proper command-query separation. Once the repositories started separating these concerns, I found that I also started using different channels. Commands are now handled through asynchronous messages on RabbitMQ, whereas Queries run synchronously through an API. Eventually, I would like to use optimized data stores for queries and commands. While commands would still be processed against the database, I could use a fast read-only datastore with optimized read models for queries (e.g. Redis). Although the need for such optimization isn’t there yet, the architecture is evolving in that direction.
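In code, that split looks roughly like this. A simplified sketch: apart from TeamQueryRepository, which the caption below mentions, the type and member names are illustrative rather than our actual ones:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Read side: queries, exposed synchronously through the API.
public interface ITeamQueryRepository
{
    Task<TeamReadModel> GetByIdAsync(Guid teamId);
    Task<IReadOnlyList<TeamReadModel>> GetForOrganizationAsync(Guid organizationId);
}

// Write side: commands, arriving as asynchronous messages on RabbitMQ.
public interface ITeamCommandRepository
{
    Task CreateAsync(CreateTeam command);
    Task RenameAsync(RenameTeam command);
    Task DeleteAsync(Guid teamId);
}

// Minimal read model and command types for the sketch.
public record TeamReadModel(Guid Id, string Name, int Members);
public record CreateTeam(Guid Id, string Name);
public record RenameTeam(Guid Id, string NewName);
```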

We started out with bulky and messy Repositories (right) that duplicated a lot of code and had too many responsibilities (selection, filtering, mapping to domain objects, some domain logic). Eventually, we refactored this into specialized Query- and Command-repositories. The latest iteration of the TeamQueryRepository is on the left, which I find much cleaner and easier to maintain. The new repositories draw from a RepositoryBase that is highly optimized and rigorously tested.

Another example was how we continuously extracted logic from the initial monoliths and moved it into dedicated (micro)services or NuGet packages shared between projects. We now have the following microservices:

  1. A service for sending newsletters and recurring emails
  2. A service for sending notifications to people by email
  3. A service for in-app alerts
  4. A service to act as a library of content for our other apps
  5. A service to handle administrative tasks related to invoicing
  6. A service to subscribe to the Scrum Team Survey
  7. A service to view your results
  8. A service to participate in the questionnaire
  9. A service to administer multiple teams from a dashboard
  10. A service to act as the landing site for the Scrum Team Survey
  11. A service to serve shared assets and styles to other services
  12. A service to securely hold API tokens for third-party services, and make them available to our services on request (and only in-memory)
  13. A service to generate nice PDFs for the do-it-yourself workshops
  14. A service to handle asynchronous events that take more time, like generating the report for a team.

The most challenging part about this was optimizing the process of spinning up and configuring new Docker containers. We standardized this to the point that it now takes at most 2 hours to set up and wire a new “Hello World” microservice into production. And the resulting infrastructure is now more robust against outages or bugs in individual services.

Refactoring is never done. I’m never fully satisfied with how the code came out. But I’ve come to trust the iterative nature of the work on this product. I revisit certain code blocks frequently, and over time better ideas on how to redesign them emerge. To date, this process has produced much better code than my initial designs.

7. For us, iterations make every mountain small

When I look back on almost a year of work, I recognize how many features we implemented that seemed insurmountably tough when we started on them. A good example of this is the visualization of the results we recently added:

Before we started on this feature, I was worried about how to draw diagrams with CSS and HTML. I also worried about the responsiveness of the whole thing. But we took it one step at a time. Our designer Wim Wouters worked with us on the first concepts in a first Sprint. We then prototyped different ways to technically visualize the results in a second Sprint. It turned out that it was much easier than expected, and we deployed our first version to production at the end of the second Sprint. This defied my personal expectations.

This example is illustrative of similar trajectories for other large features. I was also worried about the Team Dashboard (particularly the logins) and our new Questionnaire module. But the great thing about short iterations is that they force you to break one large problem into many smaller ones. Those smaller problems are then easier to solve and deploy to production. So you always feel there is progress and forward momentum. It is only when you look in the rearview mirror that you realize just how much you have already delivered. This made even the most daunting features on our Product Backlog feel very doable and smooth in practice.

8. For us, the best Sprint Reviews are with users

Barry Overeem and I hosted over a dozen larger Sprint Reviews last year. Barry facilitated them, and I participated as a developer. Each of these reviews was a marvelous opportunity to get feedback from actual users. Between 10 and 30 people usually attended our Sprint Reviews.

Our aim with our Sprint Reviews was always to let people interact with the working product. So we invited users to try out a new feature during the Sprint Review, and then used a Liberating Structure like 1–2–4-ALL to gather feedback from everyone’s experience. It was thoroughly enjoyable. And also humbling at times, as I discovered how people struggled with a UI that I thought was clear enough.

9. For us, our patrons pulled us through

Most of our work is community-funded. We found a lot of support in our growing community of patrons. These are people who contribute to our work with a monthly donation and are effectively our stakeholders. Honestly, their contributions and feedback kept us afloat last year, along with some workshops and training that Barry hosted to boost revenue. There is also a growing community of paying subscribers to the Scrum Team Survey. Together, these streams of revenue are necessary for us to keep our platform online, to invest in its development, and to keep as much of it free as possible.

So if you like the Scrum Team Survey, and you’re not a patron or subscriber yet, perhaps we may be able to convince you to join in this year?

10. For me, it is hard to be the only developer

My biggest regret this year is that I had to write the technical parts of this post with the singular “I”. Although Barry and I work together where we can, he is not a developer. Unlike on many development teams I’ve been part of in the past, I didn’t have other developers to share the burden with. When a service goes down, I have to fix it. When a nasty performance bug appears, I have to investigate it. When I’m stuck on a piece of code, I have to figure it out myself. Of course, I sparred with some friends who are also developers, but that is very different from interacting with developers who know the codebase and the product, and who share the responsibility.

So why aren’t there more developers? This is mostly a financial decision. We don’t have external sponsors or investors, so we fund the development ourselves. Unfortunately, Covid-19 put a big dent in our ability to generate revenue from other streams (like workshops and training). As much as I would love to, we just don’t have the money to hire other developers and rent a space for them to work together. My hope for the coming years is that the revenue we generate from the platform allows us to hire more developers to share the burden with. That would also be much healthier, both for me and for the product :)

Closing words

In this post, I shared 10 lessons that we learned from our work on the Scrum Team Survey. What connects these lessons is how they teach us something about Agile and Scrum. In effect, each lesson is about how to get feedback faster — from users facing issues, from production issues, from where our architecture is buckling, and from Sprint Reviews.
