A journey of Software delivery, from Agile adoption to #NoProject

Marius Konwicki
Swissquote Tech Blog
16 min read · May 5, 2019

The genesis

Back in the late 2000s, Swissquote started to adopt Agile principles and applied the Scrum methodology. That was when I was promoted to manager of the Securities Trading Platform team.

The golden age of Agile

Like other teams, our mission was to deliver new projects, but also to maintain the existing applications. My team grew at that time from 5 to 8 developers, and we needed to organize ourselves a bit better. We changed our sprints to a custom mix of Kanban and Scrum boards. That was a good way to isolate project tasks from unexpected requests, allowing us to be more predictable on delivery.

On the Scrum board, we had up to 5 developers working on the planned features.

The Kanban board was used to deal with unexpected tasks (bug fixes, urgent requests, etc.). Two developers were assigned to the Support role. Often, the first one was on the front line, handling bug reports, phone calls and requests from other departments, creating JIRA tasks and adding them to the Kanban board. The second developer took tasks from the board, starting with the most urgent ones.

It was a rotating role, so every developer spent some time on support, and then some time back on project development.

The awful truth of project management

And yet, despite our great organization, our Burn-Down Charts were still refusing to meet the estimated delivery deadlines. The frustration was growing.

I started investigating why we were still late delivering things, although we were estimating them pretty well. We were already applying a correction factor to our estimates to match the historical bias of realized vs. estimated effort, but even this didn’t help.
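
The kind of correction factor mentioned here can be sketched as follows. This is a minimal illustration, not the team’s actual tooling: the function names and sample figures are hypothetical, and it assumes the factor is simply the ratio of total realized effort to total estimated effort over past tasks.

```python
def calibration_factor(history):
    """Ratio of realized to estimated effort over past tasks.

    `history` is a list of (estimated_days, realized_days) pairs.
    A factor above 1.0 means the team historically underestimates.
    """
    total_estimated = sum(estimated for estimated, _ in history)
    total_realized = sum(realized for _, realized in history)
    if total_estimated <= 0:
        raise ValueError("history must contain a positive total estimate")
    return total_realized / total_estimated


def calibrated_estimate(raw_estimate_days, history):
    """Scale a raw estimate by the historical bias."""
    return raw_estimate_days * calibration_factor(history)


# Hypothetical past tasks: (estimated days, realized days).
past_tasks = [(2, 3), (5, 6), (3, 3)]
factor = calibration_factor(past_tasks)        # 12 realized / 10 estimated
adjusted = calibrated_estimate(4, past_tasks)  # a 4-day estimate, scaled
```

Such a factor only corrects estimation bias. It cannot account for delays that come from outside the estimate itself, which is what the investigation turned up.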

I found several reasons:

  • The rotation of developers between support and project work created additional latency due to context switching. At the end of a sprint, a developer who had to move from project to support might be working on a task that was not finished. Should the developer coming from support take it over? It would cost him extra time to catch up with the work already done. The same went for the support person in the middle of investigating an issue: it is easier to finish the investigation than to explain the whole thing to another developer for a handover. As a result, at the beginning of a sprint, we had developers still finishing support work, and unfinished project tasks in the hands of a support person.
  • The release and deployment process was slow. Releases were done manually, on a developer’s machine. We had to spend time validating on preprod. Our Operations team wanted us to be ready the day before, so we could easily lose 2 days preparing a deployment. On a 2-week sprint, this had an impact on our velocity.
  • Despite code review and testing, quality wasn’t good. We were facing rollbacks of our applications. So, instead of starting a new sprint with new tasks, we were fixing bugs in the features from the previous sprint and going through the release and deployment process again.
  • We had tasks that were blocked in the middle of our JIRA board. A task could be blocked for many reasons: we needed a new endpoint in a REST service from another team, a new column in the Production database, or the final text for a disclaimer that we had to display. Sometimes tasks were blocked for several sprints!

I read many books and articles, but I never found concrete solutions to these issues. The Scrum methodology was meant for a small team working on a single project, without interference from other projects or support.

The Beginning of our journey to #NoProject

Introducing CI/CD

First, I needed to fix the quality problems and rollbacks that were slowing us down. In 2014, development teams were composed only of… developers. We had the chance to be the first team to hire a test engineer, and we laid the foundations of QA at Swissquote. There is much to say about how we implemented it; that could be a blog post of its own! The important thing is that quality improved dramatically.

At the same time, our applications were growing in complexity. We shifted from monolithic web applications to a distributed architecture. Our web platforms were communicating with a growing network of REST services. This affected the stability of our working environment. It was time to work on our infrastructure management.

So we decided to hire a DevOps engineer for the team. He helped us move our applications to Docker (don’t miss the article from my colleague Manuel Ryan about Carnotzet), so developers could have isolated environments that were easy to set up and run. The next step was to deploy the Docker images of all our REST services on a Kubernetes cluster. This way, we could run the final application on our local machines and connect to a private cloud to call our dependencies.

This environment also allowed us to run integration tests and performance tests, and to have nightly builds. We dramatically improved release time and, combined with better quality, we were able to deliver small evolutions or bug fixes faster and more frequently. We became more responsive.

Needless to say, the transition is not over yet. We keep working on it so that all teams adopt it and have releases that can be containerized and deployed on Kubernetes all the way up to production. When we achieve this, we will unlock the possibility of true Continuous Deployment…

Deploying unfinished work

The next problem we had identified was tasks getting blocked for all kinds of reasons. It was clear that, despite upfront analysis to identify blockers, we could not avoid having tasks that suddenly stalled in the middle of our sprint and stayed stuck there.

Why is this a problem? First, it introduces more context switching. But it also forces us to keep branches open in our version control, generating extra work to maintain these branches or perform complex merges. The solution was to use feature toggles systematically. We added configuration that allows us to activate implemented features without any further deployment. We could then push code to production even in an unfinished state, with the feature turned off. On the JIRA board, the task can be marked as done. All we have to do is create a second task in the backlog to finish the missing part and, of course, turn the feature on!

Feature toggles have another benefit: you can enable a feature only for internal employees or early adopters. You get valuable feedback and integration validation before granting access to all your clients!
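
To make the idea concrete, here is a minimal sketch of a feature toggle with audience targeting. It is not the system we used: the class, the function and the group names are hypothetical, and in a real setup the toggle values would be read from configuration that can be changed without a new deployment.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureToggle:
    """Hypothetical toggle: off by default, optionally limited to some audiences."""
    name: str
    enabled: bool = False
    # Audiences allowed to see the feature once it is enabled.
    # An empty set means "everyone".
    audiences: set = field(default_factory=set)

    def is_active_for(self, user_groups):
        if not self.enabled:
            return False
        if not self.audiences:
            return True
        return bool(self.audiences & set(user_groups))


def render_order_page(toggle, user_groups):
    """The new code path ships to production but only runs when the toggle allows it."""
    if toggle.is_active_for(user_groups):
        return "new order form"
    return "legacy order form"


# Deployed with the feature off: every user still gets the legacy form.
new_order_form = FeatureToggle("new-order-form")
before = render_order_page(new_order_form, ["client"])

# Later, a configuration change enables it for internal employees only.
new_order_form.enabled = True
new_order_form.audiences = {"internal"}
for_employee = render_order_page(new_order_form, ["internal"])
for_client = render_order_page(new_order_form, ["client"])
```

Clearing the `audiences` set while the toggle stays enabled would then open the feature to all clients, again without redeploying.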

Maintenance is work

There was one last issue to solve: the context switching between support and projects. Obviously, it’s impossible to get rid of support: unexpected changes will never stop appearing.

It is important to realize that on support we do more than bug fixing. We are maintaining our applications. There is always a shortcut taken during a project to meet a deadline that we want to rewrite cleanly. There is missing code coverage (often for the same reason). Furthermore, we must keep evolving our frameworks and upgrading versions, and I’m not even mentioning all the small business evolutions we are asked for. All this maintenance is never part of any project, so we have to do it during support time. But our time is limited, so these tasks generally end up in our backlog.

In recent years, our backlog kept growing despite all our attempts to reduce it. In 2017, this became my main priority as a manager. It was time to reverse the trend.

On support, we were working on critical bugs and often on the newest requests, as long as they looked easy to do (a minor issue or a quick evolution of a business feature, for example). While fixing a critical issue is exactly what you would expect from your support members, small requests are a different matter. It may seem like a good idea to immediately address a request considered a “quick win” (someone really needs it and you know it takes a couple of hours), but it is not. Instead of doing 2 or 3 quick wins in a row, you could have worked on a task that has been waiting in the backlog for some time and would add much more value for your clients.

How we stepped out of projects

We needed to work in a different way: we would of course continue fixing critical bugs and working on critical requests first. But all other new requests would be systematically put in the backlog.

Then, from this backlog, we would identify the set of tasks that would bring the most value to our end clients and pick enough of them to fill a sprint. After that, we would have to convince our stakeholders that we should work on them instead of some random project.

So we started sorting the tasks we had in the backlog. A question arose:

How do you compare the added value of so many different kinds of requests?

For example, how do you choose between fixing a broken link on the head news and adding JMX entry points that allow better monitoring and a faster response to service downtime? If you ask Marketing or IT Operations which task is the most important, you will get 2 clear but different answers. So you have to ask yourself: what is the real value added for the final user?

To be honest, you simply cannot compare the value of these 2 tasks, as they address 2 very different concerns. So here is the idea: let’s group the tasks in the backlog by concern, with all tasks related to a better browsing experience in one group and all those related to availability improvement in another. Now you can prioritize tasks within their own groups. Better still, you can also compare at a higher level which concern should be addressed first:

If the « Better Browsing Experience » concern contains just a couple of dead links, while « Better Service Availability » contains tasks giving visibility on connection pool usage, cache and memory that would help prevent pool exhaustion, memory leaks, etc., and you know you are blind in case of an outage, you will definitely want to focus on the operational aspect. On the other hand, if your website is full of dead links, outdated information, browser compatibility problems and many other bad things, while your second concern contains just a few requests for a nicer Grafana display, you should definitely focus on the browsing experience first.

The conclusion is that you cannot compare single tasks from the whole backlog and hope to find a global prioritization for all the tasks. Instead, grouping them into different topics allows you to have several bundles in which you can prioritize tasks, and you can then compare bundles as a whole.
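
The grouping step can be sketched in a few lines. This is only an illustration: the backlog entries, the tuple layout and the `group_by_concern` helper are hypothetical, and the priority numbers are meaningful only inside a given concern.

```python
from collections import defaultdict

# Hypothetical backlog entries: (task, concern, priority within the concern),
# where 1 is the highest priority.
backlog = [
    ("Fix broken link on head news", "Better Browsing Experience", 2),
    ("Expose connection pool usage via JMX", "Better Service Availability", 1),
    ("Update outdated product page", "Better Browsing Experience", 1),
    ("Add memory usage metrics", "Better Service Availability", 2),
]


def group_by_concern(tasks):
    """Bundle tasks per concern, each bundle sorted by its own priority."""
    bundles = defaultdict(list)
    for title, concern, priority in tasks:
        bundles[concern].append((priority, title))
    return {
        concern: [title for _, title in sorted(entries)]
        for concern, entries in bundles.items()
    }


bundles = group_by_concern(backlog)
```

Tasks are never ranked against tasks from another concern here. Deciding which bundle comes first is a separate, higher-level comparison.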

Working on the highest value first

With my team, we started to identify these concerns. Someone proposed applying an ICE score (Impact, Confidence, Ease) to each bundle and using this score to decide which concern we should work on first. We used the « Impact » score as the value added for the final user. Thanks to that, we had our different concerns sorted by priority. We were ready to start working.
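
As an illustration, ranking bundles by ICE score could look like the sketch below. It assumes the common variant where the score is the product of Impact, Confidence and Ease, each rated from 1 to 10; this post does not say which variant we used, and the scores shown are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concern:
    name: str
    impact: int      # value added for the final user (1-10)
    confidence: int  # how sure we are about that impact (1-10)
    ease: int        # how easy the bundle is to deliver (1-10)

    @property
    def ice(self):
        # Assumed multiplicative form of the ICE score.
        return self.impact * self.confidence * self.ease


# Hypothetical scores for two bundles.
concerns = [
    Concern("Better Browsing Experience", impact=4, confidence=8, ease=7),
    Concern("Better Service Availability", impact=9, confidence=7, ease=5),
]

# Highest score first: this is the concern to start working on.
ranked = sorted(concerns, key=lambda c: c.ice, reverse=True)
```

With these numbers the availability bundle scores 315 against 224 for browsing, so it would be picked first. The scores can be recomputed whenever priorities are reviewed.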

You could ask: what’s the difference between working on tasks from a bundle and a plain Kanban approach? Or couldn’t we consider each « concern » to be just another project?

Kanban works fine with a small backlog of tasks, typically bugs. You take the most critical one, you deliver it, and then you take the next most critical one. But at that time, we had accumulated over 500 different bug reports or maintenance tasks. There is no way you can display them, sorted by priority, on your Kanban board.

In our approach, a « concern » is a bundle containing both bug fixes and evolution requests, major and minor tasks, including refactoring or technical evolutions, and potentially applying to many different applications. But all together, these tasks have a common goal: to improve a certain aspect that matters to the client.

And here comes another important aspect that helps us reduce our backlog: because we want to address a global concern, it may sometimes be smarter to rewrite part of our application instead of taking all the bugs and applying the fixes one by one. Let’s say, for example, that you have identified more than 20 dead links. If you fix them independently, you have done a boring job, and you have no guarantee that links won’t break again. But as you are working on a « better browsing » concern, you could decide instead to create an administration tool to manage links. You change all your navigation links at once to read their path from this tool. Then you know that the next time a page URL changes, all you have to do is configure the new link in your administration console! You have improved the experience, and you have done it in a way that improves your product quality at the same time!

You could never have achieved this by simply fixing a broken link here and there. Working on a whole concern gives you enough time to plan evolutions that remove many entries from your backlog at once. And you know what? Sometimes, working on these improvements does not cost much more than blindly fixing all the bugs separately! Kanban mode is too reactive for a real Product approach. With this method, we have a chance to really improve our product.

A « concern » bundle is not a project in the usual sense either. It was not triggered by any stakeholder, and it has neither an allocated budget nor a deadline. After all, these tasks were already waiting in our backlog, right? So deadlines for such tasks do not make much sense. Thanks to our ICE score, we know that the chosen bundle is the one bringing the highest value. So it’s simple: we should start working on it now and deliver whenever we’re ready. As we are maintaining a product, and not investing in a new feature, our budget is simply the resources we have, and we should keep working on this concern as long as it is the most important one.

We could ask ourselves: why even keep sprints in this case? It’s a good question. First, they help give a « rhythm » to your work by scheduling the demos, retrospectives, etc. Second, they provide a timeframe in which you can recompute the different ICE scores and re-evaluate your priorities. Typically, after having delivered 80% of a concern, it may be more worthwhile to start working on a different concern instead of insisting on finishing all the tasks of the first one. This is another difference from the project approach, where the team has to deliver all the listed features before the project can be considered done.

Finally, creating these concerns gives the different stakeholders more visibility on your backlog and helps them understand why you need time to work on maintenance.

Dealing with a complex Project organization

As the company grew, so did our organization. Some teams, including mine, were given the mission to focus on the maintenance of our products. So we had the chance to actively work on our « concern » bundles. At the same time, Scrum Masters and Tech Leads in other teams were collaborating with offshore developers on big projects. We had 2 different models coexisting. We needed to define how to hand projects over to the maintenance teams once they were done. We had to design complex merging strategies to allow small maintenance evolutions to be deployed between 2 main project iterations. Having multiple teams working on the same codebase with different objectives is extremely difficult. We were often deadlocking each other. Projects were delivered late and the maintenance backlog was not shrinking as expected.

At the beginning of 2019, the company decided to adopt the Product approach at company level. So now, all the requests (bugs, maintenance, evolution, business features) for a product will be centralized and prioritized by a Product Manager, who will be responsible for the product vision and will organize the prioritization of the work. A committee of stakeholders will meet on a regular basis and vote on the next requests to be worked on. Both maintenance and offshore teams will then commit together to deliver them. The planning should not exceed 3 months. For big projects, we will have to deliver an MVP, and then have deliverables that can be shipped at least every 3 months.

This change is an important step toward a #NoProject management of software delivery at Swissquote. Onboarding all the departments onto a common vision and planning is absolutely necessary. But on our way to the Product approach, there are dangers and pitfalls we will have to avoid.

The journey continues…

The hidden cost of MVP

Everyone understands nowadays that a big project needs to be broken into smaller pieces that can be delivered in short iterations. So the Business Owner has to identify the minimal set of features that can bring a project to life. Delivering an MVP to your clients secures the initial investment. You can then re-evaluate your project priorities according to their initial feedback.

Delivering an MVP is definitely a good thing. But the risk is forgetting to re-estimate the costs of the next phase after having delivered the MVP. Such a version usually contains simplifications and shortcuts. And this is perfectly fine: we do not want to invest in robust, reusable software for an MVP. But it has to be refactored before the next phase can be built on top of it. Some code must be deleted and properly rewritten.

The other aspect to consider is that your MVP becomes a living product. Your clients start using it, so you have to account for the maintenance costs. They will report bugs, complain about some features, ask for others, etc. A good project manager will be able to take these demands into account and adapt his planning. But it comes with a cost that should not be ignored.

Software Debt has high interest rates

We said earlier that working on a Product consists of identifying the bundles that bring the highest value to your clients. When evaluating such value, it’s easy to prefer new business features over pure technical refactoring, as their value is more tangible. Underestimating the value of refactoring is a common mistake.

As a developer, you are convinced that technical debt is a bad thing. So why are your stakeholders so eager to increase it, and reluctant to pay it back?

If you think in financial terms, debt is good. I mean, everyone agrees that it’s convenient to borrow money to buy a car right away and pay it back over a few years, rather than having to wait all that time before you can have it. And that’s exactly how your managers see it: they want that project done now, and to be able to quietly pay back the interest later.

Unfortunately, this debt tends to be forgotten. Unless you are running a Central Bank, you cannot create money out of thin air and magically pay back that debt.

When it comes to software, technical debt has one particularity compared with financial debt: its interest rate grows over time. Ultimately, it can lead to a situation where your whole application is at stake. It is just as if you decided to drive your car but never took it to the garage for a service. In the long run, you risk an accident because your tires are too worn and your engine was not maintained properly. And then, fixing your car costs much more than regular servicing would have.
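
A toy calculation shows why a growing interest rate matters. The model and every figure below are hypothetical, chosen only to illustrate the shape of the curve: the cost of a postponed fix compounds each quarter, and the rate itself rises each quarter.

```python
def deferred_fix_cost(initial_cost, base_rate, rate_growth, periods):
    """Cost of a fix postponed for `periods`, with a rate that rises every period."""
    cost = initial_cost
    rate = base_rate
    for _ in range(periods):
        cost *= 1 + rate     # pay this period's interest
        rate += rate_growth  # the debt gets more expensive to carry
    return cost


# Hypothetical figures: a 10-day refactoring postponed for 4 quarters,
# starting at 10% interest per quarter.
growing_rate = deferred_fix_cost(10, 0.10, 0.05, periods=4)  # rate rises 5 points per quarter
constant_rate = deferred_fix_cost(10, 0.10, 0.0, periods=4)  # ordinary compound interest
```

Under these assumptions, the postponed refactoring costs about 19 days after a year instead of 10, against roughly 14.6 days with a constant rate, and the gap keeps widening with every further quarter of delay.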

The Kolkhoz Pattern

Kolkhozes, or collective farms, were huge farms in the Soviet Union in the 1930s that were owned by their workers but controlled by the Government. Each kolkhoz was assigned a production quota and a type of crop by State officials in Moscow. Yet massive bureaucracy and the farmers’ resistance to the impossible quotas imposed on them destroyed what was supposed to be a model of productivity and fairness for the population.

Why this history lesson? It is a strong illustration that centralized decisions are generally inefficient. And we care because this applies to software development as well.

The Product approach requires development teams to work on many small evolutions that all bring value to your clients. But Top Management simply doesn’t have the time to track and validate so many small features. Top Management feels more comfortable with large projects that are meant to achieve a strategic vision. If a small group of people is responsible for deciding what should and should not be done on a product, and this group is not in permanent contact with the development team, you will end up with an extremely inefficient management of your product. You have created kolkhozes in your company.

The Product Team

You need to empower the teams that are actually working on your products. They are the ones doing the support for them, and they know best the hidden costs of the different features you want to implement. They know best what refactoring should be done and what the risks of not doing it are.

Of course, you don’t want to lose the strategic vision of your product. This must be centralized somewhere. The model proposed by Allan Kelly in his blog post Beyond Project consists of teams that are able to make their own decisions. Such teams would typically be composed of multiple roles: a product owner who understands the business requirements and constraints, but also the technical aspects, working alongside developers, test engineers, UX and DevOps. The team is multidisciplinary and has the autonomy to define specifications, implement them and deploy them in a very short cycle. Using analytics, early adopters’ feedback or A/B testing, the team is able to draw fast conclusions about deployed features. Failure is accepted and the team is encouraged to explore.

On a regular basis, the team reports to a board, typically the top management. It presents the work done, successes and failures. The team can submit recommendations to the board. On the other hand, the board communicates the strategic vision to the team. At each report, the size and mission of the team can be challenged, depending on its results and the strategic vision.

The product owners of each team meet in dedicated sessions where they synchronize their planning to match the board’s strategy, especially when it has an impact across different products.

The same applies to the other team members: UX and designers meet and make sure that they have a common understanding and common tools when it comes to styling, UX specifications, etc. So do the developers, who sit on a tech committee to agree on common development rules. QA and infrastructure also need dedicated committees with a company-wide vision for their domain.

This creates a two-dimensional matrix organization. On each row, there is a team empowered to take local decisions. A channel to report status is set up, so the management does not lose visibility on current work. Transversal communication guarantees consistency across products.

By bringing decision-making as close as possible to where the real knowledge is, you create teams that can achieve the same efficiency you see in start-ups. Except that these teams are embedded in your global organization.

You are now ready to be the next disruptive company in your business!
