What’s the problem here?

Part 2 — Understanding what is important to focus on

Peter Kerschbaumer
Oct 18, 2017

In part 1 of my mini-series about the Toyota Improvement Kata, I stayed theoretical and explained the building blocks from a conceptual point of view. Now it is time to put these building blocks into action. As an example, I will use an initiative from a company where I recently worked. We were developing software for the services of an online travel company, and we had identified our long release times as one of the most significant impediments to business agility. Hence we created an initiative to do something about that.

For quite some time we had measured the performance of our development process. We used all sorts of metrics, such as local lead time, customer lead time and flow efficiency, and found that our release process delayed our product development. Developers and people with less technical knowledge started to talk about microservices and a lot of specific tools which would undoubtedly improve the situation. After all, we can read about them in many blogs, or we know somebody who knows someone who works in a company where it all works perfectly. In short, people were diving directly into solution mode based on anecdotes and opinions without even describing the problem adequately.

“If I were given one hour to save the world, I would spend 59 minutes defining the problem and one minute solving it.”, Albert Einstein

Ok, a quote from Albert Einstein. In reality, it might look a bit different, but you get the point. You need to spend some time on the problem description. Very often you will find out that your problem is in a different corner than you thought. We started out by trying to get as much quantitative information as we could, but we were lazy. We looked at the available data (lead time, flow efficiency) and tried to interpret it. We found out that we lose 50% of our flow efficiency between the point where development finished and the point where customers could use the new functionality. In this thought model, release management was treated as a black hole and as such a complete waste of time. Mapping out the details of the release process was considered a job for CSI, meaning too much work. People wanted to move on and find solutions. So we identified only the high-level parts of the process, and manual sanity testing (regression testing) was one of them.

Quickly the manual testing was identified as the guilty party. Our first problem definition looked like this:

We lose 50% of our flow efficiency because of lack of test automation.

Case closed! Criminal convicted… Really? In reality, test automation is a no-brainer. All literature about continuous deployment describes automation as one of the primary preconditions for technical agility. The problem is selling it to decision makers as a necessary investment. After all, who has seen a company that does all this right from the beginning? In most businesses, it is something that needs to be invested in after some time and millions of lines of code. Quickly, a business case was created which got the buy-in from management. The company made an effort, automated many tests and established policies to continue doing so. I guess I need to write about that too at some point. Back to the problem. What happened to our flow efficiency? We improved nothing. Test automation gave minor gains in speed in a minor part of the whole process at a very high cost. Of course, we improved quality, but we had not seen quality as the problem before. We saw lost efficiency as the problem. Back to the drawing board.

From all the anecdotal evidence we had found, we came to the conclusion that the way we release software was the problem. That changed the problem statement a bit:

Through our release process we lose 50% of our flow efficiency by adding unnecessary waiting time to our lead time.
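To make the relationship between waiting time, lead time and flow efficiency concrete, here is a minimal sketch. It assumes the common definition of flow efficiency as active working time divided by total lead time. The stage names and hours are invented for illustration and are not our real measurements.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One step of a delivery pipeline (hypothetical names and durations)."""
    name: str
    active_hours: float   # time somebody actually works on the item
    waiting_hours: float  # time the item sits idle in this stage


def lead_time(stages):
    """Total elapsed time: active work plus waiting, summed over all stages."""
    return sum(s.active_hours + s.waiting_hours for s in stages)


def flow_efficiency(stages):
    """Share of the lead time that is spent on active work."""
    total = lead_time(stages)
    if total == 0:
        raise ValueError("lead time is zero, flow efficiency is undefined")
    return sum(s.active_hours for s in stages) / total


# Invented example data, not the measurements from the article.
development = [Stage("development", active_hours=40, waiting_hours=10)]
release = [
    Stage("manual sanity testing", active_hours=8, waiting_hours=16),
    Stage("release window", active_hours=2, waiting_hours=24),
]

dev_only = flow_efficiency(development)
end_to_end = flow_efficiency(development + release)

print(f"development only: {dev_only:.0%}")   # prints "development only: 80%"
print(f"end to end:       {end_to_end:.0%}") # prints "end to end:       50%"
```

In this made-up example the release stages add little active work but a lot of waiting, so the end-to-end flow efficiency drops even though development itself looks healthy. That is the pattern the problem statement describes.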

So the (other) solution would be hunting time waste, and we would solve the problem… at least we thought we would, and again we were happy that we found the guilty one and we were sure to have found a quick solution to improve our situation. This time we did not need to convince management to invest a lot in staff-hours. We would do it with no footprint on our current budget and no additional cost of development capacity. That was the moment when I could convince one of our project managers to try out the Improvement Kata.

A different kind of project management

Sergi. He is a project manager. He is an excellent project manager. The PMO assigned him to the project of “improving the flow efficiency and lead time of our development pipeline”. If you are the average agile coach or scrum master… now is the time to start doing your jazzy hand wave and say “Nooo, we cannot do it like this!” and have #noprojects written all over your face. If you are a professional project manager you probably already see the pain in front of you. The pain of complete uncertainty.

Sergi was no different. He is an outstanding project manager. In my world, that means Sergi is an open, empathetic person who creates the environment and preconditions for people to do their best. In other words, he is more of a servant leader than a manager. Nevertheless, Sergi faced a very fuzzy challenge. He had the vague idea that he needed to do things that improve the lead time of our release process. So where was the project description or the requirement specification, and where was the team he would be working with? As the leader of the initiative, I had bad news for Sergi.

“We don’t have a clear project description, and we don’t have a team to work with.”, I told him, and again, I was expecting jazzy hands up in the air. However, Sergi just stayed calm. He knew that I was the agile coach and, as such, could do magic. Everybody knows that we agile coaches rush home early from work because we have to feed our unicorns and practice magic spells.

“So what are we going to do? How are we tackling this challenge?”, Sergi asked during our first kickstart session. The only answer I honestly could give him was…

“I have no idea what we need to do to get there. I just have an idea about the way to get there.”

He patiently listened to my explanation of the Improvement Kata. I gave him some links and background material to look at, and he came back after a couple of days full of enthusiasm to start working on the project. Later the word project was not valid for him anymore, and he started talking about the improvement initiative.

So far we have defined our problem. At least, at that point we thought that this was the most significant problem we needed to look at. You have seen that the problem definition needs work too. You should not neglect it by rushing over it. Having a precise, explicit definition is of great value for the continuous improvement you want to achieve. Let’s have a look at how to create the necessary focus to start the actual improvement.

Focusing on the right thing

Sergi and I started out by looking at the Focus Area. Defining the whole release process in general as a focus area did not seem very focused though. After all, we were almost 50 teams with a complex, interdependent code base. Sergi narrowed down the focus to an area where we felt that we could achieve an improvement rather quickly and where we could safely test the Improvement Kata. We selected a module with no dependencies on other modules or teams, and thus we could work out the following definition for the Focus Area (or Focus Process):

Release time and frequency for the CatalunyaAir module
(of course, CatalunyaAir module had a different name but confidentiality…)

For me that sounded good at the time. That short sentence described very clearly where we would put our efforts to improve things. So we had our “where”. It sounds simple to get there, but we needed to put some effort and coaching sessions into getting it right. Questioning previous assumptions on a regular basis is part of the process, and you should not skip it. It helps you to improve your improvement.

Defining the challenge

The first attempt at establishing our challenge was as clumsy as the first definition of the focus area. After some quick discussions we decided that this would be our challenge:

Increase release frequency and decrease release time

Again, this sounded like the right thing to do. We would only need to implement some changes to our way of working, and we would gain a great deal of efficiency. Initially, the results were encouraging, but soon it felt like we were not working on the root of the problem. It felt more like we were optimising and reinforcing something that kept us from evolving towards the goal of our challenge. Sergi explained his dilemma and his feeling that we were not progressing in the right way.

“We are tightening up the processes, but I have a bad feeling.”, Sergi said and continued… “You know, historically we built walls around the production systems to make them safe, and now we are improving the walls.”

We started discussing how things were done and had a look at the documentation of the current processes. “Documentation” sounds more formal than it was: in reality, it was the result of an investigation that produced a long string of connected post-it notes describing all the details of the release process. Coaching time…

“What would this picture look like in the future if it were different?”, I asked him, desperately trying not to sound too coachy (is that even an adjective?).

He thought for a minute and reflected on his notes so far. “I think in the future the system should be able to handle changes and even recover from real failures quickly.”

“What would need to happen to get there?” Damn, another coaching question. I need to get my act together.

“Automation!”, Sergi said. He showed me his notes and the results of the investigation we had been doing around the manual processes. He argued that automation would not only make the process faster but also safer.

“Sounds like a way forward. Is there anything else that would be different?” I had an idea about it but wanted to hear it from him.

“I have seen that the build process takes a long time by itself because the modules are so big.”, Sergi said, and he continued talking about how difficult it is to do anything with our current architecture.

“I think the architecture part is being improved by an ongoing modularisation initiative…”. I suggested to Sergi that it might be a good idea to connect with people in the architecture team. Our discussion continued for a while, and in the end, Sergi formulated the following statement for the challenge we were facing:

Provide a safe environment without human intervention.

That definitely sounded like a challenge that contained the factors which would positively affect the system. Having a safe environment means in practice that a development team can release things without being afraid of breaking the whole system or having a negative effect on the business outcome if things go wrong. “Without human intervention” simply means that we would automate it all: tests, builds, monitoring, rollbacks and so on. In the next article, I will describe how we came to understand where we are, where we want to go and how to get there with regard to our challenge.

Stay tuned…

Peter Kerschbaumer

I’m passionate about helping modern organisations to navigate through the ever more rapidly changing world. …and paragliding :)