Problem solving techniques
How do we solve problems? The answer seems obvious: we just take one and solve it, right? Yes, but how exactly do we do it? Talking about things we do naturally every day always feels like being asked “but how do you walk?”. Still, it is worth knowing the mechanics of problem-solving in order to become better at it. The same goes for walking, but I will leave that for another time.
There are situations, such as an ongoing production incident or a moment in my daily work, where I feel stuck and do not know how to solve the problem I am looking at. Here I’ll talk about several strategies that help me find a solution. Let’s illustrate this with an example story.
***
I saw slight panic grow in her eyes with every action she took. “How bad is it?” I asked. “We are down. The service is not working …” she replied as she vigorously typed console commands. Okay, now I know. One of those mornings of utter chaos where everyone is running in circles screaming obvious things at each other: “Do you know we are down for an hour?”
“What are you doing now?” I asked her, trying to figure out where we were in solving this incident. “I am trying to increase the amount of memory in the message broker server,” she replied as she hit the keyboard a little harder. “It’s not helping, the whole memory will be full again in a few minutes.” The symptom was that the queues in the message broker were filling up quickly and the whole system was stuck: nothing could consume the messages fast enough, and incoming messages were lost.
At first glance, the solution was to increase the queue sizes and the amount of memory allocated to the broker until the larger payload would fit, but apparently that did not help. Everyone was pretty sure they had not touched anything related to broker communications for ages. That was why everyone was stuck: we could easily roll back, but what should we roll back, and to what point?
This is an indicator: “We are stuck”.
It shows the need to stop and analyze the situation. It took some time to go through a few iterations of “why” questions: why is the memory full, why are there so many messages in the broker, how do they get there, is it really this service that is causing the problems? That led us to the root cause: it was not an increased payload size, it was another service that had been changed a few hours earlier and was sending messages in a crash loop. Once we figured that out, the solution itself was trivial.
***
So what is the algorithm?
The very first step is to put panic aside.
Fear is the biggest showstopper in every situation, and the unknown is what we are afraid of. Instead of staying afraid, analyze the situation. This helps immensely.
Then there are two ways to approach any problem:
Top-down and bottom-up are both strategies of information processing and knowledge ordering, used in a variety of fields including software, humanistic and scientific theories (see systemics), and management and organization. In practice, they can be seen as a style of thinking, teaching, or leadership.
(citation from Top-down and bottom-up design, Wikipedia)
Top-down strategy
A top-down approach (also known as stepwise design and stepwise refinement and in some cases used as a synonym of decomposition) is essentially the breaking down of a system to gain insight into its compositional sub-systems in a reverse engineering fashion. In a top-down approach an overview of the system is formulated, specifying, but not detailing, any first-level subsystems. Each subsystem is then refined in yet greater detail, sometimes in many additional subsystem levels, until the entire specification is reduced to base elements. A top-down model is often specified with the assistance of “black boxes”, which makes it easier to manipulate. However, black boxes may fail to clarify elementary mechanisms or be detailed enough to realistically validate the model. Top down approach starts with the big picture. It breaks down from there into smaller segments.
(citation from Top-down and bottom-up design, Wikipedia)
The top-down approach fits very well when you have identified a problem but do not have enough details to find an appropriate solution for it. In other words: you take the whole problem and divide it into several parts. If those parts are still too big to solve, repeat the first step on each of them.
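The divide-and-repeat loop described above can be sketched in a few lines of Python. This is only an illustration, not a prescribed method: the `Problem` class, the `decompose` function and the sample breakdown of the outage are all made up for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Problem:
    """A problem that may be split into smaller sub-problems."""
    name: str
    parts: list[Problem] = field(default_factory=list)


def decompose(problem: Problem) -> list[str]:
    """Return the names of the leaf problems, i.e. the pieces small enough to act on."""
    if not problem.parts:
        # Small enough: stop dividing.
        return [problem.name]
    leaves: list[str] = []
    for part in problem.parts:
        # Still too big: repeat the first step on each part.
        leaves.extend(decompose(part))
    return leaves


# Hypothetical breakdown, loosely based on the incident story above.
outage = Problem("service is down", [
    Problem("broker runs out of memory", [
        Problem("who publishes the messages?"),
        Problem("why are consumers too slow?"),
    ]),
    Problem("what changed recently?"),
])

for question in decompose(outage):
    print(question)
```

The leaves are the questions you can actually go and answer; everything above them only exists to organize the search.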
Bottom-up strategy
A bottom-up approach is the piecing together of systems to give rise to more complex systems, thus making the original systems sub-systems of the emergent system. Bottom-up processing is a type of information processing based on incoming data from the environment to form a perception. From a cognitive psychology perspective, information enters the eyes in one direction (sensory input, or the “bottom”), and is then turned into an image by the brain that can be interpreted and recognized as a perception (output that is “built up” from processing to final cognition). In a bottom-up approach the individual base elements of the system are first specified in great detail. These elements are then linked together to form a larger subsystems, which then, in turn, are linked, sometimes in many levels, until a complete top-level system is formed. This strategy often resembles a “seed” model, by which the beginnings are small but eventually grow in complexity and completeness. However, “organic strategies” may result in a tangle of elements and subsystems, developed in isolation and subject to local optimization as opposed to meeting a global purpose.
(citation from Top-down and bottom-up design, Wikipedia)
Rephrasing bottom-up: if you have lots of separate details and do not have a clear picture of the problem, you need to group them into something more generic and keep doing so until everything is connected into a single whole.
These two approaches can be combined to solve difficult situations, especially when you have lots of details and want to verify that you are about to combine them in the right way:
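As a rough sketch of that grouping step, here is a small Python example. The observations, the component names and the `group` helper are hypothetical; in real life the labels come from your own analysis, not from a dictionary.

```python
from collections import defaultdict


def group(items: dict[str, str]) -> dict[str, list[str]]:
    """Group item names by the label assigned to each of them."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, label in items.items():
        groups[label].append(name)
    return dict(groups)


# Level 1: separate details, each labeled with the component it points to
# (hypothetical observations, loosely based on the incident story).
details = {
    "queue depth keeps growing": "broker",
    "broker memory is full": "broker",
    "service restarts in a loop": "sender service",
    "same message is sent repeatedly": "sender service",
}

# Level 2: components, each labeled with the bigger picture it belongs to.
components = {
    "broker": "message flow",
    "sender service": "message flow",
}

by_component = group(details)   # details -> components
by_theme = group(components)    # components -> one whole

print(by_component)
print(by_theme)
```

Each pass produces fewer, more generic groups; you stop when a single group is left.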
***
“Here is a new task that you and two of your colleagues will be working on from now on. We expect results from you after a few months,” a manager told me once while we were sitting in a small conference room.
The task was to analyze the core we were working on (which had been there for ages) and to identify and expose an API on it, thus helping to break the whole organization-wide monolith into smaller pieces. Another team had been working on this task for two years, yet they seemed to be stuck.
As a department, we owned hundreds of programs that were interconnected in this core. It was clear that we would be starting from scratch, and time was not on our side. After the initial shock passed, we regrouped and began two parallel activities: analyzing which programs we had and what their interfaces were, and finding out what exactly our direct consumers needed.
Meanwhile, we were attending status meetings for this organization-wide activity, looking with trepidation at hundreds of API endpoints that other departments were presenting, thinking to ourselves “how are we going to get this done in time, is this even possible?”. Yet as we continued our analysis, at some point we started to see an imbalance between what others expected from our core and our effort to expose every piece of functionality we had.
We started to ask ourselves “why” questions and found out that we were fighting the wrong fight: we had been competing with other departments’ presentations instead of exposing the right capabilities in our domain. From that perspective, it was obvious that we needed to select the endpoints that were actually needed, and designing those in the given time frame was, yes, possible.
***
Besides the analytical strategies, there are some other useful tools:
The five whys method helps you break down and nail the problem
an iterative interrogative technique used to explore the cause-and-effect relationships underlying a particular problem.
The primary goal of the technique is to determine the root cause of a defect or problem by repeating the question “Why?”.
Each answer forms the basis of the next question.
(citation from Five whys, Wikipedia)
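To make the “each answer forms the basis of the next question” part concrete, here is a minimal Python sketch. The `five_whys` function and the `answers` table are invented for illustration; the answers paraphrase the incident story above, whereas in a real investigation you would have to find each one yourself.

```python
def five_whys(problem: str, answers: dict[str, str], limit: int = 5) -> list[str]:
    """Follow the chain of "why?" answers until none is known or the limit is hit."""
    chain = [problem]
    current = problem
    for _ in range(limit):
        cause = answers.get(current)
        if cause is None:
            # No deeper answer known: treat the current one as the root cause.
            break
        chain.append(cause)
        current = cause  # the answer becomes the next question
    return chain


# Hypothetical answers, paraphrasing the incident story.
answers = {
    "the service is down": "the broker memory is full",
    "the broker memory is full": "the queues fill up faster than they are consumed",
    "the queues fill up faster than they are consumed": "one service sends far too many messages",
    "one service sends far too many messages": "it was changed a few hours ago and is stuck in a crash loop",
}

chain = five_whys("the service is down", answers)
for depth, statement in enumerate(chain):
    print("  " * depth + statement)
```

The `limit` is only a safety net against going in circles; the number five is a rule of thumb, and a chain may well end earlier or need more steps.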
Issue log and RAID log
Both of them help you stay organized: you can find and change priorities fast, and nothing is lost along the way.
Personally, I use a simple table as an issue log to keep everything in one place and to keep track of how things are going. Sometimes I add work notes to the same table, along with links to issue tracking tickets if there are any. An issue log does not replace an issue tracking system; it is more like an extra layer on top. When this simple view is not enough, I convert the issue log into a RAID log and get more perspectives.
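If you prefer code to spreadsheets, such a table could look roughly like the Python sketch below. The column names, the sample entries and the ticket id are assumptions made for this example, and RAID is expanded here as Risks, Assumptions, Issues and Dependencies, which is a common reading but not the only one in use.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    """The four perspectives of a RAID log (one common expansion of the acronym)."""
    RISK = "risk"
    ASSUMPTION = "assumption"
    ISSUE = "issue"
    DEPENDENCY = "dependency"


@dataclass
class Entry:
    title: str
    priority: int            # 1 = most urgent
    kind: Kind = Kind.ISSUE  # a plain issue log only ever uses ISSUE
    notes: str = ""          # free-form work notes
    ticket: str = ""         # optional link or id in the issue tracker


def by_priority(log: list[Entry]) -> list[Entry]:
    """Return the entries with the most urgent first."""
    return sorted(log, key=lambda entry: entry.priority)


def of_kind(log: list[Entry], kind: Kind) -> list[Entry]:
    """Return only the entries of one RAID perspective."""
    return [entry for entry in log if entry.kind is kind]


# Hypothetical entries; "TICKET-1" is a placeholder, not a real ticket.
log = [
    Entry("list programs and their interfaces", priority=2),
    Entry("deadline may be too tight", priority=1, kind=Kind.RISK),
    Entry("consumers need only part of the core", priority=3, kind=Kind.ASSUMPTION),
    Entry("waiting for consumer requirements", priority=2,
          kind=Kind.DEPENDENCY, ticket="TICKET-1"),
]

for entry in by_priority(log):
    print(entry.priority, entry.kind.value, entry.title)
```

Converting an issue log into a RAID log is then just a matter of filling in the `kind` column instead of leaving every row as an issue.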
Conclusion
With these few simple but powerful techniques and really simple ways to stay organized, you can tackle problems both big and small. And last, but not least, be conscious in your daily work: it is important to understand what you are doing and why.