Hunting for toggle: Complicating simple features as much as we can

Ekaterina Guschina

Published in hh.ru

8 min read · Aug 9, 2022

– Hey, Kate, there’s a little bug crawling, take a look, please.

– I’m on it, bro. What’s the issue?

– When I return to the screen, the toggle is reset to default. Easy peasy lemon squeezy.

That innocent phrase kickstarted my adventure into the world of mind-blowing architecture, insane fixes and red eyes. It was a trap.

Hello, my name is Kate. I’m an Android developer at SimbirSoft, and I help improve the product at hh.ru. In this article I’ll tell the story of how developers from two companies, an Android tech lead and even the Head of Mobile built a minimal MVI feature with a single toggle and, after long hours of design, still missed a bug. Together we’ll find out what programmers sacrifice for the sake of good UX, why the initial decision was wrong, and how it was fixed.

This is a text version of our vlog, so if you prefer video, you’re welcome to watch it on our YouTube channel.

Finding the problem at the bottom of the sea

Since the screen was created by a different developer, the first thing I did was study it thoroughly. The screen had three states: loading, error and content.

Screen states: loading, error and content

The content included that unfortunate toggle. Tapping it sent a request to the server with the new value of a configuration flag.

An important condition: the user must be able to flip the toggle as many times as they like, and we never show a loading indicator.

At first I had only one small task: fix how the toggle’s state was saved. On a slow connection, the user can leave the screen before the request finishes, and when they came back the toggle would snap back to its initial value.

What did it look like? The toggle resets after the user leaves the screen

What was the bug about? If the user switched the toggle on a slow connection, the following would happen: the user opens the screen, taps the toggle, switching it to ‘unchecked’, leaves the screen, then comes back and sees that it’s ‘checked’ again. What a mess!

Screen development scheme

Our feature is basically a black box that hides the screen logic inside. It talks to the repository, which sends requests to the server. The repository also stores the cache.

The communication between the feature and the repository produces State. Initially it was split into two fields. The first held the cached state, which matched the flag’s value on the server. The second held the UI state, which is what we display to the user.

Attention! All coding examples are just examples, don’t take them to prod!

In our case both states were interconnected, kept in one model, and loaded when the user entered the screen. The only difference between them was the condition under which each value changed. The UI state changes every time the user taps the toggle, whereas the cached state changes only when the server replies to our request successfully. Since both values were overwritten in the feature every time the screen started, we could not restore the UI state and show it to the user. That was the root of the problem.
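
To make the bug concrete, here is a minimal Kotlin sketch of that original State. All names (ToggleState, onToggle, onScreenOpened) are made up for illustration and are not taken from the real project; the only point is that reopening the screen rebuilds both fields from the repository cache.

```kotlin
// Hypothetical model of the original screen state; names are illustrative.
data class ToggleState(
    val cachedChecked: Boolean, // last value the server confirmed
    val uiChecked: Boolean      // what the switch shows right now
)

// The user taps the switch: only the UI value changes until the server replies.
fun onToggle(state: ToggleState, checked: Boolean): ToggleState =
    state.copy(uiChecked = checked)

// Entering the screen: both fields are rebuilt from the repository cache,
// so a tap whose request is still in flight is lost.
fun onScreenOpened(repositoryCache: Boolean): ToggleState =
    ToggleState(cachedChecked = repositoryCache, uiChecked = repositoryCache)
```

If the user taps ‘unchecked’, leaves before the server answers and comes back, onScreenOpened still receives the old cached value, so the switch shows ‘checked’ again.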

Ways of solving the problem

The first solution that came to mind was to separate those two states as much as possible. In that case we would refresh the cached state when entering the screen, while the UI state would be kept for as long as it was needed.

However, tests revealed numerous bugs, because a user might flip the toggle multiple times, and we couldn’t guarantee that the requests would complete in the order they were sent. In short, the results could arrive in a surprising order, and a stale response could overwrite a newer one.

We tried to fix it by ignoring the result of the previous request once a new one had been sent. There are a couple of ways to implement that kind of interruption.

Event bus

The first is to use an event bus. We post an event saying that we want to interrupt data processing, and we catch that event inside our request-processing chain. After that, the results of the old request are no longer processed.

For this solution we used an Rx Subject and the takeUntil operator. Before making a new request, we send an interruption event to the Subject. Inside the request-processing chain we use takeUntil, an operator that stops the rest of the Rx chain as soon as the given source emits an item. As a result, once the Subject has emitted, the remaining steps are not executed. It’s a viable solution, and we used it in our feature for pagination.
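
Roughly, the takeUntil approach could look like this. It’s a sketch that assumes RxJava 2 is on the classpath; ToggleRequests, cancelPrevious and sendToServer are hypothetical names, not code from the real feature.

```kotlin
import io.reactivex.Observable
import io.reactivex.subjects.PublishSubject

// Hypothetical wrapper around the network call; RxJava 2 is assumed.
class ToggleRequests(private val sendToServer: (Boolean) -> Observable<Boolean>) {

    // The "event bus": emitting here tells chains that are still running to stop.
    private val cancelPrevious = PublishSubject.create<Unit>()

    fun send(value: Boolean): Observable<Boolean> {
        // Interrupt whatever request is still in flight.
        cancelPrevious.onNext(Unit)
        // The new chain completes silently if a later send() fires the subject.
        return sendToServer(value).takeUntil(cancelPrevious)
    }
}
```

If the first request answers after the second one has started, its value never reaches the subscriber, because takeUntil has already completed that chain.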

Storing the list version number

The idea is as follows: we introduce an AtomicInteger that stores the global version of the list data, plus a local variable that remembers which version a particular request belongs to.

Each time we reload the list, we increment the global version and remember the new number locally. When the response arrives, we compare the current global version with the local one stored for that request.

If the versions are equal, we go on to process the results. If not, we simply skip them.
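
The version check can be sketched in a few lines of plain Kotlin. VersionedLoader, start and onFresh are hypothetical names chosen for this example; the real code would wrap an actual network call.

```kotlin
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical helper: delivers a result only if no newer request was started.
class VersionedLoader<T>(private val onFresh: (T) -> Unit) {

    // Global version of the data, bumped on every new request.
    private val globalVersion = AtomicInteger(0)

    // Starts a request and returns the callback to invoke with its result.
    fun start(): (T) -> Unit {
        // Local version remembered for this particular request.
        val localVersion = globalVersion.incrementAndGet()
        return { result ->
            // Process the result only if this is still the latest request.
            if (localVersion == globalVersion.get()) onFresh(result)
        }
    }
}
```

If two requests are started and the older one answers last, its callback sees that the global version has moved on and drops the result.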

Both ways have their own advantages. With the event bus, the rest of the request’s data chain is not processed at all. With the list version number, we know for sure that a reload has happened, and we also know its sequence number. However, neither of those two ideas suited us.

So what’s wrong?

First, it is hard to understand what’s happening. Someone who has never faced the problem of ‘excessive requests’, or who is seeing such a screen for the first time, might just freak out at such an implementation.

Secondly, we have to keep an additional flag in the repository (about the loading process). That creates an extra source of truth, which needs to be coordinated properly.

Thirdly, the cache is stored both in the repository and in the feature. Again, several sources of truth. Sooner or later, that leads to desynchronization.

Finally, excessive requests will be occurring anyway, no matter whether we use an event bus or store the list version number.

Error correction

As usual, the solution turned out to be much simpler. Let the feature manage its cache itself! That allows us to get rid of the repository cache and keep the cache only in the feature. Consequently, we have a single source of truth about the screen data. In this implementation State has three fields: the cached state, the UI state and a progress flag.

An updated version of our screen implementation

Okay, the user has opened the screen and we have to display something. The data is loaded from the server only when the cache is empty or no longer valid. Cache validation can be arranged however you like; we validate by date.
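
A date-based check like that could be as small as the sketch below. CachedFlag, needsReload and the maxAge parameter are assumptions made for this example; the article does not say how long the real cache stays valid.

```kotlin
import java.time.Duration
import java.time.Instant

// Hypothetical cache entry: the flag value plus the moment it was loaded.
data class CachedFlag(val checked: Boolean, val loadedAt: Instant)

// Reload when there is no cache at all or when it is older than maxAge.
fun needsReload(cache: CachedFlag?, now: Instant, maxAge: Duration): Boolean =
    cache == null || Duration.between(cache.loadedAt, now) > maxAge
```

Passing now in as a parameter, instead of calling Instant.now() inside, keeps the check easy to test.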

When the user flips the toggle, we keep its value in State and check whether a request is already in flight. If not, we send a new one with the toggle’s value.

Once the request completes successfully, we store the result in State (as the flag’s cached state). After that we check whether the UI state matches the cached state. If they differ, we send a new request to the server.

Following this scheme, we benefit from two advantages: the cache remains consistent, and the number of requests is kept minimal. Stonks.
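
The whole scheme fits into two pure functions. This is a library-free sketch with hypothetical names (FeatureState, Step, toggled, requestSucceeded); it models only the decisions, not the network call itself.

```kotlin
// Hypothetical names; a pure, library-free model of the scheme.
data class FeatureState(
    val cachedChecked: Boolean,             // last value confirmed by the server
    val uiChecked: Boolean,                 // what the switch shows
    val requestInProgress: Boolean = false  // progress flag
)

// A step returns the new state and the value to send, or null when nothing is sent.
data class Step(val state: FeatureState, val valueToSend: Boolean?)

// The user flipped the toggle.
fun toggled(state: FeatureState, checked: Boolean): Step {
    val updated = state.copy(uiChecked = checked)
    return if (updated.requestInProgress) {
        Step(updated, null) // a request is in flight: just remember the tap
    } else {
        Step(updated.copy(requestInProgress = true), checked)
    }
}

// The server confirmed a value.
fun requestSucceeded(state: FeatureState, confirmed: Boolean): Step {
    val updated = state.copy(cachedChecked = confirmed, requestInProgress = false)
    return if (updated.uiChecked == updated.cachedChecked) {
        Step(updated, null) // server and UI agree: done
    } else {
        Step(updated.copy(requestInProgress = true), updated.uiChecked)
    }
}
```

However many times the user taps while one request is in flight, at most one follow-up request is sent, and none at all if the user ends up on the value the server has just confirmed.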

Implementation

That’s how we arrived at the general scheme. Now it’s time to talk about implementation. First of all, we used MVICore, a library by Badoo for implementing the MVI pattern in Android apps.

Let’s say our user changed the value of the toggle, so we write the new value to the UI state. Having done that, we check whether we can send a request. You can do that in the Actor.

The Actor holds the feature’s working logic. For example, it can send requests to the server and emit events that lead to State changes.

We now understand that we can send the request, so we do it. After the successful completion, we save the new value of the flag in State.

Having updated State, we can now track its change in PostProcessor. We go there and check UIState and cached State inside State. If they are not equal, we repeat the whole chain of checks-requests-updates again.
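
Here is how the reducer and post-processor roles might split that work. The sketch is library-free: the role names mirror MVICore, but the types and signatures are simplified assumptions (for instance, the post-processor here does not receive the action that caused the effect), and MviState, Action and Effect are hypothetical.

```kotlin
// Hypothetical state, actions and effects for the toggle feature.
data class MviState(val cached: Boolean, val ui: Boolean, val inProgress: Boolean = false)

sealed class Action {
    data class SendFlag(val checked: Boolean) : Action()
}

sealed class Effect {
    data class UiChanged(val checked: Boolean) : Effect()
    object RequestStarted : Effect()
    data class FlagSaved(val checked: Boolean) : Effect()
}

// Reducer role: fold each effect into the state.
fun reduce(state: MviState, effect: Effect): MviState = when (effect) {
    is Effect.UiChanged -> state.copy(ui = effect.checked)
    is Effect.RequestStarted -> state.copy(inProgress = true)
    is Effect.FlagSaved -> state.copy(cached = effect.checked, inProgress = false)
}

// PostProcessor role: inspect the state after the reducer ran and, if the
// server still disagrees with the UI, ask the actor for one more request.
fun postProcess(effect: Effect, state: MviState): Action? =
    if (effect is Effect.FlagSaved && state.ui != state.cached) {
        Action.SendFlag(state.ui)
    } else {
        null
    }
```

The actor would react to SendFlag by emitting RequestStarted, calling the server and emitting FlagSaved on success, which closes the loop of checks, requests and updates.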

As a result of such work, our user can switch toggle with no worries, and we minimize the number of requests. That’s all.

Well, not really

Our scheme is not flawless. Soon after implementing it, we caught one more small bug: after the user logged out, we were still trying to send a request to the server, even though a logged-out user can’t receive such a flag. The flag’s value disappeared from the cached state, and the feature would start ‘knocking’ on the server’s door endlessly in order to get updates.

Fortunately, the fix was not a problem: we just had to save a default value in the cached state.

Hunt managed!

Hallelujah, we fixed that bug, but at what price! A huge amount of time was spent, but that experience taught us a lot of things:

First, before starting to develop a feature, look at it closely and sketch an approximate scheme or plan of what to do. That way we can, for example, design State correctly and avoid this kind of problem.
Secondly, to keep our colleagues from wasting time on similar cases, we decided to create a special architectural ‘cookbook’ on MVI. You have just explored its first case with us. When you have proven solutions at hand, it’s much easier to avoid similar bugs.

Tell us in the comments what you think about our solution and how you would solve this problem. We are planning to dive into more stories from our practice in the future. Buckle up!

Wishing you a productive and prosperous development!