Modernising A Legacy Android App Architecture, Part Three: Applying The Refactor

Published in

BBC Product & Technology

19 min readDec 7, 2020

Background

Hello. I’m Rob and I work on the Android version of BBC Sport. This article is the third in a sequence of pieces, talking about a refactor of our app.

Obligatory disclaimer: views expressed are my own, not the BBC’s or the team’s.

The first article is here and it introduced the phase of ‘rewrite’ that we’ve been doing, and then went on to look at moving to a single activity model made up of multiple fragments. The second article is here and discussed our version of MVVM along with some illustrative implementation detail.

Now we’re going to look at the overall phase of applying both of those techniques across the app as a whole.

We’ll examine the experiences we had with:

applying ‘MVVM+C’
migrating to component fragments
integrating these fragments into a hierarchy
navigation-related choices and difficulties

and then wrap up with some thoughts on the whole process.

Applying MVVM+C

Let’s talk about this first. Refactoring from either a legacy design pattern (MVP, MVI) implementation, or from no design pattern (everything in the fragment/activity), was the simplest part in this overall puzzle.

This is in large part because the scope of any of these refactors is restricted to the internals of that feature and generally doesn’t have impact on anything outside. From there, it’s just repeated application of the pattern.

Prior ‘foundation’ works like getting rid of our use of Koin paved the way for much simpler, centralised dependency access than we might have had. Not having to refer to platform objects like an Android context all the time allows for much cleaner possibilities.

Despite the relative simplicity, there were a handful of interesting facets that came up.

Understanding And Reconciling Legacy Patterns

We’re applying this to something that already exists, so it’s mostly a question of reallocating and migrating responsibility. As mentioned previously, we had a complex mix of legacy classes. Presenters (MVP) and Interactors (MVI) featured sometimes, but other ideas were present too, like ‘Coordinators’ and ‘UseCases’ and ‘Contracts’.

UseCases seem to have been an abstraction above the layer of presentation pattern choice. This one was actually implemented by a ViewModel:

Another of these was implemented by an Interactor. Given that we’d determined over time that the Android ViewModel is an architecturally benign choice that we have confidence in, this is one level of separation we can do without, so it was collapsed into the viewmodel.

We also had things like this:

At a glance, this looks like the basis for half-decent separation or at least definition of responsibility, but the ‘View’ here is actually implemented by a Fragment. Furthermore, ‘setModel’ is a chunky responsibility that makes the view non-stupid, and ‘completeOnboarding’ contains outright business logic.

We untangle the view from its fragment container, and we wrap it as described previously. We use the view controller to translate the model into finer-grained operations on the view, and to hold our business logic. The responsibilities that were in the Presenter get brought into the view controller.

The idea of these ‘contract’ interfaces also dies with this refactor. In the legacy code the interface has value because it’s the only firm definition we have of the relationship between view & model, the only way of not just divorcing the two concrete implementations but also keeping the platform out of it. Our reworked pattern of MVVM+C is all about making those separations an ordinary, implicit part of the landscape through distinct component responsibilities. In particular, the controller is a strong enough embodiment of the link between the view & model that we don’t really need to call out what it does in interfaces.

Sometimes There’s No ‘View’

The first V in MVVM+C is ‘view’. Remember also that we said we would wrap this view. There’s always a view in Android — we always instantiate a layout, even if it’s static. But when it is static, the view wrapper has nothing to do.

We had a fair few instances of this. Sometimes all the UI interactions are just swapping in/out of child fragments. Does this mess up our paradigm? Is it a problem to have a view controller with no view as such?

It’s perhaps tempting to say that the pattern doesn’t work here, but when we think about fundamental responsibilities, the requirement is the same, it’s just implemented slightly differently.

Here’s an example:

This view controller is created by the fragment and instead of a view wrapper, it’s given a handful of callbacks. When those callbacks are invoked, the fragment performs a fragment transaction — keeping the platform-specific implementation part isolated in the platform-specific component where it belongs. Conceptually this is still absolutely a ‘view controller’.

Shared Viewmodels

Despite the above, most of our features are implemented with a quartet of components: a fragment, a view wrapper, a view controller, a viewmodel. The viewmodel is usually specific to that fragment and instantiated on a per-fragment basis.

It doesn’t have to always work like that. Here’s our soon to be updated ‘onboarding’ process, which takes the user through a series of screens where they pick a sport, then a team, and on completing the process the app sets them up with notification subscriptions.

In this, we share a viewmodel across all the component fragments, and it can accrue and maintain state as the user goes on that overall journey.

This is achieved by all the components asking for an activity-scoped viewmodel rather than a fragment one. There are definitely ways to tighten this up and share one scoped to the parent OnboardingContainerFragment instead, and there are some efficiency reasons to do so, but what we’ve currently got is a compromise that doesn’t introduce more confusing possibilities into the viewmodel fetching mechanisms.

Getting Towards A Single Activity — Fragments First

This element of the refactor was much more tricky, because it involves lots of moving parts, and some of them are interconnected.

As a refresher, we started with something like this, where the purple elements are activities:

It’s a lot of stuff to tackle.

Being able to turn this into a single activity is the end result, not a starting point. You can certainly create a new activity and start building your hierarchy within it, but you won’t be able to move anything into it at first.

Instead, you start small. Smallest for us was the splash screen. This is the splash screen. It’s just a big yellow screen!

This actually doesn’t even have a viewmodel or a controller or a view wrapper. Some example this is! It’s just a fragment with a layout, but a fragment with a layout is better than an activity with a layout. And its predecessor SplashActivity had a load of navigation logic in it, which this does not need at all.

Having survived that enormously difficult implementation, we could turn our hand to something with some actual behaviour.

This is the screen that we show when the app is disabled; perhaps when we switch off an old version, or maybe the whole thing comes to end for some reason.

This is the same screen, same fragment, with some dynamic elements. What it actually shows is contingent on the bootstrap status (in the above, ‘force upgrade’ vs. ‘killed’), and the message text in the remote configuration file.

From this UI’s perspective, that information comes from the viewmodel. Based on that information, the controller actions whether to show or hide the button, and it sets all the text on the view.

The controller also subscribes to the view’s button events, and if the user clicks it, a callback that the fragment provided is invoked, and it launches the platform-specific responsibility of opening the Play Store.

This isn’t exactly rocket science, but it was a good introduction to MVVM+C, and we could then use our experience to build a few more things, like a separate error screen when the remote config can’t be fetched.

That gives us a handful of set pieces but we need to integrate them together.

Integrating The First Layer Of Fragments

Suppose we’ve now got the following:

a ‘splash screen’ fragment
a ‘disabled’ fragment
an ‘error’ fragment for when the config can’t be fetched
an empty placeholder fragment for content that we will populate in the next layer

This is actually the first layer of this diagram, from the first article:

(we didn’t include ‘error’ in that diagram, and our ‘AppEnabledFragment’ is going to be the empty placeholder for now)

We can create what was due to be MainFragment now, although it turns out we actually called it StartUpFragment instead. Here it is in slightly simplified form:

We use an activity-scoped viewmodel because this fragment is always going to be around, and should react to changes. We use a special viewmodel factory because unlike all our normal UIs, we are operating in an environment where the remote config may not have been fetched yet, and we may not have started all our dependency services yet.

The view controller subscribes to the viewmodel, which provides status updates, and it invokes the relevant fragment swap as necessary. Aside from creating components, the fragment itself doesn’t contain any business logic; like a view, it just does what it’s told.

If you’re interested in lifecycle, note that we call an onRemove method on the controller in onDestroyView — this causes the viewmodel to unsubscribe from core services (which live for the app object’s duration) so that all the UI stuff can be torn down as necessary.

We’ve integrated our component parts into a first layer, and we can think about working on the content placeholder to build the second layer. It’s a repeat of this process: prepare fragment components and then integrate.

The Big Picture, And Headaches Along The Way

This is what we ended up tackling, from roughly August to December 2020, with lots of other work ongoing too.

Everything in green got MVVM+C applied to it, and everything in blue/purple was integrated into that largely unchanged. The features in yellow were adapted to suit a single activity.

It went pretty smoothly for the most part, but it wasn’t all plain sailing.

The main problems we encountered were all navigation related, and roughly these:

Replacing the legacy ‘pile of activities’ navigation stack with something appropriate
WebView specifics relating to the above
Our own tangled navigation responsibility principles

We’ll look at each in turn.

The Activity Navigation Stack, And Fragment Replace vs. Add

Suppose you have multiple Activities, and you go on this journey:

Each new activity gets added to the Android activity stack. When you press ‘back’, the top one is wiped off the stack and — most of the time — you are left immediately looking at the previous content.

Example of forward and backward activity-based navigation

This is because everything is retained in memory. However, in this scenario, Android is able to garbage collect and recreate the underlying activities if it deems necessary. When we try this out in the app, we don’t see this tear-down happen very often on the most popular mid-to-high-end devices, even with prolonged journeys, but you can be sure it does happen.

When we go to the single activity model, we have a dilemma about how to handle this. Normally we replace fragments, but when we do that, they’re destroyed and recreated. The consequences of this vary depending on how well designed each fragment is, and what it contains. More on that later.

We could replicate the activity stack by adding fragments instead, so that when you navigate backwards, the prior content is immediately present. You can also call ‘hide’ on the content that will end up underneath. In and of itself, it works. I spent quite a lot of time trying out this approach, which isn’t very well documented.

The problems I experienced with this were:

The underlying fragments are still in memory, and unlike activities, cannot be garbage collected. It was still performant in testing no matter how many fragments I threw on the stack, but I always had significant doubts about how this would perform in the wild.
They appear in the view hierarchy — relevant at the very least to the Layout Inspector and automated test view finding
If the underlying fragment does something late, like asynchronously populating the (shared) ActionBar, you get really horrible interactions between fragments that you have to try and manage
There are no conventional lifecycle events as fragments get hidden — you get onHiddenChanged() but not onPause, onResume etc because it is simply not paused or resumed unless the whole parent is — it’s still there. This ‘hidden’ property is unusual and it’s not at all good that we would have to consider it in so many normal things we do.

The results of adding fragments rather than replacing — ‘snackbar_container’ is basically our content fragment and there are three instances of it here

These consequences combine to make add-on-top to be a non-starter really. We’ve no choice but to use ‘replace’, and deal with the issues.

Back Navigation in MVVM Fragments With Native Rendering

When we did this on Sounds, everything in the app was a native component. Some of the content on Sport is natively rendered, and we’re adding more capability all the time. However some of it is rendered in a WebView. At the time of writing, the main page is one of those things, and so are a subset of our articles.

Web-based and native articles respectively. And Eric Dier putting us all out of a job.

At least if you build your stuff right, when you swap out a fragment based around a native Android view - for the sake of argument, one that involves a RecyclerView - the platform lends you some support and actually does a lot of work for you. Despite having worked on Android for years, I hadn’t fully realised all of this until I started exploring it in detail.

When you navigate backwards, then provided that you put your fragment on the back stack, it gets revived where it left off. The actual fragment instance gets newly created, as does the view and the view controller. However the viewmodel is summoned from the factory, and when it was created the first time, it was given a fragment scope — this fragment. It’s been kept alive and it’s retrieved instead of freshly created. This means all your viewmodel data is present, and you can just reapply it to the view. That’s the helping hand of the Android architecture classes at work.

The scroll position of the RecyclerView is also maintained, conditionally — that is, as long as you (re)populate it before the first layout pass. I still don’t know exactly what magic this uses, but I’m happy it does it. We had to make some tweaks to some of our components to handle an existing viewmodel, rather than always freshly fetching content, but once we’d done that, this worked well.

All in all, despite recreating everything, the user is put back roughly where they were, in fairly short order.

Where this all goes a bit wrong is once we introduce the WebView.

Back Navigation in MVVM Fragments With WebView Rendering

The WebView is effectively a black box. Unlike our viewmodel-powered components, we don’t own its state or data at all. By default, navigating back to one gives you an empty WebView.

We can intercept its network requests and we can control caching but that’s about it, and all caching achieves is speeding up a reload attempt. The user’s context is still lost.

There is some good news. WebView has a couple of methods: saveState and restoreState . The former writes to a Bundle and the latter reads it back. But writes what?

This is what the documentation for restoreState has to say:

Restores the state of this WebView from the given Bundle. This method is intended for use in Activity.onRestoreInstanceState(Bundle) and should be called to restore the state of this WebView. If it is called after this WebView has had a chance to build state (load pages, create a back/forward list, etc.) there may be undesirable side-effects. Please note that this method no longer restores the display data for this WebView.

The last bit is interesting — it seems that it used to persist the page but there were problems with apps cleaning up this state, along with I suspect other complexity and reliability issues.

It’s not clear or documented then what this actually does do now. It seems to me that although it doesn’t retain the full rendering, it should at least restore the current navigation state (i.e. the URL) and things like the scroll position. There are known things it doesn’t do, like re-apply an HTTP POST request, but for us and our usage that’s acceptable.

I implemented a use of this capability. We generate a UUID for the fragment when creating it, and when pausing it, we save the WebView state to an in-memory store in one of our services. When the fragment is recreated it pulls it back using its UUID. ‘State’ in this case is actually a data object containing the WebView-sourced Bundle, and the Y-axis scroll position (scrollY) of the WebView, because it turns out we need to explicitly know that.

On a sunny day with the wind at your back, this performs OK. However, once you start poking at it further, the scroll position starts giving you really difficult issues. Even when we restore the (correct) scroll position that we captured, the WebView restoration process often overwrites it with inappropriate values as it figures out its rendering, and not just the once either. We can choose to fight the system and reapply the position, but there are consequences.

If we don’t tackle this at all, we often restore the user to the wrong position. However if we surface this to the user, they get visual oddities like a page that repeatedly jumps about for a while. If we hide the WebView until we’re sure we’re ready, the start-to-definitely-finished loading process is quite long — several seconds — and it interrupts the user experience. It’s very tricky to determine when is the best moment to present the user with something.

A 4x slowed down example of going back to a WebView and restoring scroll. On this occasion there is annoying flicker as it loads to an undesirable mid-page position then we re-apply our correction to the right position.

I really hoped there would be some observable event that at least correlated very strongly with not getting any more unwanted scroll events, but alas it would appear not. Some techniques I tried, amongst others:

Observing WebViewClient.onProgressChanged()
Completely suppressing scroll events by overriding WebView methods
Stopping scroll events before save and after restore
Waiting for WebViewClient.contentHeight to be non-zero
Responding in various hooks like WebView.onPageFinished()

This is one of those really frustrating problems where you can only figure out the behaviour through experimentation and observation, and as a result, this was a big time sink.

If there is a clean answer, I didn’t find it. In the end I made the WebView invisible (not ‘gone’) whilst restoring state, and made it visible again once contentHeight was non-zero, driven by progress events. I also delayed that visibility change by a small, fixed duration to improve the chances of it being ready. Even after it’s made visible, we continue to apply our desired scroll position whenever scroll position changes, i.e. every time something calls onScrollChange until the user actually touches the WebView.

This is not perfect — sometimes there are still corrections after making it visible, and in the cases where this all worked well, we keep the view hidden for longer than necessary. However it feels like an OK compromise overall. It’s also further impetus to get away from WebViews in favour of native!

Cleaning Up Navigation Responsibility

The other headache I had in the course of this was of our own making. When we had lots of activities, and some of the key content pieces within them were fragments, it was easy to implement inconsistent navigation experiences.

In the app as it was, you start at the main page with tabs and a menu. You can navigate into a clear ‘story context’ (no tabs, no menu) and you can perform further navigation, for example to the topic of Tennis.

However, there were other ways of arriving at the same destination. If you went to the ‘All Sport’ tab and found Tennis from there, you got the same content but presented in-situ, so in the main page’s container:

This meant that there was an ambiguous parent responsibility for any particular type of content, depending on how you got there.

This is an engineering-led refactor and we don’t really want to change the user experience if we don’t have to. In this case these existing behaviours were a real obstacle to producing something good inside a coherent fragment hierarchy, and not necessarily the result of explicit design choices to begin with. Fortunately, our UX designers and our product management were very accommodating, and we were able to rationalise this so that only top-level content (the tabs) were presented in the main page form, and everything else was presented in the story context.

This may seem like a minor detail but it influences a great deal: how deep link journeys are constructed, what happens if you click an in-article link back to the main page, and so on.

Conclusions

It’s the end of this three article saga, and here’s a recap.

We talked about why we’re refactoring the Sport app in the first place
We talked about the merits of the single activity over a tangled web of activities
We designed a single activity fragment hierarchy that properly assigned responsibility to different, specific components
We chose ‘MVVM+C’ — model/view/viewmodel with a controller thrown in for good measure — as our design pattern
We chose and justified a low-tech, traditionalist approach for event publishing, the humble listener, and said no thanks to RxJava and (for now) LiveData.
We applied these techniques in an incremental way to much of our app
We hit a few snags, like dealing with WebViews, but we just about got there in the end

Did We Meet Our Goals?

This isn’t one of those refactors where we get to roll something out to the audience to great fanfare, or feel really smart for having deployed a shiny new technology. As far as the delivered product goes, it should be business as usual, and ideally no one outside the dev team will really even notice what we’ve done, at least directly. If I’m honest, given the scale of the work, it’s probably a little underwhelming as it concludes.

We as developers should notice though, because it should be far easier to read, understand, fix, extend and modify some of our most important code. It should be far easier to articulate how it works to other people, teach software engineering using it, and bring new starters up to speed. We should be much more able to talk to other BBC teams about what we’re doing and how we’re doing it, because we should have a far greater degree of commonality, and without getting too smug or evangelical, we should be able to use our work to communicate best practice for others to follow.

From the first article, these were our stated aims:

consistency with modern Android development practices
consistency with other modern BBC app architectures
clarity for developers — being able to look at the codebase and see quickly that this is the pattern for how we build features
usual product qualities: maintainability, reliability, extensibility

I think the first three have been objectively achieved, at least within the scope we tackled. The fourth requires the passage of time to demonstrate its merit in the specific case of Sport. However again we can refer back to our experiences on Sounds where we found over several years that the first three contributed to a highly reliable product and a codebase where we could very rapidly build new features (and there were many) by using existing patterns and precedent.

Any Lessons Learnt?

I’m mostly happy with how this went. We were able to re-apply a successful architectural experience from one engineering development to another, different one.

Our chosen scope was a good one, and the strategy of performing an iterative, increasingly complex application of MVVM+C on a component-by-component basis paid dividends too. The components that we left outside of the scope are, it turns out, mostly fit to remain there for now.

We ran this development behind a feature flag for a long time, with the fragment hierarchy in a ‘new activity’ before we tore down the old one. Later in the development this had to end, because it felt like unnecessary effort to keep old, bad code working rather than migrating it, and we were plagued by the clutter of things that couldn’t be deleted. The choices we made in that balancing act felt about right, and it allowed non-refactor development to continue unhindered for as long as reasonable.

There were less organised moments towards the end, particularly when the tangled navigation complexity unexpectedly brought the ‘Main’ and ‘Story’ components into play simultaneously, rather than sequentially. The nature of the ‘single activity’ rework is, within the chosen scope, pretty much all-or-nothing, and if you can’t complete it, then you can’t ship. At that point there is quite a lot up in the air. The WebView problems added pain to that — what if there’s not a good solution?

I think if I were to do this again, I would try and better identify the risks specific to this application up front, things like WebView navigation and the interconnected nature of some apparently unrelated things. It’s easy to say this in retrospect, so I’m not sure there’s any guarantee that we would find them, and I’m certainly not a proponent of waterfall approaches. However if we’d been successful in that, then we might have been able to carry out more preparatory, foundational set pieces to reduce the amount of in-flight risk at any one time.

I also think this was a very difficult piece of work to communicate to anyone who isn’t an engineer. Hopefully this article series has done an acceptable job for a technical audience, and it builds on some of the internal explanations of this that we put together, but I think it would have been helpful to share this story earlier. It was never going to be the easiest thing to articulate to non-technical stakeholders; when we’re done, the product will be… the same? This has required a great deal of trust in us, which we absolutely received. I wish we had found ways to better reward that and show value beyond mere progress updates and the ‘wait and see how this improves things long term’ promise.

Has It Yielded Any Benefit Yet? What’s Next?

There has been a high level of product ambition for some time on Sport. We were hamstrung before on Android by our legacy codebase and many of those problems, obstacles and slowdowns have been addressed in this phase.

In the middle of applying these changes, we redesigned our ‘onboarding’ user experience, having already applied this refactor to the old version. The malleability and coherence of the reworked code has, in my opinion, greatly accelerated the work and produced a better quality of new development. Whereas before we would typically end up bolting things on and therefore adding more tech debt when extending something, we’ve been able to keep it clean.

We still have a lot of work to do — this entire refactor is one facet of our codebase — but we’ve made a big impact that should unlock new development possibilities.

No doubt there will be more to write about before long, so watch this space. Also, questions, comments and critical feedback are all extremely welcome. Please leave a note or get in touch.

Thanks for reading!