Knockout Knockout — Mark43’s Journey to Fully Rewrite the Front End

David Vargas
Mark43 Engineering


Large-scale refactors are daunting. They can weigh on a team for years with no end in sight. However, when you do get there, it’s an incredibly rewarding feeling for the entire engineering team, especially when the refactor is as impactful as this one. My hope is that you can take a few lessons from Mark43's large-scale refactor to help yours go a little more smoothly.

Backstory

Mark43 builds software that empowers governments and communities to improve public safety. This suite of software includes a Records Management System (RMS) that allows officers to record and find data related to incidents in the field as quickly and as accurately as possible. As a result, our front end relies heavily on form data. We need not only to capture form data, but also to observe it and have various UI components react to its values. This includes validation, hiding/showing fields, autofilling values, etc.

An example input from the RMS

Back in Mark43’s infancy in 2015, we used a frontend framework called Backbone for defining “views” and another called Knockout for binding state to our HTML layouts and observing it. At the time, both of these frameworks were among the most popular for building UIs. However, some problematic patterns began to emerge. Given that we were effectively using two frameworks to define our UI, it became increasingly unclear where the division of responsibility was between the two. This resulted in custom patterns that no one else used and that were hard to maintain. The imperative nature of Knockout meant that all state changes required associated UI changes, and forgetting to make them led to user-facing bugs. Most problematic of all, community support was beginning to wane as the JavaScript ecosystem was turning to a new framework that was on the rise during this time: React.

It was increasingly clear that React was becoming an industry standard due to its declarative paradigm of building UIs. We had confidence that React could address the three problems mentioned above: it could establish patterns for defining views, manage the component lifecycle for us, and came with the added benefit of open source support that was accessible and continuing to grow. We began to write new admin pages in our app in React and started to see some of these benefits immediately. It was then decided with little to no pushback: React would become our official frontend framework of choice. What ensued was a 5 year refactor journey that the engineers of this company could not be more excited to have finished and hope to never experience again. So, why did it take so long? There were many hurdles, but I’d like to highlight the biggest three.

Feature Factory

Like any young and fast-growing startup, we felt pressure to pump out features as fast as we could so that we could onboard new clients. The public safety industry in particular is one where workflows have been developed independently of one another for years and vary widely between departments. This often resulted in each new client bringing a checklist of feature requests to the engineering team so that the launch would be successful and seamless. Consequently, we were motivated to keep adding logic to the legacy framework, the path of least resistance, rather than rewriting core pieces of our app to deliver on these feature requests:

Feature Factory Life Cycle

Forked State

After a couple of years of living through the above cycle, we got to a point where the amount of new React functionality matched and then exceeded the amount of legacy functionality. This gave rise to the biggest generator of bugs for years: two sources of truth for state.

When we started building out our first React components, we started using Redux to manage our global application state. We had also built something similar in-house to manage global state in our legacy views, a behemoth known as “EntityManager”. As we started to silo off pieces of the Knockout code to translate into React, we would see regressions due to the lack of interplay between these two god-like managers of state. Forcing these two state managers to work with each other only led to more confusion, as it was unclear which one was the real source of truth. Our React components started to be littered with callbacks which, after dispatching a Redux action, would need to update EntityManager and vice versa. This variable cost added up over time, prolonging both new feature development and the rewriting of old components.
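To make the sync problem concrete, here is a minimal TypeScript sketch. It is an illustration under assumptions, not Mark43's code: `MiniStore` is a hand-rolled stand-in for a Redux-style store (it does not use the Redux API), and `LegacyEntityManager`, the `Report` type, and both save functions are hypothetical names.

```typescript
// Hypothetical stand-ins: neither class reflects Mark43's real code or the Redux API.
type Report = { id: number; title: string };

// A minimal Redux-style store: state changes only through dispatched actions.
type Action = { type: "report/updated"; report: Report };

class MiniStore {
  private reports = new Map<number, Report>();

  dispatch(action: Action): void {
    if (action.type === "report/updated") {
      this.reports.set(action.report.id, { ...action.report });
    }
  }

  getReport(id: number): Report | undefined {
    return this.reports.get(id);
  }
}

// A stand-in for the legacy manager: a mutable cache that legacy views read from.
class LegacyEntityManager {
  private entities = new Map<number, Report>();

  upsert(report: Report): void {
    this.entities.set(report.id, { ...report });
  }

  get(id: number): Report | undefined {
    return this.entities.get(id);
  }
}

const store = new MiniStore();
const entityManager = new LegacyEntityManager();

// Every write path has to remember to update both stores.
function saveTitleSynced(report: Report): void {
  store.dispatch({ type: "report/updated", report });
  entityManager.upsert(report); // the manual sync step
}

// A write path that forgets the sync step still compiles and runs...
function saveTitleUnsynced(report: Report): void {
  store.dispatch({ type: "report/updated", report });
}

saveTitleSynced({ id: 1, title: "Draft" });
saveTitleUnsynced({ id: 1, title: "Final" });

// ...but the two stores now disagree about the same report.
const inSync = store.getReport(1)?.title === entityManager.get(1)?.title;
```

After the second save, `inSync` is `false`: the store holds the new title while the legacy manager still holds the old one. Nothing in the type system flags the missing sync call, which is why this class of bug tends to surface only in front of users.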

Forked State

The forked state also made it much harder to onboard new engineers to the repo. When starting out with a new project, it’s often customary to use existing code as a reference for adding new functionality. Interns and junior engineers alike would have trouble figuring out “which way” new components should be written. This cost was felt not only by the new engineers, but also by the existing ones, through pull requests and responding to issues found by our QA team.

Lack of Expectations

As tough as the first two hurdles were, the most challenging hurdle we discovered was our failure to define expected behavior. Tests that outlined all the edge cases that our components needed to fulfill? Nope. A design system that outlined expected user experience? Only with our new components. Developer documentation that outlined how our Knockout views worked? Ha. Maybe a multi-line comment here and there.

There were several cascading issues that stemmed from this one problem. Developers had to build mental models of these expectations by sifting directly through hundreds of lines of old logic. Inevitably, given how many niche edge cases were built to fulfill client demands and given that engineers are human, logic would get lost in translation. This was made more difficult by the fact that the Knockout to React migration was a fundamental paradigm change from imperative to declarative. This meant that logic couldn’t simply be copied and pasted; we needed to rebuild components from the ground up with this new paradigm in mind. It would’ve taken a miracle to ensure all the old logic would make it through the migration intact.
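The paradigm change can be sketched in plain TypeScript. Everything here is assumed for illustration: the `injuryInvolved` and `injuryDescription` fields are made up, and neither style uses Knockout's or React's actual API. The imperative form pairs every state change with hand-written view updates, while the declarative one derives the whole view from state in a single function.

```typescript
// Hand-rolled for illustration; this is not Knockout's or React's API.
type FormState = { injuryInvolved: boolean; injuryDescription: string };
type View = { showDescription: boolean; descriptionError: string | null };

// Imperative style: each state change is paired with hand-written view updates.
class ImperativeForm {
  state: FormState = { injuryInvolved: false, injuryDescription: "" };
  view: View = { showDescription: false, descriptionError: null };

  setInjuryInvolved(value: boolean): void {
    this.state.injuryInvolved = value;
    this.view.showDescription = value;
    // Easy to forget: the validation error depends on this field too.
    this.view.descriptionError =
      value && this.state.injuryDescription === "" ? "Required" : null;
  }

  setInjuryDescription(value: string): void {
    this.state.injuryDescription = value;
    this.view.descriptionError =
      this.state.injuryInvolved && value === "" ? "Required" : null;
  }
}

// Declarative style: the view is a pure function of the current state.
function render(state: FormState): View {
  const missing = state.injuryInvolved && state.injuryDescription === "";
  return {
    showDescription: state.injuryInvolved,
    descriptionError: missing ? "Required" : null,
  };
}

// Both styles should agree after the same user interaction.
const form = new ImperativeForm();
form.setInjuryInvolved(true);
const imperativeView = form.view;
const declarativeView = render({ injuryInvolved: true, injuryDescription: "" });
```

Notice that the "required" rule lives in two setters in the imperative version and in one place in the declarative version. Porting it means finding every setter that touches the rule and consolidating them, which is where an undocumented edge case can quietly disappear.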

Difficulty estimating component rewrites was another cascading effect of the lack of expectations problem. Because existing behavior wasn’t clearly defined, the scope of a rewrite could only be approximated by skimming through the existing code and trying to understand it. This made it harder for us to argue for dedicated sprint time for rewrites, or for a place on the product roadmap ahead of other client commitments, perpetuating the feature factory cycle mentioned earlier.

Successful Strategies

For all the hurdles we faced along the way, we did eventually succeed in migrating our entire front end app. Here are some strategies that helped us get across the finish line.

Feature Compatibility

One core set of features in our app is the ability for different departments to configure their own forms: field labels, rules, etc. We built a library that facilitates this functionality, but it was only compatible with our React components and not the Knockout ones. When we wrote new abstractions, we preferred not to make the abstraction compatible with legacy components unless it was absolutely necessary, leading to a harder-to-maintain legacy codebase. We leveraged a library’s compatibility as an argument for migrating a given feature to React. By coupling feature improvements with our refactor, we were able to make time for the changes necessary.

At times, it was tempting to do both at once. We found that when we took the time to do the feature parity rewrite before the feature improvements instead of during, fewer bugs leaked into production. When we tried to do both at once, we’d be tempted to cut corners on the legacy behavior, which inevitably led to cutting an edge case that the legacy component supported and that we needed to continue supporting.

Dedicating Sprint Time

In the first three years, rewrites would only happen in an ad-hoc manner or when developers had time on the side. In the last two years, all the remaining legacy code was eventually confined to one last feature (though it was our largest feature, police reports) and we could see the light at the end of the tunnel. We began to treat the remaining work as its own project, the same as any product work. We were able to argue for the project to be put on the roadmap by showing the business value it would bring (mainly fewer bugs and increased developer speed) and splitting what remained into actionable tasks. By doing the upfront technical planning of what was still left, we were able to finally complete the migration within a few months of that planning. Dedicated resources also allowed us to be more diligent about the completeness of the refactor, meaning we got better at deleting legacy functionality.

End to End Testing

As I mentioned, the lack of testing contributed to the lack of expectations problem, which prolonged the refactor far longer than necessary. It also placed a huge burden on our QA team, who had the unfortunate job of manually checking all existing functionality for any potential regression. We were able to alleviate some of this pain by adding end to end testing with Cypress. Because Cypress tests at the user level and with user interactions (clicking, typing, etc.), it was easy to write tests that were agnostic of implementation details. By adding a suite of end to end tests that covered our reports page, we freed QA up for more exploratory testing, and we were able to ship our refactor to production sooner and with higher confidence.
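Cypress itself needs a browser, so the sketch below does not use it. Instead, it illustrates in plain TypeScript the underlying idea of a test that is agnostic of implementation: the spec only performs actions a user could perform, so the same spec can run against both the legacy and the rewritten component. The `FormDriver` interface, both form classes, and the field names are hypothetical.

```typescript
// Hypothetical interface: the spec may only do what a user could do.
interface FormDriver {
  toggle(field: string): void;
  isVisible(field: string): boolean;
}

// "Legacy" implementation: visibility is mutated by hand on every toggle.
class LegacyForm implements FormDriver {
  private visible = new Set<string>(["injuryInvolved"]);
  private checked = false;

  toggle(field: string): void {
    if (field !== "injuryInvolved") return;
    this.checked = !this.checked;
    if (this.checked) this.visible.add("injuryDescription");
    else this.visible.delete("injuryDescription");
  }

  isVisible(field: string): boolean {
    return this.visible.has(field);
  }
}

// "Rewritten" implementation: visibility is derived from state on demand.
class RewrittenForm implements FormDriver {
  private state = { injuryInvolved: false };

  toggle(field: string): void {
    if (field === "injuryInvolved") {
      this.state.injuryInvolved = !this.state.injuryInvolved;
    }
  }

  isVisible(field: string): boolean {
    if (field === "injuryInvolved") return true;
    return field === "injuryDescription" && this.state.injuryInvolved;
  }
}

// One behavior spec, written once; it returns the expectations that failed.
function runSpec(form: FormDriver): string[] {
  const failures: string[] = [];
  if (form.isVisible("injuryDescription")) {
    failures.push("description visible before toggle");
  }
  form.toggle("injuryInvolved");
  if (!form.isVisible("injuryDescription")) {
    failures.push("description hidden after toggle");
  }
  form.toggle("injuryInvolved");
  if (form.isVisible("injuryDescription")) {
    failures.push("description still visible after untoggle");
  }
  return failures;
}

const legacyFailures = runSpec(new LegacyForm());
const rewrittenFailures = runSpec(new RewrittenForm());
```

Because `runSpec` never looks inside either class, a passing run on the legacy form documents the expected behavior, and a passing run on the rewritten form shows that the behavior survived the migration.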

Was It Worth It?

If you ask any engineer at Mark43 this question, you’ll get a resounding YES. Despite all the hurdles we faced, moving completely over to React alleviated the cognitive overhead of having to maintain multiple frameworks. Our frontend is now “modern” in the sense that there is far more community support for React today and it is much easier to attract and onboard new engineers to the company. Fewer bugs slip into production now that we no longer have to sync two sources of truth or work in an unfamiliar technology like Knockout. We were so excited about it that we projected a running scoreboard onto an office TV that kept us motivated until the very end:

Legacy Code Scoreboard — Top is delta in the last day, bottom is current amount

With the codebase fully in React, other frontend optimizations that the team has been looking to make can now be implemented more easily. Upgrading libraries. Performance improvements. Adopting TypeScript. More unit testing. We also saw the following quantifiable improvements:

  • Removed 33 third-party dependencies
  • CSS bundle size shrank 49%
  • JS bundle size shrank 37%

Large-scale refactors are challenging in both a technical and procedural sense. To be done well, a refactor needs to be treated with the same level of consideration as any other project: business value should be made clear, expected behavior should be documented, and dedicated sprint time should be set aside for completing it. We definitely suffered in these areas early on, and it wasn’t until we improved in them that we were able to finally complete the refactor. When planning for your next large-scale refactor, I’d encourage you to consider these strategies. The time you invest up front will be repaid several times over in the long run.

Thanks to the several Mark43 engineers I spoke to in the creation and editing of this article.
