Refactoring a Legacy Flow Codebase

Jason Deng
Bluecore Engineering
9 min read · Jun 11, 2020

Flow and TypeScript are both static type checkers that help developers ship scalable JavaScript applications, make changes with confidence, and prevent bugs. At Bluecore, it took us a while to distill best practices with Flow, including how to properly propagate, or “flow,” our types down through the codebase and how to auto-generate typed bindings from APIs. This blog post shares the best practices and techniques we’ve learned to optimize our usage of Flow.

This post assumes a basic understanding of Flow; this getting started guide should provide enough context for this blog.

A Brief History

In 2015, Bluecore’s user interface (UI) started out using basic web forms in Jinja, a Python-based templating language. At first, the UI was mostly used by our Forward Deployed Engineers, but as we added more features to the product and increased our client base, more and more of our customers started to self-serve. Eventually, we split out the UI from the backend into its own application. During this time, React was the new hot thing and CoffeeScript was a popular language that had many features JavaScript didn’t have. Bluecore’s first customer-facing platform was built using React, CoffeeScript, Reflux, and PropTypes (React’s default type checking library).

Bluecore’s UI tech stack over the years.

Fast forward a year — Bluecore doubled its engineering team size, resulting in a larger and more complex codebase. Our developers needed a better way to maintain the codebase, as PropTypes was outclassed by popular compile-time type checking alternatives like Flow and TypeScript. At the time, the frontend team decided to adopt Flow because it integrated well with React, whereas TypeScript was less mature and had much less support than it does now. However, due to a lack of proper usage and standardization at Bluecore, our Flow codebase became even harder to maintain and more difficult for new engineers to approach.

Some common problems we encountered:

  1. No API payload contracts, leading to a gap in frontend and backend expectations.
  2. No typed libraries and HOCs, causing types to be dropped midway through data flows.
  3. Excessive usage of weak types (any, Object, Function), which essentially opts out of the type checker in many places in the code.

Zero Libdefs

When I joined Bluecore in late 2018, the frontend codebase had mostly migrated from PropTypes to Flow with a little bit of CoffeeScript remaining. One of the first things I noticed about our codebase was the lack of library definitions. Library Definitions (libdefs) are special files that tell Flow the type signature of third-party libraries.

Why does it matter?

Without a libdef, Flow assumes that everything imported from that library has the any type, and using the any type opts Flow out of type checking. A library without a libdef therefore drops types between where a value is defined and where it’s used. This means that if a breaking change were made to the library’s types, Flow would not report it as an error.
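As a sketch of what a libdef looks like, here is a minimal hand-written one for the classnames package (the signature is simplified for illustration):

```js
// flow-typed/npm/classnames_vx.x.x.js — a minimal libdef sketch.
// Without this file, Flow treats every import from 'classnames' as `any`.
declare module 'classnames' {
  declare module.exports: (...classes: Array<mixed>) => string;
}
```

With this in place, passing the result of `classNames(...)` to something expecting a number is a type error instead of silently passing.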

This is especially important in a Redux application, where the data flow between the different areas of the application is piped through the use of many libraries. In our React + Redux app, we make use of common libraries such as react-redux, redux-actions, reselect, and redux-thunk. These libraries are all part of the Redux data flow chain and need to be typed in order to achieve full end-to-end type coverage. If any libdef is missed, we lose proper type checking within the data flow, weakening the initial purpose of using Flow.

A typical React + Redux data flow and the libraries we used; the libraries in red are external dependencies for which we need libdefs.

How did we solve this?

The fix seemed simple and easy: we just needed to add the missing libdefs. An npm package, flow-typed, lets us run a command that adds existing public libdefs for the libraries in our repository. It should only take the team a few days to debug the errors that appear from running the command, and all of our problems will (or should) be solved, right? Unfortunately, it didn’t work this way. While this approach works for smaller packages, it isn’t so straightforward for libraries with more complex typings.

We use 50+ libraries thousands of times across thousands of files, and running flow-typed on our codebase surfaced thousands of errors. Most of these errors were false positives but still required individual attention to address. It was easy to fix the libraries producing a small number of errors, but what about the ones that caused hundreds, if not thousands? How could we fix all these errors in a reliable, measurable, and safe way?

We had several options to tackle this:

  1. Code freeze: We debated having a code freeze for a few sprints and everyone would grind out fixing Flow errors during this time. However, this would mean that all bug fixes, feature work, and releases would have to be put on hold. This is difficult because of the fast-paced nature of startups since we need to constantly ship features and products to keep our customers happy. As engineers, technical maintenance to keep a codebase healthy is important, but as employees of a business, we can’t risk losing customer confidence. Therefore, a code freeze was not a viable option.
  2. Maintaining a separate branch: The next idea was to have a separate dev branch where we can start incrementally fixing the Flow errors and rebasing with the master branch to keep it up-to-date. We quickly ruled this out as a solution though, as we didn’t like the idea of having to maintain a separate branch that would be so different from the master branch.
  3. Incrementally fixing errors through untyped versions of the libraries: We defined an “untyped” version of a library as a renamed file that re-exports everything from the actual library; we then reference that file in our code. Since the file name of the “untyped” file no longer matches the libdef, Flow will not associate the libdef with the new file. The library still works at runtime, and nothing breaks. This is the approach we decided to use.

We followed several simple steps:

  1. Create an untyped version of a library.
    export * from 'react-redux'; // untyped-react-redux.js
  2. Replace all usages of the library with the untyped one and merge the code.
  3. Work on swapping out the untyped versions with the typed versions, and fix any errors that appear as a result.
  4. Move on to the next library and repeat.
  5. Profit 💲💲💲.
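At the call sites, steps 2 and 3 each amount to a one-line change (the paths here are illustrative):

```js
// Step 2 — point at the untyped re-export; Flow sees `any` here:
import { connect } from './untyped-react-redux';

// Step 3 — once that library's errors are fixed, point back at the
// real module so the react-redux libdef applies again:
import { connect } from 'react-redux';
```

Because the swap is purely a rename at the import site, each library can be migrated independently and merged without touching runtime behavior.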

While this approach is slow, it guarantees us the flexibility and reliability that we need. We can create tickets based on code ownership, bundle priority, or any other measure. We can also work on features and bugs on the side while ensuring that any new code will use typed libraries and not increase technical debt. As a bonus, we can track our progress over time, since all we need to count is the number of files with untyped usages in our codebase.
Using this approach, we were able to type and fix ~20k lines of code over six months. In the end, we removed the untyped library files we created and harnessed the zen of having our libraries be fully typed.
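The tracking metric is just a file count. A sketch of the idea against a throwaway directory (the paths and file contents are illustrative, not our real layout):

```shell
# Two fake source files: one still imports an "untyped" wrapper, one doesn't.
mkdir -p /tmp/untyped-demo/src
echo "import { connect } from './untyped-react-redux';" > /tmp/untyped-demo/src/a.js
echo "import { connect } from 'react-redux';" > /tmp/untyped-demo/src/b.js

# Count the files that still reference an untyped wrapper.
grep -rl "untyped-" /tmp/untyped-demo/src | wc -l
```

Running this count in CI over time gives a simple burn-down chart of the remaining migration work.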

Progress on removing untyped usages over the end of last year.

Staying in sync between the client and server

Ensuring that the data flow within the frontend application is typed is only part of the solution. Having consistent, well-defined types for the data the UI receives from the server was just as important. Establishing a clearly defined API contract between the client and the server is crucial to understanding the business requirements. We started getting bugs when a breaking change was introduced into an API but no corresponding frontend changes were made to accommodate it. We wanted a way for the UI to stay in sync with the API data our servers expected and returned.

API versioning is a common technique for handling these issues. Another technique is to have the server generate schemas for the frontend to use. This is incredibly useful because it ensures that all parties will know when a breaking change in the API happens. GraphQL + Apollo + TypeScript is a popular example of this paradigm.

Following this pattern, we decided to implement our own version of schema generation leveraging Marshmallow Schemas + APISpec + Flow.

  1. We started requiring all new APIs to be created with Marshmallow Schemas. This had the added benefit of adding serialization and validation to our APIs.
  2. Afterwards, we converted these schemas to Swagger YAML using APISpec.
  3. In the same workflow, we started a node process that would convert the generated Swagger YAML to Flow Types using swagger-to-flowtype into a directory that the frontend would use.
  4. After this workflow finished, we would run Flow on our frontend codebase to ensure that no errors surfaced from an API change. If an error occurred, we discussed whether the change was intentional or not.

Our process for auto-generating Flow typings from APIs.
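As an illustration, suppose a hypothetical Marshmallow EmployeeSchema declares a required string name and an optional integer id. After the APISpec and swagger-to-flowtype steps, the frontend would receive a generated file along these lines (the type name and fields are hypothetical):

```js
// api-types/employee.js — sketch of swagger-to-flowtype output.
export type Employee = {
  name: string, // required field in the Marshmallow schema
  id?: number,  // optional field, so it may be absent from the payload
};
```

If the backend later renamed name or changed its type, the regenerated file would differ, and the Flow run in step 4 would fail on every frontend usage of that field.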

We require these workflows to run on all PRs, which gives us the confidence we need to keep our UI in sync with the server. Using this workflow, we can guarantee that any breaking API changes are caught before code gets shipped to production.

Object types are “inexact” by default

Let’s say we declare an object type such as this one:

It is possible to create instances of this type such as:

This is valid, and Flow will not raise any errors. However, it will if we make Employee exact.
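Putting the snippets above together with a hypothetical Employee shape (the fields and values are illustrative):

```js
type Employee = {
  name: string,
  department: string,
};

// Valid: inexact object types silently accept extra properties.
const alice: Employee = {
  name: 'Alice',
  department: 'Engineering',
  favoriteColor: 'blue', // no error, even though Employee never declares this
};

// The exact version uses the {| ... |} syntax.
type ExactEmployee = {|
  name: string,
  department: string,
|};

// Flow now flags the extra property:
// const bob: ExactEmployee = {
//   name: 'Bob',
//   department: 'Sales',
//   favoriteColor: 'red', // Error: property not found in ExactEmployee
// };
```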

Flow types should be annotated as exactly as possible; if a type is left inexact, it should be for a deliberate reason. Objects being inexact by default is probably one of the biggest flaws of Flow. To combat this, we decided that require-exact-type is one of the few ESLint rules for Flow that should be enabled by default. Furthermore, we are slowly converting inexact types to exact ones to ensure type reliability and safety going forward.

Use linting to your advantage

In addition to basic syntax checking, linting can also be used to help refactor a codebase and keep it up to date. We strategically enabled several Flow-related ESLint rules to help us maintain our codebase.

Some rules that we thought were helpful to enable:

  1. eslint-plugin-flowtype/no-weak-types: Enforcing this rule forbids any uses of any, Object, and Function. It is much better to always be explicit with typing than not. In the rare case where it is not feasible to type something, we can opt out of the rule with eslint-disable-next-line flowtype/no-weak-types.
  2. eslint-plugin-flowtype/require-return-type: We discussed internally whether we wanted to enable this rule globally. Although it can be useful for new engineers to see return types, it can be verbose and repetitive since Flow is already adept at inferring them. We decided to enable this rule only for a subset of files. We have a naming convention that files interacting with an API live inside an api directory; using this convention, we enabled the ESLint rule only for those directories.

Using ESLint’s override functionality, we required certain rules to be enabled for new files and certain rules to be disabled for older files; it depends entirely on the rule and the circumstances. This technique was a major help when refactoring old code and ensuring that new code wouldn’t cause more issues.

Want to enable a rule but can’t fix old files immediately? Add a regex to the overrides array that catches new directories or files.

Want to enable a rule for only files that match a certain naming convention? Add a regex to the overrides array that catches the naming convention.

Override functionality for ESLint
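An overrides block along these lines might look like the following (the paths and rule choices here are illustrative, not our exact config):

```js
// .eslintrc.js (sketch)
module.exports = {
  plugins: ['flowtype'],
  rules: {
    // Forbid weak types everywhere by default.
    'flowtype/no-weak-types': 'error',
  },
  overrides: [
    {
      // Require explicit return types only inside api/ directories.
      files: ['**/api/**/*.js'],
      rules: {
        'flowtype/require-return-type': ['error', 'always'],
      },
    },
    {
      // Legacy directories we haven't refactored yet: relax the rule for now.
      files: ['src/legacy/**/*.js'],
      rules: {
        'flowtype/no-weak-types': 'off',
      },
    },
  ],
};
```

Because overrides match on file globs, tightening the rules for a newly refactored directory is a one-line change to the config rather than a codebase-wide sweep.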

Conclusion

Being able to fully type a JavaScript application is a difficult but fulfilling experience. This post outlined just a few of the many ways to break out of a legacy codebase. Using simple workarounds, linting, and enforcing processes through CI/CD reduces a codebase’s technical debt, making it a better experience for engineers and users.
