Types-First: A Scalable New Architecture for Flow
TL;DR: The types-first architecture unlocks Flow’s potential at scale by leveraging fully typed module boundaries. We plan to migrate to the new architecture over the coming months.
Flow is designed to type check extremely large projects. Over the past several years, we’ve introduced some big changes that let Flow scale by orders of magnitude, to 10M+ lines of code, just barely keeping pace with the growth of Facebook’s codebases. For example, lazy mode only checks files affected by local changes, instead of the entire codebase. However, Flow still has to run type analysis on the dependencies of the files that changed, a fundamental inefficiency that multiplies the cost of each change.
We are introducing a new architecture known as “types-first”, which allows us to avoid checking dependencies. It is critical to making Flow faster and allowing us to continue to scale. The benefits extend to other problematic areas, such as error messages and debuggability.
Performance
Types-first exploits full type annotations at module boundaries to make IDE services and rechecks multiple times faster on launch.
Overview
Consider a codebase with the dependency graph described below. Here, dependencies are formed whenever a file imports a value or a type from another file.
Each dependency is represented with a directed edge. For example, file f depends on files u2 and u3, and files u1 and u2 are in a dependency cycle. Files d1 and d2 are “downstream” dependents of f. These files in turn may depend on a larger set of “upstream” files. In our example, d2 also depends on u4.
Let’s say that we’re in lazy mode and file f changes. We need to recheck not only f, but also its downstream files d1 and d2. To unblock these rechecks, we need to compute the types of the upstream files too. Let’s compare how this worked in the “classic” architecture (the current default mode), and how the nature of this work changes in types-first.
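The graph is described only in prose here, so the following is a minimal sketch in plain JavaScript, assuming a hypothetical encoding of it as an object that maps each file to the files it imports. It computes which files lazy mode must recheck when f changes, and which upstream files must have their types computed first. The helper names are illustrative and are not part of Flow.

```javascript
// Hypothetical encoding of the example graph: file -> files it imports.
const deps = {
  f: ["u2", "u3"],
  u1: ["u2"],
  u2: ["u1"],
  u3: [],
  u4: [],
  d1: ["f"],
  d2: ["f", "u4"],
};

// Files that must be rechecked: `changed` plus everything that
// transitively imports it (its downstream dependents).
function filesToRecheck(deps, changed) {
  // Invert the edges: dependents[x] lists the files that import x.
  const dependents = {};
  for (const file of Object.keys(deps)) dependents[file] = [];
  for (const [file, imports] of Object.entries(deps)) {
    for (const dep of imports) dependents[dep].push(file);
  }
  const seen = new Set([changed]);
  const stack = [changed];
  while (stack.length > 0) {
    for (const d of dependents[stack.pop()]) {
      if (!seen.has(d)) {
        seen.add(d);
        stack.push(d);
      }
    }
  }
  return [...seen].sort();
}

// Upstream files whose types are needed before `files` can be rechecked:
// everything they transitively import, minus the files themselves.
function upstreamOf(deps, files) {
  const targets = new Set(files);
  const seen = new Set();
  const stack = [...files];
  while (stack.length > 0) {
    for (const dep of deps[stack.pop()]) {
      if (!seen.has(dep)) {
        seen.add(dep);
        stack.push(dep);
      }
    }
  }
  return [...seen].filter((file) => !targets.has(file)).sort();
}

const recheck = filesToRecheck(deps, "f"); // ["d1", "d2", "f"]
const needTypes = upstreamOf(deps, recheck); // ["u1", "u2", "u3", "u4"]
```

Note that u1 ends up in the upstream set even though f never imports it directly: it is pulled in through the u1/u2 cycle.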
In classic mode, computing the types of the upstream files meant that we needed to check their code! Moreover, if these files were involved in dependency cycles, we needed to check all of those files together before we could ever get to f or the downstream files that depend on it. We say that files in dependency cycles, for example u1 and u2, form “components”.
Each component needs to be checked in its own process, and so large components hurt parallelism. This architecture led to a variety of performance problems, including peak memory blowups, long garbage collection pauses, amplification of non-linear time complexity, and low use of parallelism.
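Finding components amounts to computing the strongly connected components of the dependency graph. As a minimal sketch (not Flow's implementation), Tarjan's algorithm over a hypothetical encoding of the example graph finds the u1/u2 cycle:

```javascript
// Hypothetical encoding of the example graph: file -> files it imports.
const deps = {
  f: ["u2", "u3"],
  u1: ["u2"],
  u2: ["u1"],
  u3: [],
  u4: [],
  d1: ["f"],
  d2: ["f", "u4"],
};

// Tarjan's algorithm: returns the strongly connected components ("components").
function components(deps) {
  let index = 0;
  const indices = new Map();
  const lowlink = new Map();
  const onStack = new Set();
  const stack = [];
  const result = [];

  function visit(v) {
    indices.set(v, index);
    lowlink.set(v, index);
    index++;
    stack.push(v);
    onStack.add(v);
    for (const w of deps[v]) {
      if (!indices.has(w)) {
        visit(w);
        lowlink.set(v, Math.min(lowlink.get(v), lowlink.get(w)));
      } else if (onStack.has(w)) {
        lowlink.set(v, Math.min(lowlink.get(v), indices.get(w)));
      }
    }
    // v is the root of a component: pop all of its members off the stack.
    if (lowlink.get(v) === indices.get(v)) {
      const component = [];
      let w;
      do {
        w = stack.pop();
        onStack.delete(w);
        component.push(w);
      } while (w !== v);
      result.push(component.sort());
    }
  }

  for (const v of Object.keys(deps)) {
    if (!indices.has(v)) visit(v);
  }
  return result;
}

const all = components(deps);
// Components with more than one file are dependency cycles: [["u1", "u2"]]
const cycles = all.filter((c) => c.length > 1);
```

Tarjan's algorithm emits a component only after every component it depends on, which matches the order in which classic mode would have to check them; a large component is a single unit of work that cannot be split across processes.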
On the other hand, in types-first, we compute the types of the upstream files directly, since module boundaries must have full type annotations. This is much less work than checking their code.
Moreover, in calculating dependencies we only care about imports that affect the type signature of a file. For example, if the contents of u1 are
const {U2} = require('./u2');
// U1's signature mentions U2, because U1 extends it.
export class U1 extends U2 {
  init() {}
}
and those of u2 are
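const {U1} = require('./u1');
export class U2 {
  // U1 is used only inside the constructor body; the annotated
  // signature `(): void` does not mention it.
  constructor(): void {
    new U1().init();
  }
}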
then the signature of u2 no longer depends on u1, because U1 appears only inside a body whose signature is fully annotated. Dependencies between types are typically far fewer than dependencies between values. This gives rise to a new kind of dependency graph, one that is based on types rather than values, and means that we might be able to skip checking some downstream files. It also means that this work is less affected by large dependency cycles.
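To make the distinction concrete, the sketch below writes down two hypothetical edge sets for the u1/u2 example by hand: one with every import, and one with only the imports that appear in exported signatures. A standard three-color depth-first search then shows that only the first contains a cycle. Deciding which imports belong to the signature is the part Flow derives from annotations; it is not modeled here.

```javascript
// "Value" edges: every import used anywhere in the file.
const valueDeps = { u1: ["u2"], u2: ["u1"] };
// "Signature" edges: only imports that appear in exported types.
const signatureDeps = {
  u1: ["u2"], // `class U1 extends U2` mentions U2 in U1's signature
  u2: [], // U1 is used only inside U2's constructor body
};

// Returns true if the graph contains a cycle (DFS with three states).
function hasCycle(deps) {
  const state = new Map(); // missing = unvisited, 1 = in progress, 2 = done
  function visit(v) {
    if (state.get(v) === 1) return true; // back edge: cycle found
    if (state.get(v) === 2) return false;
    state.set(v, 1);
    for (const w of deps[v]) {
      if (visit(w)) return true;
    }
    state.set(v, 2);
    return false;
  }
  return Object.keys(deps).some((v) => visit(v));
}

const valueCycle = hasCycle(valueDeps); // true
const signatureCycle = hasCycle(signatureDeps); // false
```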
In our example, u2 does not depend on u1, and let’s also assume that d1 no longer depends on f. This eliminates the dependency cycle entirely and leaves f and d2 as the only files that need to be checked.
Once we have computed the types of upstream files, the code of f and the downstream files that depend on it can be checked completely in parallel. This is much faster than checking their code with the limited parallelism offered by the code dependency graph of classic mode.
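A rough way to see the difference in parallelism is to group work into "waves" that must run one after another. The sketch below assumes a hypothetical encoding of the example in which the u1/u2 cycle is already collapsed into a single component named "u1+u2"; each component's wave is one more than the latest wave among its dependencies.

```javascript
// Hypothetical component graph: component -> components it imports.
const componentDeps = {
  "u1+u2": [],
  u3: [],
  u4: [],
  f: ["u1+u2", "u3"],
  d1: ["f"],
  d2: ["f", "u4"],
};

// Classic mode: a component can be checked only after all of its
// dependencies, so components are grouped into sequential waves.
function classicWaves(deps) {
  const level = new Map();
  function levelOf(c) {
    if (!level.has(c)) {
      const depLevels = deps[c].map(levelOf);
      level.set(c, depLevels.length === 0 ? 0 : 1 + Math.max(...depLevels));
    }
    return level.get(c);
  }
  const waves = [];
  for (const c of Object.keys(deps)) {
    const l = levelOf(c);
    (waves[l] = waves[l] || []).push(c);
  }
  return waves;
}

// [["u1+u2", "u3", "u4"], ["f"], ["d1", "d2"]]
const waves = classicWaves(componentDeps);
```

In classic mode the three waves run in sequence, and the u1+u2 component is checked as one unit. In types-first, once the upstream signatures are computed from annotations, f, d1 and d2 can all be checked in a single wave.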
Recheck optimizations with types-first
Types-first paves the way for a number of other optimizations in Flow’s rechecking logic that bring Flow’s work after initialization closer to being proportional to the size of the code that changed.
Typically, even though the set of downstream dependents of a file under edit can be large, only a small fraction of those files really need to be rechecked. In particular, whenever the types of the dependencies of a file have not changed, that file does not need to be rechecked.
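The skip condition can be sketched as follows, assuming hypothetical signatures represented as plain strings and the downstream edges of the example graph. Flow's real type representation is far richer; this only shows the rule that a dependent is rechecked when some dependency's signature changed.

```javascript
// Hypothetical exported signatures, keyed by file.
const before = { f: "(x: number) => string", u4: "() => void" };
// An edit to the body of f only: its signature is unchanged.
const afterBodyEdit = { f: "(x: number) => string", u4: "() => void" };
// An edit that changes f's return type.
const afterSignatureEdit = { f: "(x: number) => number", u4: "() => void" };

// Downstream files of the example graph: file -> files it imports.
const deps = { d1: ["f"], d2: ["f", "u4"] };

// A downstream file needs rechecking only if the signature of at least one
// of its dependencies changed. (f itself is always rechecked: its code changed.)
function downstreamToRecheck(deps, before, after) {
  return Object.keys(deps).filter((file) =>
    deps[file].some((dep) => before[dep] !== after[dep])
  );
}

const skipAll = downstreamToRecheck(deps, before, afterBodyEdit); // []
const recheckBoth = downstreamToRecheck(deps, before, afterSignatureEdit); // ["d1", "d2"]
```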
With classic mode, we could almost never detect and skip redundant rechecks, for two main reasons:
- Types were computed as part of checking code, and the mechanism to detect when types changed was brittle: its effectiveness relied on being able to ignore benign differences in the internal representation of types that happen every time code is checked.
- Worse, any changes to “source locations” of types were considered significant changes for error-reporting purposes. As such, even adding a blank line to a file could cause downstream files to be rechecked.
With types-first, we are in a position to solve both of these problems. Type annotations are robust to edits like the one above and can dramatically limit how far a change propagates through rechecks.
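The blank-line problem can be illustrated with a toy signature format (hypothetical, not Flow's) that records each export's type together with its source location. Comparing whole records flags a change after a blank line is inserted above the export, while comparing only names and types does not.

```javascript
// Hypothetical signature records: export name, type, and declaration location.
const before = [{ name: "f", type: "(x: number) => string", loc: { line: 3, column: 0 } }];
// After adding a blank line at the top of the file, the location shifts.
const after = [{ name: "f", type: "(x: number) => string", loc: { line: 4, column: 0 } }];

// Location-sensitive comparison: any shift in position counts as a change.
function changedWithLocations(a, b) {
  return JSON.stringify(a) !== JSON.stringify(b);
}

// Location-insensitive comparison: only names and types matter.
function changedIgnoringLocations(a, b) {
  const strip = (sig) => sig.map(({ name, type }) => ({ name, type }));
  return JSON.stringify(strip(a)) !== JSON.stringify(strip(b));
}

const sensitive = changedWithLocations(before, after); // true
const insensitive = changedIgnoringLocations(before, after); // false
```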
Experimental results
Facebook has tens of millions of lines of JavaScript code. Overall, we saw rechecking speedups of ~6x at p90 and ~2x at p99 when we rolled out types-first in our codebase. The kind of recheck that saw the biggest improvement with types-first is the unchecked dependencies recheck: the check that happens when a user opens a file whose dependencies have not been checked. There, we saw the following improvements:
Avg: 9.10s -> 1.37s (-84.957%)
p50: 1.95s -> 0.90s (-53.763%)
p75: 7.85s -> 1.95s (-75.143%)
p90: 22.5s -> 2.83s (-87.456%)
p95: 42.8s -> 3.42s (-92.006%)
p99: 107s -> 5.63s (-94.730%)
Reliability
Much like what happens at run time, Flow’s type inference can miss mistakes whose effects are not revealed locally; in such cases, the inferred types become complicated, pushing the burden of dealing with those mistakes downstream or causing the inevitable errors to be reported later. Type annotations reveal such mistakes earlier. In classic mode, Flow only asked for partial type annotations at module boundaries, and inferred the rest. In types-first, it asks for full annotations at module boundaries. When filling in the missing annotations in our internal codebase, the Flow team found thousands of examples of complicated inferred types that caused confusing errors; simplifying the types manually led naturally to the necessary fixes.
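As a hypothetical illustration of a mistake whose effect is not revealed locally, consider an exported function that accidentally returns a number on one branch. The function names are made up, and the annotations use Flow's comment syntax (`/*: T */`) so the snippet runs as plain JavaScript.

```javascript
// @flow
// Mistake: one branch returns the raw count instead of a string.
// Unannotated, the inferred return type would likely become string | number,
// and the resulting errors would tend to show up in the files that call it.
function labelUnannotated(count) {
  if (count === 1) return "1 item";
  return count;
}

// With a declared `string` return type, a `return count;` here would be
// reported in this file. The fixed version builds a string on both branches.
function label(count /*: number */) /*: string */ {
  if (count === 1) return "1 item";
  return `${count} items`;
}

module.exports = { label, labelUnannotated };
```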
Moreover, by isolating the effects of checking files from one another, types-first enables the enforcement of basic invariants on error locations: e.g., an error found when checking a file must point to some location in that file, and any other locations referenced in the error must be either in that file or the files it depends on. Violations of these basic invariants not only lead to errors that are difficult to understand, but also cause other problems, e.g., error suppressions that hide more errors than intended, and inefficient or unreliable streaming of errors in the IDE. On the other hand, enforcing and exploiting these invariants has helped improve the design and implementation of suppressions and error streaming.
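The two invariants can be stated as a small check. This sketch assumes a hypothetical error shape (the file being checked, a primary location and a list of related locations) and reads "the files it depends on" transitively; both are assumptions for illustration, not Flow's internal error format.

```javascript
// Hypothetical encoding of the example graph: file -> files it imports.
const deps = {
  f: ["u2", "u3"], u1: ["u2"], u2: ["u1"], u3: [], u4: [],
  d1: ["f"], d2: ["f", "u4"],
};

// `file` plus every file it transitively depends on.
function reachable(deps, file) {
  const seen = new Set([file]);
  const stack = [file];
  while (stack.length > 0) {
    for (const dep of deps[stack.pop()] || []) {
      if (!seen.has(dep)) {
        seen.add(dep);
        stack.push(dep);
      }
    }
  }
  return seen;
}

// Invariant 1: the primary location is in the file being checked.
// Invariant 2: every related location is in that file or one of its dependencies.
function respectsLocationInvariants(deps, error) {
  if (error.primary.file !== error.checkedFile) return false;
  const allowed = reachable(deps, error.checkedFile);
  return error.related.every((loc) => allowed.has(loc.file));
}

const good = { checkedFile: "d2", primary: { file: "d2", line: 7 }, related: [{ file: "u4", line: 2 }] };
const wrongPrimary = { checkedFile: "d2", primary: { file: "f", line: 3 }, related: [] };
const unrelatedFile = { checkedFile: "d2", primary: { file: "d2", line: 7 }, related: [{ file: "d1", line: 1 }] };
```

Here `good` passes, `wrongPrimary` fails the first invariant, and `unrelatedFile` fails the second because d2 does not depend on d1.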
Finally, types-first makes debugging issues with performance and errors easier. While in classic mode, isolating a problem often meant tracing Flow through code across multiple files that were often part of large dependency cycles, in types-first we need to look at the code only in the relevant file, and just the types in other files that it depends on.
How to upgrade your codebase to types-first
Here are some quick instructions on how to upgrade your codebase to types-first in Flow version ≥0.125:
To see what types are missing to make your codebase types-first ready, add the following line to the [options] section of the .flowconfig file:
well_formed_exports=true
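For illustration, here is a small module whose exports are fully annotated. It uses Flow's comment syntax (`/*: T */`) so it also runs as plain JavaScript; the function names are hypothetical, and the comment about which errors would appear without the annotations is an expectation rather than a quote of Flow's output.

```javascript
// @flow
// Exported functions carry full annotations at the module boundary.
// Without them, we would expect well_formed_exports to report errors here.
function add(a /*: number */, b /*: number */) /*: number */ {
  return a + b;
}

function greet(name /*: string */) /*: string */ {
  return `Hello, ${name}!`;
}

// A non-exported helper: only the module boundary needs annotations.
function shout(text) {
  return text.toUpperCase();
}

function greetLoudly(name /*: string */) /*: string */ {
  return shout(greet(name));
}

module.exports = { add, greet, greetLoudly };
```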
Flow will then report errors for exports that are missing annotations. The next step is to address these errors by adding the missing annotations. This can be done manually, or with a codemod that ships with the Flow binary. You can invoke it by running:
flow codemod annotate-exports --write --repeat /path/to/folder
This will update your files in place. Note that it might skip some necessary annotations, or introduce some new errors. These will have to be fixed manually.
Once you have eliminated signature verification errors, you can turn on types-first mode by adding the following line to the [options] section of the .flowconfig file:
types_first=true
Finally, you might need to address some newly introduced Flow errors.
See our docs on types-first mode and the provided codemod for more detailed instructions and troubleshooting tips.
Gradual Migration Path
Types-first radically improves Flow’s performance and scalability, and the invariants it sets in place enable us to bring more wins in the future. We understand that for some larger codebases the migration process can be time consuming, so we will continue to support classic mode for the next 6 months. Specifically, our plan is to:
- Keep classic mode as the default mode until Flow v0.133 (about three months from now). Types-first mode will be available through a flag in .flowconfig.
- Make types-first the default mode from Flow v0.134 on. Classic mode will be available, but will have to be explicitly specified in the .flowconfig, and there will be a deprecation warning.
- Remove support for classic mode in Flow by January 2021. Codebases will need to have their module exports annotated to avoid errors.