
What the Flow team has been up to

Hello! (And sorry for the radio silence.)

Many of our open-source users have rightly observed that the Flow team has effectively stopped paying attention to an ever-growing list of issues and PRs on GitHub. And while there has been a lot of commit activity on GitHub during this time, there has been little to no communication about our roadmap.

We understand that this state of affairs is deeply concerning to our open-source users, to say the least. We are sorry. We are thankful to those who have been constructive with their criticism over the past several months. This post is a baby step towards fixing this problem.

The simple and honest explanation is that we were too heads-down on addressing Facebook-internal challenges, and felt too resource-constrained to do much else. It didn’t have to be this way, and we could have done better.

2018

In 2018 we created an internal roadmap to address the biggest challenges facing our internal users. We were seeing unprecedented growth in the number of JavaScript files covered by Flow, and we were keeping up with neither performance nor reliability. Developers were waiting multiple seconds, if not minutes, for the codebase to be rechecked after editing their code. They also complained that we were still not preventing enough new bugs from landing in the codebase and breaking their code. They wanted:

  • Responsive IDE commands for frictionless code navigation, understanding, and development, plus fast rechecks for making changes quickly, iteratively, and confidently, both irrespective of the growth of the codebase.
  • Reliable types that accurately describe run-time invariants and help reduce crashes in production, justifying investment in improving code quality through Flow coverage, plus reliable code intelligence results exposed to IDEs and other tools.

Thus we identified the following focus areas spanning a couple of years:

  1. Performance: Make Flow scale as O(size of edits) rather than O(size of codebase), which necessarily involves projects aiming for big-O improvements.
  2. Reliability: Make Flow type analysis trustworthy with respect to run-time semantics, not only for product safety and developer efficiency but also to explore opportunities for runtime optimizations.
  3. Language tooling: Make Flow the basis for unified JavaScript tooling to enable language evolution for performance and reliability.

Our goals were very aggressive: we needed to cover a lot of ground in terms of scalability and trustworthiness, and we wanted to front-load the hardest parts first, which meant focusing on big-O performance improvements over the past six months and moving on to fundamental reliability improvements over the next six months.

Over this time, we shifted some work on our long-term goals to respond to shorter-term needs, in particular on reliability. As of the end of 2018, we are further along than expected on our long-term goals overall: we have hit, or are close to hitting, most of our planned performance milestones; we have delivered significant additional reliability results that were planned for 2019; and we have made solid progress on language tooling that unblocks this work and improves our chances of success as we proceed.

Progress

Let’s dive deeper into the projects we’ve been working on.

Types-first

Types-first is a re-architecture of Flow that, in experiments on our current repositories, has shown huge improvements in time and memory usage, while promising big-O scaling improvements in the future. Briefly, it exploits full type annotations at file boundaries to perform better separate compilation (more parallelizable and less redundant). It is critical to how we're thinking about all aspects of performance, including taming large dependency cycles, avoiding out-of-memory (OOM) errors, boosting IDE responsiveness, and dealing with explosive growth.

Making this change involves two sub-projects:

  • Codemod. We need to automatically fill in any missing type annotations in all files before we can turn on checks that enforce their presence, an invariant that types-first crucially relies on. This involved getting a verifier ready to feed the codemod, then building a type-aware codemod tool along with the code intelligence APIs that drive it, and ensuring as much compatibility with the current architecture as possible to minimize regressions in type coverage. We are nearly ready to land the codemod on our largest repository and plan to release it internally and externally to enable automatic upgrades in other repositories.
  • Rewriting how files are checked against each other. We tested locally on a codemod’ed version of our largest repository, and observed a 20% reduction in initialization time and 2–10x reductions in recheck time, with comparable memory improvements in lazy mode.
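
To make the separate-compilation idea concrete, here is a minimal Python sketch under simplifying assumptions: a file is reduced to its exported names with annotation strings, and the per-file check is stubbed out as import resolution. All names (SourceFile, extract_signature, types_first_check) are hypothetical and do not reflect Flow's OCaml implementation. The point is the two phases: because every export is annotated, each file's signature can be read without looking at any other file, so signatures can be built in parallel and in any order, and each file is then checked only against its dependencies' signatures.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SourceFile:
    name: str
    # Exported name -> annotation text, or None if the export is unannotated.
    exports: Dict[str, Optional[str]]
    # Dependency file name -> names this file imports from it.
    imports: Dict[str, List[str]] = field(default_factory=dict)


class MissingAnnotation(Exception):
    pass


def extract_signature(f: SourceFile) -> Dict[str, str]:
    """Read a file's exported types from its own annotations; no other file is consulted."""
    signature = {}
    for export, annotation in f.exports.items():
        if annotation is None:
            # Types-first relies on every export being annotated.
            raise MissingAnnotation(f"{f.name}: export '{export}' needs an annotation")
        signature[export] = annotation
    return signature


def check_file(f: SourceFile, signatures: Dict[str, Dict[str, str]]) -> List[str]:
    """Stand-in for a real body check: resolve each import against the dependency's signature."""
    errors = []
    for dep, names in f.imports.items():
        for name in names:
            if name not in signatures.get(dep, {}):
                errors.append(f"{f.name}: '{name}' is not exported by {dep}")
    return errors


def types_first_check(files: List[SourceFile]) -> Tuple[Dict[str, Dict[str, str]], List[str]]:
    with ThreadPoolExecutor() as pool:
        # Phase 1: signatures are independent of each other, so they can be built in any order.
        signatures = dict(zip([f.name for f in files], pool.map(extract_signature, files)))
        # Phase 2: each file needs only its dependencies' signatures, never their bodies.
        results = pool.map(lambda f: check_file(f, signatures), files)
        errors = [e for file_errors in results for e in file_errors]
    return signatures, errors
```

Two files that import each other need no special handling in this model, since neither signature depends on the other's inferred types; this is one way to see why annotations at file boundaries help with large dependency cycles.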

Recheck optimization

When a file changes, we conservatively recheck all files that transitively depend on it. Ideally we would recheck only what needs to be rechecked based precisely on what changed.

Currently, edits that merely change locations, or that otherwise don’t affect the meanings of the types a file exports, are still treated as significant changes that trigger huge rechecks, exacerbating Flow’s scaling problems. Building on types-first and a new system of location-agnostic types, we can stop propagating recheck work as soon as we detect that the meanings of the types exported by a file have not changed. As a result, we expect rechecks to become orders of magnitude faster.
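
The early-cutoff idea can be sketched as a worklist algorithm. This is an illustration only: it assumes, as in types-first, that a file's exported signature can be computed from the file itself, and it models a signature as a set of (name, type) pairs with source locations stripped out. The function names are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, FrozenSet, Iterable, List, Set, Tuple

Signature = FrozenSet[Tuple[str, str]]  # (export name, type), no locations


def transitive_dependents(changed: Iterable[str], dependents: Dict[str, List[str]]) -> Set[str]:
    """Conservative strategy: recheck every file that transitively depends on a changed file."""
    seen = set(changed)
    queue = deque(seen)
    while queue:
        for d in dependents.get(queue.popleft(), []):
            if d not in seen:
                seen.add(d)
                queue.append(d)
    return seen


def recheck_with_cutoff(
    changed: Iterable[str],
    dependents: Dict[str, List[str]],
    old_signatures: Dict[str, Signature],
    signature_of: Callable[[str], Signature],
) -> Set[str]:
    """Recheck a file's dependents only if its location-agnostic exported signature changed."""
    rechecked: Set[str] = set()
    queue = deque(changed)
    while queue:
        f = queue.popleft()
        if f in rechecked:
            continue
        rechecked.add(f)  # stands in for re-running the checker on f
        new_sig = signature_of(f)
        if new_sig != old_signatures.get(f):
            old_signatures[f] = new_sig
            queue.extend(dependents.get(f, []))
    return rechecked
```

For example, with a chain where b imports a and c imports b, moving an export in a to a different line leaves a's signature unchanged, so only a is rechecked, whereas the conservative strategy rechecks all three files. Changing the exported type in a rechecks a and b, and stops there because b's own signature is unchanged.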

Source control integration

More generally, to deal with explosive growth of our repositories we have been working on source control integration, with an eye towards a future where even parsing files to create a dependency graph to bootstrap type checking must be performed lazily.

Concretely, we built a feature that creates a “saved state” for each commit, including the relevant dependency graph, and initializes the Flow server from the saved state of the commit that the developer’s local edits are based on. We also built a new lazy mode that is smarter at guessing which files the user cares about, so it can restart the server instead of rechecking when it estimates that restarting will be faster. When we launched this mode, as expected, we saw fewer huge rechecks, shorter-lived servers, and fewer servers using a ton of memory.
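
The saved-state startup and the restart-or-recheck decision can be sketched as follows. Everything here is a hypothetical model: history is linear (one parent per commit), changes maps each commit to the files it touched, and the linear cost estimate in should_restart stands in for whatever heuristic lazy mode actually uses.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def nearest_saved_state(commit: str, parents: Dict[str, Optional[str]], saved: Set[str]) -> Optional[str]:
    """Walk back from the developer's base commit to the closest ancestor with a saved state."""
    current: Optional[str] = commit
    while current is not None:
        if current in saved:
            return current
        current = parents.get(current)
    return None


def files_changed_since(commit: str, ancestor: str, parents: Dict[str, Optional[str]],
                        changes: Dict[str, Set[str]]) -> Set[str]:
    """Union of files touched by the commits after `ancestor`, up to and including `commit`."""
    touched: Set[str] = set()
    current = commit
    while current != ancestor:
        touched |= changes.get(current, set())
        current = parents[current]
    return touched


def plan_startup(base_commit: str, local_edits: Iterable[str], parents: Dict[str, Optional[str]],
                 changes: Dict[str, Set[str]], saved: Set[str]) -> Tuple[Optional[str], Optional[Set[str]]]:
    """Pick a saved state and list the files that must be checked on top of it."""
    state = nearest_saved_state(base_commit, parents, saved)
    if state is None:
        return None, None  # no saved state: fall back to a full initialization
    return state, files_changed_since(base_commit, state, parents, changes) | set(local_edits)


def should_restart(files_to_recheck: int, per_file_cost: float, restart_cost: float) -> bool:
    """Hypothetical heuristic: restart when the estimated recheck is more expensive than a restart."""
    return files_to_recheck * per_file_cost > restart_cost
```

With saved states only for older commits, the server starts from the nearest one and checks the files changed since then plus the local edits; after a large rebase, when the estimated recheck cost exceeds the cost of loading a fresh saved state, restarting wins.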

Responsiveness

Huge rechecks also hurt IDE responsiveness, not only for error reporting but also for code intelligence requests. Flow could not cancel a recheck, or respond to code intelligence requests during one, because of race conditions: a recheck involves parallel workers reading and writing shared memory, and responding to an IDE request also reads from (and sometimes writes to!) the same shared memory.

We have re-architected major parts of Flow to use a system of “transactional” shared memory readers and writers to address this problem. The net effect is a huge boost in responsiveness, as well as savings in system resources from canceling or rescheduling redundant work.
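
The visibility rules of transactional readers and writers can be illustrated with a small Python model: a recheck writes into a private transaction, IDE requests read the last committed snapshot, and a cancelled recheck simply discards its writes. This is a hypothetical sketch of the concept, not Flow's shared-memory design; in particular, copying the whole table on commit is a simplification that a real implementation would avoid.

```python
import threading
from types import MappingProxyType
from typing import Any, Dict, Mapping


class TransactionalHeap:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        # Replaced wholesale on commit and never mutated in place, so an old
        # snapshot keeps seeing the state it was taken from.
        self._committed: Dict[str, Any] = {}

    def snapshot(self) -> Mapping[str, Any]:
        """Read-only view of the last committed state (what an IDE request would read)."""
        return MappingProxyType(self._committed)

    def begin(self) -> "Transaction":
        return Transaction(self)

    def _publish(self, writes: Dict[str, Any]) -> None:
        with self._lock:
            updated = dict(self._committed)
            updated.update(writes)
            self._committed = updated  # single reference swap: readers see all or nothing


class Transaction:
    """Buffers a recheck's writes until commit; cancel discards them."""

    def __init__(self, heap: TransactionalHeap) -> None:
        self._heap = heap
        self._writes: Dict[str, Any] = {}
        self._open = True

    def write(self, key: str, value: Any) -> None:
        self._ensure_open()
        self._writes[key] = value

    def commit(self) -> None:
        self._ensure_open()
        self._heap._publish(self._writes)
        self._open = False

    def cancel(self) -> None:
        self._ensure_open()
        self._writes = {}
        self._open = False

    def _ensure_open(self) -> None:
        if not self._open:
            raise RuntimeError("transaction already committed or cancelled")
```

Because readers only ever see committed state, an IDE request can be answered while a recheck is in flight, and cancelling a redundant recheck leaves nothing half-written behind.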

Other optimizations have reduced the number of internal IDE requests that take over 1 second to service from more than 1M per day to about 25K per day.

GC / memory management

We also did various experiments to improve the shared memory subsystem used by our parallel workers.

  • Compacting collector. Flow shares some infrastructure with Hack, including an inefficient copying garbage collector for shared memory, which we replaced with a newly designed compacting one. Before this change, garbage collection required 10GB of additional memory on our largest repository; afterwards, it requires none. Garbage collection is also 3–5x faster, and the change reduced OOMs caused by the memory growth that copying GC required.
  • New shared memory subsystem (“MLHeap”). To check large codebases, Flow “copies” the exported types of transitive dependencies for fast lookup, serializing and deserializing these representations to and from a shared memory subsystem used by parallel workers. We built a functioning prototype that replaces this subsystem with a native OCaml heap, reducing the cost of copying, coupled with an efficient compacting garbage collector. These changes yielded a significant (~25%) wall-time performance win at the cost of a small memory regression. This experiment opens new avenues for future development and performance work, and is particularly promising in conjunction with the types-first re-architecture. Relatedly, we also explored changes to the OCaml runtime to improve efficiency and remove barriers to adopting MLHeap (1, 2).
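
In general, a copying collector needs a second region large enough to hold all live data while it copies, whereas a compacting collector slides live objects toward the start of the same region. The sketch below shows sliding compaction over a bytearray standing in for a shared heap; Block and compact are hypothetical names, and the layout (each block as a key, offset, size, and a liveness flag computed by an earlier mark phase) is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Block:
    key: str
    offset: int
    size: int
    live: bool  # assumed to be set by a prior mark phase


def compact(memory: bytearray, blocks: List[Block]) -> Tuple[Dict[str, int], int]:
    """Slide live blocks toward offset 0 inside the same buffer (no second to-space).

    Returns a forwarding table (key -> new offset) and the new top of the heap.
    """
    blocks.sort(key=lambda b: b.offset)
    forwarding: Dict[str, int] = {}
    top = 0
    survivors = []
    for b in blocks:
        if not b.live:
            continue  # dead blocks are simply overwritten later
        # top <= b.offset, so moving in address order never clobbers an unvisited live block.
        # The right-hand slice is a temporary copy of one block; C code would use memmove.
        memory[top:top + b.size] = memory[b.offset:b.offset + b.size]
        forwarding[b.key] = top
        b.offset = top
        top += b.size
        survivors.append(b)
    blocks[:] = survivors
    return forwarding, top
```

After compaction the buffer's size is unchanged: live data is packed below the returned top, and the forwarding table tells readers where each surviving entry moved.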

Several other experiments that have landed or are about to land save another 5GB+ of memory on our largest repository.

Better React support

A bunch of new React features were released throughout 2018. We rewrote our hard-coded React model with a new foundation, AbstractComponent, that let Flow support these new features: React.forwardRef, React.lazy, React.memo, React.Suspense, React.Fragment, React.StrictMode, React.ConcurrentMode. This new foundation also fixed many bugs where type precision was lost in higher-order components: in our largest repository, it found thousands of React-related errors. Crucially, it was also designed to make it easier to keep Flow’s types in sync with React’s features.

Better object model

A lot of our code uses React patterns, and in particular, makes heavy use of object spreads. Unfortunately, the types Flow infers for them are buggy, mostly because we do not enforce a separation between object types as originally designed (roughly, “interfaces”) and those that specify own-properties — exact (roughly, “records”) and inexact (roughly, “extensible records”). Fixing these bugs is an ongoing project involving rewriting our object model. Based on the prevalence and importance of exact object types in these React patterns, we are also planning to switch the default syntax of object types to mean exact (with new syntax for inexact object types, and existing syntax for interfaces).
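
To see why the exact/inexact distinction matters for spreads, consider a deliberately simplified model of own-property object types in Python. This is not Flow's actual spread algorithm: it treats types as strings, uses "mixed" for a value of unknown type, and only handles two operands. The key step is that an inexact right-hand operand may carry undeclared properties at run time, so it can silently overwrite any property from the left.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ObjType:
    props: Dict[str, str]  # property name -> type name
    exact: bool


def spread(left: ObjType, right: ObjType) -> ObjType:
    """Type of {...left, ...right} in a simplified model of own-property object types."""
    props = dict(left.props)
    if not right.exact:
        # An inexact operand may have undeclared own properties at run time,
        # any of which could overwrite a left-hand property with an unknown value.
        props = {name: "mixed" for name in props}
    props.update(right.props)  # properties declared on the right always win
    # The result is exact only if both operands rule out extra properties.
    return ObjType(props, left.exact and right.exact)
```

Spreading two exact objects yields a precise exact type, while spreading an inexact object after an exact one loses the left-hand property types, which illustrates why exact object types matter in these spread-heavy patterns.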

Towards safe types

In 2018 we got strong internal feedback that using the any type was causing significant loss of coverage and, as a result, frequently breaking code in unexpected ways, far away from where any was used. We launched Flow “strict”: a syntactic lint against using any. A company-wide goal for the year was to convert as many Flow files as possible to this strict mode, and so far we have reached around 50% conversion.

Going beyond Flow strict, we have added “unsoundness tracking” — a way to detect when any and other by-design unsoundness in Flow might affect the run-time safety of a type — and have been exploring an enhancement of Flow’s type system, built over this principle, that reasons about trustworthiness.
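
Unsoundness tracking can be pictured as a taint that follows values through the checker. The toy model below is a hypothetical illustration, not Flow's implementation: every type carries a flag recording whether it was produced by flowing through any, and a type is trusted only if neither it nor any of its components carries that flag.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Ty:
    name: str
    args: Tuple["Ty", ...] = ()
    # True if this type was obtained by flowing a value through `any`.
    from_any: bool = False


ANY = Ty("any", from_any=True)


def trusted(t: Ty) -> bool:
    """A type is trusted only if neither it nor any component came from `any`."""
    return not t.from_any and all(trusted(a) for a in t.args)


def flow_into(value: Ty, annotation: Ty) -> Ty:
    """Type of an annotated location after a value flows into it.

    The checker accepts `any` flowing into `annotation`, but at run time the value
    may not match it, so the result inherits the value's untrustworthiness.
    """
    return Ty(annotation.name, annotation.args, annotation.from_any or not trusted(value))
```

A value of type any assigned to a location annotated number type-checks, but the resulting number is not trusted, and neither is an Array that contains it.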

A final note

Recently a bunch of open-source projects originally created at Facebook published plans to be rewritten in TypeScript. At Facebook we strongly value the independence of individual teams in creating their roadmaps, and in doing the best they can for the products they build. The projects that have decided to switch to TypeScript have external contributors whose lives will be much easier with this switch, and we respect these decisions.

Overall, these projects are not representative of the enormous and highly interconnected Facebook product code covered by Flow. At the same time, the Flow team is also taking a closer look at our roadmap, and we will have more to share about this topic soon.