A leap in the evolution of Airtable’s codebase: Scaling TypeScript to thousands of projects

The Airtable Engineering Blog
8 min read · Dec 4, 2024

By: Michael Mitchell, Patrick Hayes

We previously wrote about how Airtable migrated its codebase from Flow to TypeScript. A few years and several TypeScript versions later, TypeScript is still a critical tool for preventing bugs early on in the development cycle. Back when the previous article was written, we had just over 50 TypeScript projects — mostly mirroring the top-level directory structure of the repository.

Today we have nearly 3000 TypeScript projects. In this post, we share how we got there, and how doing so reduced our typechecking time by 65%.

The Problem

As our codebase grew, developers had to wait longer and longer to typecheck locally. This meant they often faced the painful choice of waiting for typechecking to complete (often multiple times, as they fixed errors), or relying on CI to typecheck for them. Neither option was an efficient use of developer time. So we took a closer look at why typechecking our codebase took so long, and how we could improve it.

Outgrowing Build Mode

Typically, our developers checked their changes with TypeScript Build Mode, which intelligently figures out which projects have changed and which need to be rebuilt as a result. This worked well enough for us when our codebase was smaller. But, over time, we found that it has some limitations:

  • It can use a lot of memory: A benefit of typechecking multiple projects in the same process is that types used in one project don’t have to be recreated if they’re used in another. A downside is that, as each project is checked, the set of types retained in memory only grows, and can eventually reach the system’s memory limit.
  • It can only build one project at a time: Because the TypeScript compiler is single-threaded, it can’t work on multiple projects simultaneously. We added support for parallelism, but projects could not be built at the same time as their dependencies. This meant that when many projects were all waiting for a common dependency, parallel CPU cores went unused.
  • Projects can cache poorly: Ideally, code changes are isolated to a handful of projects, so that only those projects and their transitive dependents need to be rebuilt. But when changes land frequently in large, widely-depended-on projects, caching becomes far less effective in Build Mode.
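For context, Build Mode operates on “composite” projects wired together via project references. A minimal configuration for one project in such a setup might look like this (the paths and names here are illustrative, not our actual layout):

```json
// tsconfig.json for a single project (illustrative names/paths)
{
  "compilerOptions": {
    "composite": true,     // required for project references / build mode
    "incremental": true,   // reuse .tsbuildinfo state across runs
    "declaration": true,   // emit .d.ts files for dependent projects
    "outDir": "dist"
  },
  // Dependencies must be listed so `tsc --build` can order and skip work.
  "references": [{ "path": "../shared_utils" }],
  "include": ["src"]
}
```

Running `tsc --build` then walks the reference graph and rebuilds only out-of-date projects, in dependency order.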

All of these problems were exacerbated by having a small number of large projects, so we looked to see if breaking these projects up would help.

Easier said than done!

Detangling projects

Projects can’t be sliced up arbitrarily. Dependency cycles are disallowed by the TypeScript compiler — and cycles in our codebase were quite common. Ideally, projects should represent a logical component or module, so that future changes are more likely to reside within a single project. We adopted some new tools and techniques to make this easier:

  • We had recently started adopting Bazel, so we started generating our TSConfig files directly from our Bazel targets. This meant that, as we broke up our projects, Bazel was already enforcing a package structure without cycles that we could use to generate TSConfigs.
  • Gazelle is a project that can generate Bazel build files. We wrote a custom extension that analyzed what modules each TypeScript file imports in order to automatically extract dependencies and declare them in the generated build files. With both Bazel build files and TSConfigs automatically generated, developers don’t have to declare dependencies manually and extraneous dependencies will be removed automatically.
  • We built tools on top of Bazel to help us better understand project dependencies. We made our Gazelle extension log a line, as it traversed TypeScript files, whenever an import caused one project to become a dependency of another. We wrote another tool that ingested these logs and could answer questions like “which files imported in this project cause it to depend on that project?”. These tools help developers decide how their projects should logically be split up, and if they accidentally introduce a dependency cycle, they can easily identify the code change that caused it.
  • We also wanted to understand which projects were most valuable to fix. We measured the time each project took to build in CI, then computed the longest dependency chain in the project graph, weighted by those build times. This chain formed the “critical path”, and we focused our efforts on the projects that contributed most to it.
  • We identified code patterns that made cycles more likely, and globally fixed them in our codebase. For example, we found that many dependency cycles could be broken by moving enums and interface types into a `types` subpackage, so we automated that migration.
  • We automated interface creation with Copilot. When we needed to generate new interface types for modules that didn’t have them, we found that Copilot was pretty good at suggesting autocompletions to fill out the body of an interface. So we scripted neovim to automate the creation of these interface files and populate their bodies with Copilot. This wasn’t perfect, so the results still needed manual review, but it dramatically accelerated the migration.
  • We had some very large “mega-packages” (like all of our front-end code) that had a lot of cycles, where breaking them up gradually wasn’t viable. So, we conducted build graph analysis to identify which modules had the fewest cyclic dependencies within the project. These modules would only have one or two imports that held them back from being pulled out. By prioritizing fixing those dependencies, we could break down the package much faster.
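To illustrate the critical-path idea above, here is a toy sketch of our own (not Airtable’s actual tooling): model the project graph as a DAG whose nodes are weighted by measured build time, and find the heaviest dependency chain.

```typescript
// Hypothetical sketch: longest build-time-weighted dependency chain in a DAG.
type Project = { name: string; buildSeconds: number; deps: string[] };

function criticalPath(projects: Project[]): { totalSeconds: number; chain: string[] } {
  const byName = new Map(projects.map((p) => [p.name, p]));
  const memo = new Map<string, { totalSeconds: number; chain: string[] }>();

  // Heaviest chain ending at `name`, including its own build time.
  function heaviest(name: string): { totalSeconds: number; chain: string[] } {
    const cached = memo.get(name);
    if (cached) return cached;
    const project = byName.get(name)!;
    let best = { totalSeconds: 0, chain: [] as string[] };
    for (const dep of project.deps) {
      const candidate = heaviest(dep);
      if (candidate.totalSeconds > best.totalSeconds) best = candidate;
    }
    const result = {
      totalSeconds: project.buildSeconds + best.totalSeconds,
      chain: [...best.chain, name],
    };
    memo.set(name, result);
    return result;
  }

  let overall = { totalSeconds: 0, chain: [] as string[] };
  for (const p of projects) {
    const candidate = heaviest(p.name);
    if (candidate.totalSeconds > overall.totalSeconds) overall = candidate;
  }
  return overall;
}
```

Shrinking the weight of any project on the returned chain (by splitting it up, or by making it cache better) directly shortens the end-to-end build.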

With these changes, our team was able to create thousands of much smaller projects. The critical path was shorter, and our cache performance was stronger. Still, we had hopes that we could continue to improve our typechecking times.

Isolated declarations

Even with many smaller projects, we found that core libraries with many dependencies were extending the critical path because they could not be parallelized with their dependencies. Initially, this seemed like a fundamental blocker to further improvements.

Here is a subsection of a performance profile of a build to get a visual sense of the impact:

Each row represents the actions being executed by a CPU. Note the wide gaps between some of these actions. These represent time that the CPU spends idly waiting for something to do. Such a waste!

To our delight, TypeScript 5.5 shipped with a new feature called Isolated Declarations. When a project opts in, the compiler can emit its declaration files separately from typechecking. This enables better parallelism, because a project only needs the declarations of its dependencies in order to be typechecked. So we updated our build pipeline to separate these tasks for projects that opted in. We realized that if we could adopt isolated declarations broadly, we would no longer be bottlenecked on typechecking the critical path, and could achieve better CPU utilization and throughput. But to reap these benefits, isolated declarations requires most exports to have explicit type annotations, including ones the typechecker could otherwise infer. So our first step was to update our codebase to use more explicit types.
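To make the annotation requirement concrete, here is a small example of the kind of change involved (the names are made up for illustration):

```typescript
// Under --isolatedDeclarations, an exported function with an inferred return
// type is an error, because emitting its .d.ts would require type inference
// (i.e., a full typecheck):
//
//   export function makeUser(name: string) {   // error under isolatedDeclarations
//     return { name, createdAt: Date.now() };
//   }
//
// An explicit annotation lets the declaration be emitted syntactically,
// without consulting the typechecker:
export interface User {
  name: string;
  createdAt: number;
}

export function makeUser(name: string): User {
  return { name, createdAt: Date.now() };
}
```

With every export annotated like this, a project’s declarations can be produced in parallel with (or before) its typecheck, rather than strictly after it.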

Better explicit types

Of course, we didn’t have to add all these types manually. The TypeScript 5.5 release also came with an editor “quick fix” that can automatically add explicit types where needed. `ts-fix` is a CLI that applies these quick fixes in bulk, so we ran it on our projects to see if it generated the right types.

While it worked for common cases, we found that the “quick fix” types were often not viable to include in the codebase because they could be quite complex. We even found cases where a file that was only a few hundred lines could grow by tens of thousands of lines of code once types were automatically added! So how did we make the types more humane?

  • We were able to reuse our previously described Copilot scripts to generate more semantically relevant types. As long as the AI-generated types still passed CI, they were generally correct and were committed after being reviewed by a human.
  • Sometimes, a generated type was complicated because it tried to capture more specificity than we needed. Many of our React components inferred a complex return type, but really we only cared that they returned a `React.ReactNode`. So we wrote custom ESLint rules that automatically filled in more concise types, which could be overwritten with more complex types in the rare cases where more specificity was needed.
  • In some cases, the inferred type was just intrinsically complicated, and there was no more concise correct representation. We decided that some types are better left implicit, so for those projects we simply left isolated declarations off.
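A contrived example of the tradeoff described above (the names are ours): an inferred type can carry per-property detail that callers rarely need, while a short explicit annotation is both isolated-declarations-friendly and easier to read.

```typescript
// Without an annotation, tsc infers something like:
//   { kind: "ok"; values: number[]; meta: { source: string; retries: number } }
// and quick-fix-generated annotations for real code can balloon far larger.
// A concise explicit type promises only what callers actually care about:
export interface FetchResult {
  kind: "ok" | "error";
  values: number[];
}

export function fetchValues(): FetchResult {
  // The extra inferred detail (e.g., a `meta` object) is deliberately
  // not part of the public type.
  return { kind: "ok", values: [1, 2, 3] };
}
```

The annotation here is a policy choice, not just a mechanical fix: it narrows the public contract to what we intend to support.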

While this is an ongoing journey, we now have isolated declarations enabled for about 85% of our projects, which has significantly improved typechecking times.

Was it worth it?

Let’s dive into some of the concrete outcomes that we had:

  • Today, if we run our build script on a 64-core machine with isolated declarations disabled, the build takes 9m25s. Without parallelism, this build would have taken 5h28m48s, or 35x longer!
  • With isolated declarations enabled, it takes 7m7s — another 25% improvement, 46x faster than without parallelism.
  • But on pull requests when we take advantage of the improved cache rate, the median time is ~3m, an improvement of 67%.

Remember that profile from earlier? Well here’s what a profile looks like now with isolated declarations:

We’ve managed to fill in most of the gaps! The gaps that remain come from the projects where we haven’t yet enabled isolated declarations.

These performance results improve every developer’s local environment, and have decreased the time it takes for pull requests to be merged. While we’re excited about these improvements, we know there’s still room to improve our typechecking times further with more usage of isolated declarations, and even more fine-grained packages. Our team continues to investigate ways to make typechecking faster — if these problems sound interesting to you, we are hiring within the Developer Platform team — consider working on them with us.

Many thanks to:

  • Bloomberg engineering for driving and implementing isolated declarations in the TypeScript compiler
  • Google engineering for implementing a code fixer for isolated declarations in the compiler
  • The TypeScript project team for implementing noCheck mode and everything else :)
  • The Oxc project team for implementing a super fast alternative library for emitting declarations
