Building large-scale React applications in a monorepo

A monorepo is a repository that contains more than one logical project. These projects can be unrelated, loosely coupled, or connected through dependency management tools, and such repositories are usually large in size, number of branches, commits, and active developers.

React applications are all about components. When developing a React app you’ll split it into components that aim to be reused within different areas of your application, or across multiple applications.

Sharing code across multiple repositories

Once you start feeling the need to share code across multiple applications, you may choose to create separate repositories and publish them as npm packages.
As the number of applications and shared components grows, you start facing some issues:

Testing: To properly test a shared package you’ll need to link several different projects locally. This can become a daunting task as the number of interdependent packages grows, and npm linking can sometimes yield unexpected behaviours.

Publishing: Propagating a fix might require updating and publishing several different interdependent projects in a correct order.

Discoverability: Available reusable components may become hard to find.

Refactoring: Refactoring may lead to updates across multiple packages which then leads to the testing and publishing issues mentioned above.

Friction: If it’s hard to create or update a reusable library, developers will avoid doing it and you’ll either face duplicated code or code put in places it doesn’t belong.

Monorepo != Monolith

When your whole codebase lives in the same repository, it’s tempting to fall into the trap of creating a monolith (a gigantic application where all parts are intertwined with each other), so we need to ensure that modularity is a first-class citizen within these repositories.

Separating the different pieces of these projects into their own subfolders is not enough: within these projects, components should remain independent packages that can be developed, tested, and published independently.

Each package has clearly defined boundaries and ownership.

To facilitate these workflows we have tools like Yarn workspaces and Lerna, which make working with and publishing multiple packages an easy task.

Workspaces

Usually when working with a monorepo you’ll have a project structure similar to:

| packages/
| -- project-1
| ---- index.js
| ---- package.json
| -- project-2
| ---- index.js
| ---- package.json
| package.json

You can see that each folder inside packages has its own package.json specifying its dependencies and, if needed, its npm scripts.

The dependencies within each package can be external or live within the same repository. When dependencies live in the same repository, Yarn symlinks the workspaces, ensuring that these dependencies always use the latest available source code.

When workspaces are enabled, Yarn installs the dependencies declared in the package.json files of all subfolders under one root folder, without duplication.
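For a structure like the one above, workspaces are enabled by adding a workspaces field to the root package.json (Yarn also requires the root package to be private). A minimal sketch:

```json
{
  "private": true,
  "workspaces": ["packages/*"]
}
```

project-2 could then declare project-1 in its own dependencies, and Yarn would resolve it to the local folder instead of the registry.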

Lerna

When used with Yarn, Lerna facilitates running npm or arbitrary scripts across packages (such as test or build), as well as the process of publishing multiple interdependent packages. This allows you to share packages that live inside your monorepo with other projects as npm packages.
Lerna also makes sure that the versions of interdependent packages within your repository are always kept in sync and published in batch.
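A minimal lerna.json for this kind of setup might look like the sketch below; a single fixed version keeps all packages in lockstep, which is what the batch publishing described above relies on (the version number and globs here are illustrative):

```json
{
  "packages": ["packages/*"],
  "npmClient": "yarn",
  "useWorkspaces": true,
  "version": "1.0.0"
}
```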

Structure

Companies like Facebook and Google are known for their massive monorepos, where all the organization’s code lives under one repository. This approach can lead to version control scaling problems, as these repositories can be terabytes in size; those issues led these companies to create custom version control systems.
Most organizations don’t have the budget for a task of that size, so a different structure is necessary.

Product/Domain Monorepo
All the parts that compose a product, or all the applications belonging to a single domain, live under the same monorepo, even if the product is multi-platform (React, React Native).

Project structure:

// A cross platform react application
| components/
| -- comp-1
| ---- index.js
| ---- package.json
| -- comp-2
| ---- index.js
| ---- package.json
| apps/
| -- app-web
| ---- index.js
| ---- package.json
| -- app-ios
| ---- index.js
| ---- package.json
| api/
| -- api-1
| ---- index.js
| ---- package.json
| -- api-2
| ---- index.js
| ---- package.json
| package.json

You’ll still need to resort to separate package repositories to share code between different product monorepos.

Code Ownership

As your codebase and number of developers grow, it becomes hard to know who is the best person for a code review, or who is responsible for a specific module. Code reviews are also more effective when the reviewer is familiar with the code being reviewed.
Domain-, application-, and even component-specific knowledge is important to maintain high-quality software, and has a clear impact on the number of defects found.

In a large project ownership needs to be clearly defined and automated.

GitHub allows you to specify code owners for folders within your codebase. These owners are automatically added as reviewers for changes in that part of the application, making ownership easy and explicit.
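For example, a CODEOWNERS file at the root of the repository (or under .github/) maps path patterns to owners; the team names below are hypothetical:

```
# Each line maps a path pattern to one or more owners.
/packages/design-system/  @myorg/design-team
/apps/app-web/            @myorg/web-team
```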

Smart Tooling

To remain productive within these repositories, you need tooling that stays efficient in an environment with large amounts of source code.

The management of dependencies between the different components that compose our product is already facilitated. But what about building and testing applications?

In a large repository you can’t afford to rebuild and test all the source code each time you make a change. A change to a package should only trigger tests and builds for the consumers of that package; the time to build and test the project has to be proportional to what was changed, not to the size of the project.

A change to component-a should only test and build component-a itself and its consumers, leaving the other parts of the codebase untouched.

Package Graph
In order to make changes efficiently and with confidence, you need to know which packages depend on each other, so you can be sure of what needs to be tested and built whenever you make a change. In a small project you may know this by heart, but when working with large codebases this needs to be automated.
Lerna already builds a package graph internally to determine which packages need to be published. If you want to use this concept in your own setup to create custom tasks, you can use some of Lerna’s internals, such as @lerna/package-graph.

/* An example of the package graph data structure. You can traverse
   the nodes to find all dependants of a given package */
PackageGraph {
  nodes: [
    {
      package: {
        name: '@namespace/my-package',
        version: '0.0.1'
      },
      dependencies: [
        '@namespace/my-package-1',
        '@namespace/my-package-2',
        '@namespace/my-package-3'
      ]
    }
  ]
}
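Lerna’s real package graph carries more information, but the core traversal, finding every transitive dependant of a changed package, can be sketched in plain JavaScript (the package names below are hypothetical):

```javascript
// Map each package name to the in-repo packages it depends on.
// These names are made up, standing in for real workspace packages.
const graph = {
  '@ns/app': ['@ns/button', '@ns/theme'],
  '@ns/button': ['@ns/theme'],
  '@ns/theme': [],
};

// Return every package that transitively depends on `changed`,
// i.e. everything that needs to be rebuilt and retested.
function dependentsOf(graph, changed) {
  const result = new Set();
  let frontier = [changed];
  while (frontier.length > 0) {
    const next = [];
    for (const [name, deps] of Object.entries(graph)) {
      if (!result.has(name) && frontier.some((p) => deps.includes(p))) {
        result.add(name);
        next.push(name);
      }
    }
    frontier = next;
  }
  return [...result];
}
```

A change to @ns/theme would mark both @ns/button and @ns/app for rebuilding, while a change to @ns/app affects nothing else.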

Change detection
To know whether a package and its consumers need to be tested and built, you need to be able to efficiently detect whether there were any changes to its source code.
Lerna can be used with a flag to detect changes from a given point in your version control history.

$ lerna run test --since master

This already allows you to optimise your workflow, but it marks a package as changed even if you’ve only changed code formatting. Although this workflow can scale to quite a large application, you can make it more efficient by detecting changes to a component’s public API. Tools like TypeScript allow you to create a .d.ts file declaring a component or library’s public API, which lets you build and test affected packages only when a component’s public API changes. Lerna doesn’t provide this functionality, so if you want to leverage it in your own setup you’ll need a custom implementation.

// Example of a TypeScript declaration file
declare class MyClass {
  constructor(someParam?: string);
  someProperty: string[];
  myMethod(opts: MyClass.MyClassMethodOptions): number;
}

declare namespace MyClass {
  // Merges with the class above to provide the referenced options type.
  interface MyClassMethodOptions {
    verbose?: boolean;
  }
}

Parallel work
To work efficiently with multiple packages, parallelism should be used whenever possible. An ideal build system takes advantage of the multiple cores available on your machine.
We can already run scripts in parallel with Lerna and stream the output to the terminal.

/* This would run the build script of each package in your monorepo in parallel and stream the output to the terminal */
$ lerna run build --parallel --stream

Ideally any work done by your build system needs to be parallelized as much as possible.

create-react-app

create-react-app is already starting to support working with multiple react apps inside a monorepo: https://github.com/facebook/create-react-app/pull/3741

Productivity

This is all about productivity: enabling you to easily change any part of the application with confidence, knowing exactly which parts were affected.
Only when you achieve this kind of frictionless workflow will developers start sharing code at scale; otherwise, creating and maintaining reusable components and libraries will be seen as a chore that is too risky and cumbersome, and avoided as much as possible.

Large Scale refactorings
In a monorepo you can refactor an API and all of its callers in one go, or leverage codemods to refactor code across the whole repository, and then open a code review with all the code owners automatically added for you.

Everything is always integrated
In a monorepo, the cost of integration is paid at every commit; with multiple repositories, it is paid at release/integration time, likely with a much larger diff.

Multiple repositories lead to lots of repetition
Multiple repositories can lead to lots of repetition in configuration and project setups:

  • Dev environments
  • Build configurations
  • Dependencies
  • Test configuration
  • Pull request templates
  • ESLint
  • Prettier
  • CI/CD

Caveats

When working with monorepos there are of course trade-offs you need to make:

Established source control best practices
These still apply: keep your commits small, and if you need to make a cross-package refactor, make it as incremental as possible.

CI/CD 
You will need a more complex setup for CI/CD pipelines, as some tools still don’t support multiple projects within one repository. Ideally you’ll have the ability to monitor multiple subfolders within one repository to trigger the pipelines, and here it’s also best to parallelize as much as possible.
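As an illustration, some CI systems let a pipeline trigger only when files under a given package change; a hypothetical GitHub Actions path filter might look like:

```yaml
on:
  push:
    paths:
      - 'packages/comp-1/**'
```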

Master means live
A monorepo means you no longer have to cherry-pick versions to deploy an app built from multiple packages, but it also means that the versions in master are the versions that get deployed.

Tooling
You’ll need to invest in a customized build system to make sure development stays fast as the monorepo grows.

Wrapping up

The JavaScript ecosystem already seems to have all the pieces needed to make these concepts work properly; they just aren’t provided as a single unified solution.
This seems to be a great model for the development of applications by multiple interdependent teams that can work on multiple packages and applications while keeping all the parts tightly integrated.
There are some great examples from the Angular community, like ABC and Nx, that also promote the monorepo way of building applications.
