Thoughts about package modularization

4 min read · May 7, 2016

When thinking about modularization of npm modules, it often seems beneficial to extract different layers of a package into their own packages. This allows other OSS authors to reuse parts of your code and can improve the architectural quality of the codebase by forcing you to strictly encapsulate and document each part.

To accomplish this split, we often create a new repo and extract the new package’s code into it. Many essential tools like semantic-release or greenkeeper only work when this 1:1 mapping of package to repo is used.

There are two drawbacks of this 1:1 mapping:

Integration tests of the different components are harder. While code that belongs to multiple repositories can be included in one test case, it’s impossible to iterate on test results as everything but the code of the repo you are testing from is effectively “static”. Even if you npm link stuff together, it won’t be possible to have one commit describing the fix that involved multiple packages.
Infrastructure duplication: boilerplate and test setup are duplicated. While this might be ok for 2–3 packages, it won’t be feasible to iterate on this infra for > 10 packages (also think of all those greenkeeper PRs for your > 20 devDeps, times X). This effectively discourages modularization, forcing you to ask yourself whether extracting a small package is “worth the trouble”.

The alternative to this repo extraction is using a monorepo, as prominently shown by babel. Packages are not extracted to another repo, but just to another subfolder in the “packages” directory. The packages can still be released independently on npm; each folder in packages just has its own package.json, and npm publish is run in that subdirectory.
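To make the layout concrete, here is a minimal sketch of how a script could discover the independently publishable packages in such a repo. The helper name `listPackages` and the assumption that every package sits directly under `packages/` are mine for illustration; this is not how babel's own tooling is implemented.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: returns one entry per folder under <repoRoot>/packages
// that contains its own package.json. Folders without a manifest and plain
// files are skipped.
function listPackages(repoRoot) {
  const packagesDir = path.join(repoRoot, 'packages');
  return fs
    .readdirSync(packagesDir, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => path.join(packagesDir, entry.name))
    .filter((dir) => fs.existsSync(path.join(dir, 'package.json')))
    .map((dir) => {
      const manifest = JSON.parse(
        fs.readFileSync(path.join(dir, 'package.json'), 'utf8')
      );
      // Each package carries its own name and version, so it can be
      // released on npm independently of its siblings.
      return { dir, name: manifest.name, version: manifest.version };
    })
    .sort((a, b) => a.name.localeCompare(b.name));
}
```

Calling `listPackages(process.cwd())` from the repo root would give you the list of folders to publish from, one per package.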

There are some problems:

While integration tests are now easier, they still have a major issue: packages requiring other packages (after npm linking them) will use the main entry point specified in their package.json, which will point to the compiled dist/ directory if you use a babel build step. This is super awkward: these files are meant for consumers, not for your tests. Running tests in watch mode while changing code of multiple packages now implies that some babel-cli watch task would have to continuously compile those dist/ directories. Additionally, you’ll probably want to test the src/ files when testing a package directly. So you actually have two simultaneous sources of truth: src/ for tests that directly import the package, dist/ for indirect imports. Maybe I’m not seeing something, but this sounds like a mess. :)
The build process is more complex. I set up a monorepo architecture for webpack-validator, and we came to the conclusion that the added complexity was not justified by the benefits. This is the diff of the PR, see for yourself. The main challenges were: a) figuring out that dist/ files should not be included in coverage results (which opens yet another can of worms: what about packages that are only transitively tested? They would count as not covered). b) Grokking the recursive babel compiling, adapting from babel’s Gulpfile. c) Splitting up dependencies into multiple package.jsons (this actually felt good :)).
You lose automatic releases and changelogs via semantic-release. This is huge, because you lose the ability to cut releases by just pressing the green merge button. Instead you’ll use something like Lerna, the build tool that has been extracted from the babel repo setup. After reading its docs it is not clear to me how this fits together with semver. It basically increases the version number for all packages that have changed since the last publish by reading it from a top level VERSION file. You increment this version yourself, probably by looking at the commits since the last published tag. I don’t really get how this can implement semver on a package level: if, since my latest publish, package A had a breaking change but package B had only a fix, this would mean a major version increment for both of them, which sounds weird to me. Edit: I have since learned that Lerna 2 will allow you to specify versions independently, which will mitigate the discussed semver concern, though of course at the cost of increased maintainer workload. It’s a largely unexplored but exciting space of tooling, and patterns and best practices have yet to emerge and solidify. (Thought experiment: what could a semantic-release for monorepos look like?)
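The semver concern from the last point can be illustrated with a small model. This is only a sketch of the behavior as I described it, not Lerna's actual code: the function names `bump`, `fixedBump` and `independentBump` are made up for illustration, and I assume the maintainer picks the shared version according to the largest change since the last publish.

```javascript
// Bump a "major.minor.patch" version string by one release type.
function bump(version, type) {
  const [major, minor, patch] = version.split('.').map(Number);
  if (type === 'major') return `${major + 1}.0.0`;
  if (type === 'minor') return `${major}.${minor + 1}.0`;
  if (type === 'patch') return `${major}.${minor}.${patch + 1}`;
  throw new Error(`unknown release type: ${type}`);
}

const RANK = { patch: 0, minor: 1, major: 2 };

// Shared-version model: all changed packages are published with the same
// new version, chosen here as the largest bump among the changes.
function fixedBump(sharedVersion, changes) {
  const largest = Object.values(changes).reduce(
    (a, b) => (RANK[b] > RANK[a] ? b : a),
    'patch'
  );
  const next = bump(sharedVersion, largest);
  return Object.fromEntries(Object.keys(changes).map((name) => [name, next]));
}

// Independent model: every package keeps its own version and gets
// exactly the bump its own changes call for.
function independentBump(versions, changes) {
  return Object.fromEntries(
    Object.entries(changes).map(([name, type]) => [
      name,
      bump(versions[name], type),
    ])
  );
}

// Package A had a breaking change, package B only a fix.
const changes = { a: 'major', b: 'patch' };
console.log(fixedBump('1.2.3', changes));
// { a: '2.0.0', b: '2.0.0' }
console.log(independentBump({ a: '1.2.3', b: '1.2.3' }, changes));
// { a: '2.0.0', b: '1.2.4' }
```

In the shared-version model, package B gets a major bump even though it only received a fix. The independent model keeps semver meaningful per package, but someone has to decide on a bump for every package individually.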

Conclusion

Both approaches have drawbacks. At the moment there is no solution for package modularization with great DX. My personal take on this is: semantic-release is too good to give up, so the 1:1 package:repo approach is the way to go at the moment, although integration tests will be lacking and there is the burden of infra overhead. It would be cool to be able to abstract this infra away, not via yeoman but by encapsulation: one npm install that encapsulates all those devDeps, test setup boilerplate, config files etc., like an eslint-config-* or babel-preset-* for npm packages. I don’t think this is really possible right now, given the semantics of npm (one thing that’s missing is the possibility to express the pattern “deps of this package should be top level deps”).

Written by Jonathan Glasmeyer