Adventures with NPM or: How I Learned to Stop Shrinkwrapping and Love Yarn
In the Fall of 2016, the front end team here at Work Market began to notice some problems, both real and potential, in terms of how we were managing our dependencies. We use npm as a package manager (bower was fully removed from the legacy codebase around this time), and if you use it extensively, then some of these issues might sound familiar to you.
When using npm, you essentially enter into an “honor system” with package authors. As the consumer of these open source packages, you have little to no control over how the authors decide to make changes, or whether they strictly adhere to semver. Depending on the situation, it might be close to impossible for a package author to know whether a given change will be breaking or a simple patch. We ran into situations where package authors accidentally removed dependencies, or changed how their libraries were built in a way that broke them in our environments. Since our dependencies are installed every time a build is made, unanticipated changes (read: bugs) can slip in unnoticed on any deploy.
Other than introducing unanticipated bugs, this method of dependency management will undoubtedly lead to different build structures being generated on different machines at different times — all with the same codebase. For example, consider this workflow:
- Engineer develops a new feature
- Engineer commits, opens PR
- CI server builds and runs automated tests
- QE checks out PR and tests
- Engineer kicks off build to deploy
At any of these points, a dependency change can take place — whether due to a version bump by the package author or due to the simple fact that you are not freezing package versions throughout the dependency tree. This means builds can have different behaviors at any one of the points above, and that can be incredibly difficult to debug.
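To make the drift concrete, here is a minimal sketch of how a caret range in package.json can resolve to different versions at two points in the workflow above. It is a simplified model of caret matching, not npm's actual semver implementation, and the registry contents and function names are hypothetical.

```python
# Simplified model of caret ranges; not npm's actual semver implementation.

def parse(version):
    """Turn '1.2.3' into the tuple (1, 2, 3)."""
    return tuple(int(part) for part in version.split("."))


def caret_upper_bound(base):
    """Exclusive upper bound for ^base: bump the left-most non-zero part."""
    major, minor, patch = base
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def satisfies_caret(version, caret_range):
    """True if version falls inside a range such as '^1.2.0'."""
    base = parse(caret_range.lstrip("^"))
    return base <= parse(version) < caret_upper_bound(base)


def resolve(caret_range, published):
    """Pick the highest published version that satisfies the range."""
    matching = [v for v in published if satisfies_caret(v, caret_range)]
    return max(matching, key=parse) if matching else None


# Hypothetical registry contents at two points in the workflow.
registry_at_ci = ["1.2.0", "1.2.1"]
registry_at_deploy = ["1.2.0", "1.2.1", "1.3.0", "2.0.0"]

print(resolve("^1.2.0", registry_at_ci))      # 1.2.1
print(resolve("^1.2.0", registry_at_deploy))  # 1.3.0
```

The package.json never changed between the two calls; only the set of published versions did, and that is enough to change what gets installed.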
Switching to npm shrinkwrap
To prevent the above issues, we decided to use npm’s built-in method for freezing all dependency versions: npm shrinkwrap. This generates a “lockfile” that specifies the exact version and URL from which to download every single dependency (including the subdependencies of those specified in package.json). This solves the issue of generating different build structures unintentionally, and it prevents package version changes from making their way in without an explicit upgrade.
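The idea behind a lockfile can be sketched in a few lines. This is only an illustration under assumptions: the package name and registry snapshots are made up, versions are modeled as tuples, and a real lockfile also records download URLs and nested dependencies.

```python
# Hypothetical data; a real lockfile also records resolved URLs and subdependencies.

def latest_allowed(major, published):
    """Highest published version that shares the requested major version."""
    candidates = [v for v in published if v[0] == major]
    return max(candidates)


def install(requested, registry, lock=None):
    """Return {name: version}. A locked version always wins over fresh resolution."""
    lock = lock or {}
    installed = {}
    for name, major in requested.items():
        if name in lock:
            installed[name] = lock[name]
        else:
            installed[name] = latest_allowed(major, registry[name])
    return installed


requested = {"lib-a": 1}  # "any 1.x release of lib-a"
registry_monday = {"lib-a": [(1, 0, 0), (1, 0, 1)]}
registry_friday = {"lib-a": [(1, 0, 0), (1, 0, 1), (1, 1, 0)]}

lock = install(requested, registry_monday)        # freeze what Monday resolved
print(install(requested, registry_friday))        # {'lib-a': (1, 1, 0)}
print(install(requested, registry_friday, lock))  # {'lib-a': (1, 0, 1)}
```

Without the lock, Friday's install silently picks up the new release; with it, the install reproduces Monday's result until someone upgrades on purpose.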
npm shrinkwrap is awful
We quickly realized after making the switch that npm’s shrinkwrapping method came with serious tradeoffs.
First, managing dependencies is a straight up nightmare. npm already doesn’t do a great job of building dependencies, shrinkwrap or no. npm builds dependency trees non-deterministically — that is, the tree’s structure can differ from one machine to another based simply on the order that the packages are downloaded. Regardless of the packages’ versions, the way that they are built can still have unanticipated side effects.
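A toy model shows how install order alone can change the tree. The assumption here is a simple hoisting rule in which the first version encountered claims the top-level slot and any later conflicting version is nested under its parent; this is a sketch of the general behavior, not npm's real algorithm, and the package names are hypothetical.

```python
def build_tree(install_order, dependencies):
    """Place each package's dependencies, hoisting when the top-level slot is free."""
    top_level = {}  # name -> version at node_modules/
    nested = {}     # parent -> {name: version} under node_modules/<parent>/node_modules/
    for parent in install_order:
        for name, version in dependencies[parent].items():
            if name not in top_level:
                top_level[name] = version
            elif top_level[name] != version:
                nested.setdefault(parent, {})[name] = version
    return top_level, nested


# Two hypothetical packages that need different versions of the same dependency.
dependencies = {
    "app-a": {"shared": "1.0.0"},
    "app-b": {"shared": "2.0.0"},
}

tree_one = build_tree(["app-a", "app-b"], dependencies)
tree_two = build_tree(["app-b", "app-a"], dependencies)

print(tree_one)  # ({'shared': '1.0.0'}, {'app-b': {'shared': '2.0.0'}})
print(tree_two)  # ({'shared': '2.0.0'}, {'app-a': {'shared': '1.0.0'}})
```

Both trees satisfy every version requirement, yet the version sitting at the top level differs, which is exactly the kind of discrepancy that is painful to track down across machines.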
Furthermore, issues can arise when two or more of the dependencies specified in your package.json share a common dependency, especially if the versions are different. When you attempt to add or remove a dependency, npm checks your already-built dependency tree (meaning you have to npm install first) against the npm-shrinkwrap.json file. Even if you do a fresh install and then attempt to add or remove a dependency, npm might believe that something is off; maybe you have what it considers duplicate or erroneous dependencies installed. Again, this can happen directly after you npm install. I personally have spent hours and hours attempting to simply add or upgrade a dependency in our application. Usually the engineer is forced to do some combination of commands like: npm install. Sometimes this works, but many times this ends up screwing up the entire build and forces you to start over.
Second, version control is very difficult to manage. You would typically want to check npm-shrinkwrap.json into git so that all other engineers, QEs, and CI servers read from the same file. The file itself is virtually unreadable to humans, and the generated file can change in really strange ways. If you were to shrinkwrap, delete the file, then shrinkwrap again, the two generated files might be completely different. Upgrading a single dependency might lead to additions, deletions, and dependencies moving around in the file, all of which make it incredibly difficult for the code reviewer to tell what’s going on.
Oh, and did I mention that the behavior of npm shrinkwrap might change completely in certain ways across npm patch version updates? This issue in particular wreaked havoc on our CI servers. And take a look at the issue tracker for shrinkwrap — it doesn’t exactly inspire confidence.
Yarn to the rescue
First: dependency management. All of the trouble I described above? Gone.
Yarn generates a file called yarn.lock that is its own version of npm-shrinkwrap.json. It’s much easier to read, and version control changes actually make sense when reviewing PRs. Yarn uses this lockfile to generate deterministic builds, meaning you can forget about those dependency tree discrepancies that I previously described. Builds will be the same from machine to machine, from your local builds to the builds on your CI servers. Yarn also automatically resolves duplicate dependencies and generates a flat dependency tree.
Ok, great. That was enough right there to sell my team on switching over. That’s not all, though.
Yarn has yielded significantly faster build times across all of our build environments. There are a couple of reasons for this, namely caching and parallelized operations. Yarn caches packages that are downloaded to reduce the need to download them again in the future (there’s also an offline mode that allows you to work without a network connection). Repeated installs on either our engineers’ local machines or on our CI servers’ slaves take advantage of this feature. Yarn also downloads and installs packages in parallel rather than serially (which is the method that npm employs), which leads to more efficient resource utilization.
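The two ideas, caching and parallel fetching, can be sketched together. This is an illustration only and not how Yarn is actually implemented: the cache is a plain dictionary, the download function is a stand-in that makes no network calls, and the package specs are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_download(spec):
    """Stand-in for a network fetch; returns placeholder bytes."""
    return f"tarball for {spec}".encode()


def install(specs, cache, download=fake_download, workers=4):
    """Fetch only what the cache lacks, in parallel. Returns the download count."""
    missing = [s for s in dict.fromkeys(specs) if s not in cache]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as `missing`.
        for spec, payload in zip(missing, pool.map(download, missing)):
            cache[spec] = payload
    return len(missing)


cache = {}
specs = ["lib-a@1.0.1", "lib-b@2.3.0", "lib-c@0.4.2"]

print(install(specs, cache))  # 3: cold install, everything is downloaded
print(install(specs, cache))  # 0: warm install, served entirely from the cache
```

The second call is the case that matters on a busy CI server: exact versions are already known, so anything seen before never touches the network again.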
Initial testing has revealed that using Yarn has cut down on front end dependency download and install time by ~45–55%. Here’s a breakdown:
using node 6.7.0 + npm 3.10.3
cold install = clearing cache, units in seconds
npm cold install (shrinkwrap): 92s
npm install (shrinkwrap): 92s
npm cold install (no shrinkwrap): 97s
npm install (no shrinkwrap): 77s
yarn cold install: 95s
yarn install: 51s
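For anyone who wants to turn the timings above into percentages, here is a small sketch. The numbers are copied from the breakdown; which npm run serves as the baseline for each yarn run is my assumption for illustration, and the function name is made up.

```python
# Timings in seconds, copied from the breakdown above.
timings = {
    "npm cold install (shrinkwrap)": 92,
    "npm install (shrinkwrap)": 92,
    "npm cold install (no shrinkwrap)": 97,
    "npm install (no shrinkwrap)": 77,
    "yarn cold install": 95,
    "yarn install": 51,
}


def percent_reduction(before, after):
    """Percentage of time saved going from `before` to `after` (negative means slower)."""
    return round((before - after) / before * 100, 1)


# Assumed pairings: cached yarn vs. cached npm, and cold yarn vs. cold npm.
warm_vs_shrinkwrap = percent_reduction(
    timings["npm install (shrinkwrap)"], timings["yarn install"])
warm_vs_plain = percent_reduction(
    timings["npm install (no shrinkwrap)"], timings["yarn install"])
cold_vs_shrinkwrap = percent_reduction(
    timings["npm cold install (shrinkwrap)"], timings["yarn cold install"])

print(warm_vs_shrinkwrap)  # 44.6
print(warm_vs_plain)       # 33.8
print(cold_vs_shrinkwrap)  # -3.3
```

With these pairings, the gain in this single sample comes from cached installs; the cold installs land within a few seconds of each other.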
We build and run automated tests on the CI servers against our application a lot. Every single time a PR is created or updated, at least two test suites are run — meaning that the front end is installed and built at least twice. In some cases, builds occur 4 or 5 times per PR. Behind all of these CI builds are, of course, the engineers’ and QEs’ local builds, as well as the actual builds that take place for deployment purposes.
In other words, the time savings add up very quickly. A rough estimate for just acceptance test builds that run for our PRs is that we will be shaving off about 7 hours of build time per month. That number climbs to around 10–12 hours once integration tests are added in (these usually run in parallel on a different slave).
Overall, we’ve been very pleased with the results, but it’s early days. We’ll be gathering more information on our build times in a reporting system to determine the overall effect of the transition over time and will update with the results.