Why Version Control isn’t Package Management: A brief history

Or, why aren’t we using npm for the frontend?


tl;dr: version control is for source, not builds. Use npm.

Note: I have no financial interest of any sort in npm. I am merely a fanboy who has had great experiences with it. I wrote this primarily for internal inspiration at Atlassian, but I’ve republished it here for the world.

<script>

Like many of you, it wasn’t too long ago that I visited the jQuery website, clicked that big download button, and plopped jquery-1.5.js into the vendor directory of my frontend apps. I checked that file into git, added a script tag and was good to go. When it came time to upgrade, the answer was simple: just repeat this lovingly manual process. Of course, things would break (remember $.fn.live?), and we’d have to patch up app code and make sure other scripts were still compatible. After all, as far as dependencies went, jQuery was our dependency.

I’m sure we were all laughed at by those whose languages had proper module and packaging systems, each with its own canonical package manager. RubyGems, pip and PyPI, and Maven were all reasonably developed, and their respective languages were built around the module systems they packaged for. JavaScript was a toy, relying on window and manually curated, precisely ordered script tags.

Then came Node. It implements (roughly) the CommonJS module format, using a synchronous require function and a free exports variable to hang your module’s values off of. It maps modules to files one-to-one: require first checks for a core module with a matching name, then looks in the nearest node_modules directory, and then walks up the filesystem through each parent directory’s node_modules if it still can’t find what it needs. This is great: you no longer enter dependency hell, where dependencies require conflicting versions of another package. Instead, each module can have its own subtree of dependencies.
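To make that lookup concrete, here is a minimal sketch of the directories Node would search for a non-core module. The helper name nodeModulesPaths is my own invention for illustration, and the sketch assumes POSIX-style paths; it ignores core modules, file extensions and package.json "main" fields, all of which the real algorithm also handles.

```javascript
const path = require("path");

// Hypothetical helper (not part of Node's API): list the node_modules
// directories searched for a require() made from `fromDir`, nearest first.
function nodeModulesPaths(fromDir) {
  const dirs = [];
  let current = path.posix.resolve(fromDir);
  while (true) {
    // A directory that is itself named node_modules is skipped, so we
    // never look in node_modules/node_modules.
    if (path.posix.basename(current) !== "node_modules") {
      dirs.push(path.posix.join(current, "node_modules"));
    }
    const parent = path.posix.dirname(current);
    if (parent === current) break; // reached the filesystem root
    current = parent;
  }
  return dirs;
}

console.log(nodeModulesPaths("/home/me/app/node_modules/foo/lib"));
```

Because the nearest directory wins, a package nested under another package’s node_modules sees its own copy of a dependency before anything higher up, which is what lets two packages depend on different versions of the same thing.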

Soon after, npm followed, automating the publishing and distribution of these modules. Things were great in Node-land. But it was Node, and browser JavaScript is different, right?

Bower: the savior of the frontend?

Bower certainly thought so, and showed up on the scene with a simpler model: offer a flattened dependency tree, which avoids the disk- and bandwidth-wasting subtrees for each dependency, and deliver modules in a truly unopinionated way. Created at Twitter, it uses git for its package delivery (also supporting tarballs and simple files available over HTTP), and uses git tags as version markers. It downloads your dependencies, resolves and downloads theirs, and places them all in a convenient place. After all, at the time, jQuery was hosted in a git repository and already checked in there in the same form you’d use in your script tag. Bower was here to save us from manually managing dependencies with big download buttons. What could go wrong?

It turns out that using git like this is only convenient up to a certain point. Projects with build processes ended up checking in distributable scripts in a dist directory alongside the project’s source. Entire project histories are stored and cloned down with every install (Bower has a rocky history with git’s shallow cloning). At the same time, many projects began distributing consumables in a separate git repository, often with a name ending in ‘dist’. This avoids many of these issues, but ultimately underscores the ridiculous extent to which version control is being abused as a distribution mechanism.

Bitbucket also experienced some of this pain when integrating twemoji, Twitter’s Creative Commons-licensed emoji set. We installed it as a Bower dependency, only to discover that during our build and deploy process, it downloaded a 342-megabyte git repository, filled with the project’s history and hundreds of original Adobe Illustrator copies of the emojis. We, as I can imagine most web products would, just needed the SVGs and PNGs of the emojis, which weigh in at about 15 megabytes. It turns out that because Bower is a git-cloning machine, the “ignore” field in bower.json can’t keep large sources out of a package before it is published or downloaded; instead, Bower simply deletes the matching files after the clone or download has completed (npm’s .npmignore, by contrast, ensures that files aren’t uploaded when publishing to the registry).

The flat directory structure, while ideal when conserving bytes, isn’t perfect either. In the case of packages which require conflicting versions of a dependency, one is bound to lose. In such a case, Bower will interactively prompt you to force a particular version of a dependency, risking breakage across your modules. Welcome to dependency hell.
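Here is a rough sketch of why a flat layout forces that choice. The package names and the findFlatConflicts helper are hypothetical, and I’m assuming every package pins exact versions; real Bower resolves semver ranges, so treat this as an illustration of the shape of the problem rather than of Bower’s resolver.

```javascript
// Hypothetical manifests: package name -> exact versions of its dependencies.
const manifests = {
  app: { carousel: "1.0.0", datepicker: "2.0.0" },
  carousel: { jquery: "1.9.0" },
  datepicker: { jquery: "2.1.0" },
  jquery: {},
};

// Walk the dependency graph and collect every version requested for each
// package name. A flat directory has room for exactly one version per name,
// so any name with more than one requested version is a conflict.
function findFlatConflicts(root, manifests) {
  const requested = new Map(); // name -> Set of requested versions
  const seen = new Set();
  const queue = [root];
  while (queue.length > 0) {
    const name = queue.shift();
    if (seen.has(name)) continue;
    seen.add(name);
    for (const [dep, version] of Object.entries(manifests[name] || {})) {
      if (!requested.has(dep)) requested.set(dep, new Set());
      requested.get(dep).add(version);
      queue.push(dep);
    }
  }
  const conflicts = {};
  for (const [dep, versions] of requested) {
    if (versions.size > 1) conflicts[dep] = [...versions].sort();
  }
  return conflicts;
}

console.log(findFlatConflicts("app", manifests));
```

In this made-up app, the carousel and the datepicker ask for different versions of jquery, and a flat layout can only keep one of them.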

Enter npm.

npm combines a public registry (with the ability to host your own, and even a supported enterprise version) with a great command-line interface. Once the pet project of Isaac Schlueter while he was maintaining Node at Joyent, it is now the product of npm Inc, a funded company with engineers dedicated to developing both the client and server as open-source projects.

Packages are stored in an npm registry as tarballs of distributable code, along with associated metadata declared in package.json. A module declares its dependencies in package.json, and a simple npm install will download those dependencies, their dependencies, and so on down the tree. An npm publish will run your package’s prepublish script, package up the tarball and send it to the registry.

Using npm to manage frontend components historically hasn’t been easy either. Even months before it received funding, npm was notoriously unreliable for a period of time as it suffered growing pains from its immense popularity. Moreover, being born out of Node, the CommonJS module format’s synchronous require conflicts with the asynchronous nature of the browser.

But then came browserify. It provides a build step combined with a tiny, tiny runtime implementation of Node’s require. It reimplements Node’s module resolution algorithm and bundles all of your app’s dependencies into a single JavaScript file (don’t worry — it can split them out), complete with sourcemaps to view the original sources. It even shims out many Node core modules, allowing a ton of npm’s 125K+ packages to be used in the browser. We now live in an age of build steps and as few script tags as we like.

But it must bundle tons of duplicate modules, right? npm’s many dependency subtrees could result in many, many copies of large libraries like jQuery being downloaded to satisfy all dependencies.

It turns out that Node developers figured this out a while ago: they just don’t build large modules like jQuery. jQuery is an artifact of the lack of sane DOM APIs and of an extensive standard library, and duplicating jQuery on a page could have catastrophic consequences (not to mention the awkward architecture of plugins hanging off of the global $). By using smaller modules, this duplication becomes less and less of an issue. npm also ships with npm dedupe, allowing compatible modules to share dependencies. In fact, in the next release of npm, npm 3, deduping will be the default during the install process: modules will be lifted as far up the filesystem as possible without creating issues between conflicting modules. It’s truly the best of both worlds, and if you keep your modules small, you have nothing to worry about when it comes to bundle payload.
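The lifting idea can be sketched in a few lines. This is not npm’s implementation: the hoist helper and the package names are hypothetical, versions are pinned exactly instead of using semver ranges, and the tree is only one level deep.

```javascript
// Hypothetical nested install: each package carries its own copy of its
// dependencies, pinned to exact versions.
const nested = {
  a: { version: "1.0.0", dependencies: { util: { version: "1.2.0", dependencies: {} } } },
  b: { version: "1.0.0", dependencies: { util: { version: "1.2.0", dependencies: {} } } },
  c: { version: "1.0.0", dependencies: { util: { version: "2.0.0", dependencies: {} } } },
};

// Lift each nested dependency to the top level unless a different version
// already occupies that name there; in that case it stays nested.
function hoist(tree) {
  const top = {};
  for (const [name, pkg] of Object.entries(tree)) {
    top[name] = { version: pkg.version, dependencies: {} };
  }
  for (const [name, pkg] of Object.entries(tree)) {
    for (const [depName, dep] of Object.entries(pkg.dependencies)) {
      if (!top[depName]) {
        // First request for this name: it gets the shared top-level slot.
        top[depName] = { version: dep.version, dependencies: {} };
      } else if (top[depName].version !== dep.version) {
        // Conflicting version: keep a private copy under the dependent.
        top[name].dependencies[depName] = { version: dep.version, dependencies: {} };
      }
      // Same version already at the top level: the copy is simply shared.
    }
  }
  return top;
}

const hoisted = hoist(nested);
console.log(JSON.stringify(hoisted, null, 2));
```

Here a and b end up sharing a single util at the top level, while c keeps its own conflicting copy nested underneath it, so nothing breaks and only one duplicate remains.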

So why aren’t you all using npm? Back at Bonobos, my team bet our brands on it (with the help of browserify), and it turned out great. Granted, npm doesn’t solve all problems (there’s nothing quite as good as Node’s module system for CSS and other asset packaging, but Bower is explicitly unopinionated on these as well), but for the problems it does solve, it solves them really, really damn well. I wouldn’t have it any other way.

</script>