The Case for Checking In Your Dependencies
Heroku wrote up some Habits of a Node Hacker, and they’re mostly on point, but they wrote off keeping your “generated” dependencies in source control. What if I told you a better default is to check them in?
In their words: “Most apps are composed of both necessary files and generated files. When using a source control system like git, you should avoid tracking anything that’s generated.”
I’m not saying the smart folks at Heroku are clowns, or anyone else who brushes away their dependencies in .gitignore, but keeping your generated files inside the repo is underrated. If you try it, I’d bet there aren’t many reasons you’d go back to the old way.
A series of incremental lessons over the past year changed my mind about this. Let’s walk through them.
The most well-known reason to check in dependencies is that most dependency tools and services (NPM, CocoaPods, RubyGems, etc.) phone home to external infrastructure. It might be their own service, it might be GitHub, but it’s probably something out of your control by default.
Since it’s out of your control, you probably can’t debug it or fix it when it breaks. Your pull requests or deploys will start to fail, the team will be upset, and you will search Twitter and find similarly frustrated developers. Thankfully most services are pretty good about this, but it can happen.
NPM encourages running a mirror or cache as an alternative to checking in your downloads. It’s great to have as an option, but it’s one more thing for you to maintain. Environments like Travis might intelligently cache dependencies for you, but now you have yet another layer of indirection.
I don’t think this is a deal-breaker on its own, and I lived with it for years. But let’s dig further.
Semver Dependency Breakage
This mostly affects NPM and the Node ecosystem. By default, NPM doesn’t lock the entire dependency hierarchy (à la Gemfile.lock), which means fresh runs of npm install are highly non-deterministic. A dependency deep in the tree could be silently upgraded and break your app, which is a pain to debug.
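To make the non-determinism concrete, here’s a minimal sketch in Python of how a caret range like ^1.2.0 can resolve to different versions depending on what has been published when you install. It is not NPM’s actual resolver: the resolve helper, the two version lists, and the simplified caret rule (major version 1 or higher, no pre-release tags) are all assumptions for illustration.

```python
# Minimal sketch of caret-range matching. This is NOT npm's real resolver;
# it assumes major version >= 1 and ignores pre-release tags.

def parse(version):
    """Turn '1.2.3' into the tuple (1, 2, 3)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def satisfies_caret(version, base):
    """True if `version` is inside `^base`: same major, and not older than base."""
    v, b = parse(version), parse(base)
    return v[0] == b[0] and v >= b


def resolve(range_base, published):
    """Pick the highest published version that satisfies `^range_base`."""
    candidates = [v for v in published if satisfies_caret(v, range_base)]
    return max(candidates, key=parse) if candidates else None


# Hypothetical registry contents on two different days.
monday = ["1.2.0", "1.2.1"]
friday = ["1.2.0", "1.2.1", "1.3.0", "2.0.0"]

print(resolve("1.2.0", monday))  # 1.2.1
print(resolve("1.2.0", friday))  # 1.3.0
```

Nothing in your own repo changed between the two runs, yet the installed version did. Multiply that by every package in the tree and you can see why a fresh install is hard to reason about.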
NPM does offer a shrinkwrap capability, and coming from other ecosystems, I’ve found it perplexing that it isn’t the default. It’s also extra tooling and context that a team has to understand, and most of the time developers want things to Just Work out of the box.
So this can be a problem, but there are workarounds, and maybe your team is okay with that. Is there another reason to check in generated code?
Hidden External Dependencies
Some tools let dependencies run arbitrary code before they are installed. Sometimes this code connects to servers beyond the normal dependency service, which really are out of your control.
Story time: Node SQLite talks to an external server (anything using node-pre-gyp has this ability). This past summer, their pre-compiled Linux build was quietly compiled using a newer version of glibc than our production and integration machines supported.
Our pull requests just started to fail, even if they had no changes from previously passing states. It was a massive pain to understand because this change was totally silent and deep in our dependency tree.
For all of these reasons, it just didn’t seem worth it to keep ignoring my dependencies.
There are some downsides, but they haven’t caused as much trouble:
- The git repo gets big (probably hundreds of megabytes bigger). Non-shallow git clones can be really big. I do a lot of work on mobile hotspots, so I feel the need for low bandwidth, but I’m not sure saving bandwidth is worth the problems above.
- Diffs get a bit nasty when adding new dependencies or updating old ones. Most code review tools can hide those. There’s also an upside: if someone accidentally edits a dependency, it’s very obvious.
- You need to be careful about how compiled dependencies are installed and what platform you eventually run them on. For example, if your server is Linux but your work machine is OS X, you might quietly compile some dependencies and check them in for the wrong platform. You can get around this by doing all of your work on a VM identical to your production machine, which I think is becoming a more common practice.
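If a VM isn’t an option, one way to catch the wrong-platform problem is to look at the compiled files themselves before they get committed. Here’s a rough sketch in Python, under a few assumptions: compiled Node addons use the .node extension, your servers expect Linux (ELF) binaries, and the find_mismatched and binary_format helpers are names I made up for illustration. It only reads the first four bytes of each file, so treat it as a smoke test, not a guarantee.

```python
import os

# Leading "magic" bytes for common native binary formats.
MAGIC = {
    b"\x7fELF": "elf",              # Linux
    b"\xcf\xfa\xed\xfe": "mach-o",  # OS X, 64-bit
    b"\xce\xfa\xed\xfe": "mach-o",  # OS X, 32-bit
    b"\xca\xfe\xba\xbe": "mach-o",  # OS X, universal (fat) binary
}


def binary_format(path):
    """Guess a file's binary format from its first four bytes."""
    with open(path, "rb") as handle:
        return MAGIC.get(handle.read(4), "unknown")


def find_mismatched(root, expected="elf"):
    """Return paths of .node addons under `root` whose format is not `expected`."""
    mismatched = []
    for directory, _subdirs, files in os.walk(root):
        for name in files:
            if name.endswith(".node"):
                path = os.path.join(directory, name)
                if binary_format(path) != expected:
                    mismatched.append(path)
    return sorted(mismatched)
```

Calling find_mismatched("node_modules") from a pre-commit hook or CI step, and failing when the list is non-empty, would flag an addon compiled on your laptop before it reaches a Linux server.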
Like most engineering advice, this might not work for everyone, and there are downsides to both approaches. But give it five minutes and avoid the temptation to cargo cult.
Have other experiences with dependencies in version control? Leave a response or note :)