Yarn lockfiles and internal modules
I read an article on the Yarn blog today explaining how important it is to commit lockfiles, so that builds behave consistently and you avoid “works on my machine” problems. While I agree that it is good practice to lock any open source package you use in your application, I believe the exact opposite is true for any module that is published internally within your organization.
Many organizations, once they realize their application cannot remain one big monolith, start breaking it into libraries and microservices. This is an important step in the evolution of any application, which will save your life and make your application scalable, but it does introduce a fair number of complications.
In a monolithic application, continuous integration is easy. Your test suite runs on each push to master, and you get immediate feedback from Travis, Jenkins or whatever CI system you use. If some bad commit breaks the build, you revert this commit, investigate the issue and push a bug fix. Life is good.
However, when your application is built from many different modules, each possibly maintained by a different team, and those modules depend on other internal modules, on core modules and so on, things start to get complicated. You still want all the goodness you had when everything was one big monolith. When you push to a core module, you want to make sure all of its dependents still work, and you want quick feedback from the CI system. And if you push a bug fix, you want to be sure that all your users get it.
Node modules and continuous integration
When we started breaking up our monolith at Wix, we thought a lot about how we wanted this to work. We ended up with a pretty lenient approach: one git repo may contain one or more node modules, and all of those node modules are published to an internal npm registry which is only available within the organization's network. Any service may install both internal node modules from the internal registry and open source node modules from the public registry.
The big question was how to keep continuous integration working across all of those modules. The solution we came up with was both elegant and controversial. We decided that developers never publish new versions of their libraries manually. Instead, a new version of the library is published automatically after every build in the CI system. The build of every dependent library or application is then triggered and runs with the newly published version to make sure all tests pass.
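To make the triggering step concrete, here is a minimal sketch of how a CI system could work out which builds to run after a module is published. It is not our actual implementation: the module names, the DependencyGraph shape and the dependentsInBuildOrder function are all made up for illustration, and it assumes the graph of internal modules has no cycles.

```typescript
// Hypothetical module graph: each key lists the internal modules it depends on.
type DependencyGraph = Record<string, string[]>;

// Returns every module that directly or transitively depends on `changed`,
// ordered so that a module always comes after the modules it depends on.
// Assumes the graph is acyclic.
function dependentsInBuildOrder(graph: DependencyGraph, changed: string): string[] {
  // Invert the graph: for each module, which modules depend on it?
  const dependents = new Map<string, string[]>();
  for (const name of Object.keys(graph)) {
    for (const dep of graph[name]) {
      dependents.set(dep, [...(dependents.get(dep) ?? []), name]);
    }
  }

  // Depth-first walk over dependents; reversing the post-order
  // yields a topological order.
  const visited = new Set<string>();
  const postOrder: string[] = [];
  const visit = (name: string): void => {
    if (visited.has(name)) return;
    visited.add(name);
    for (const next of dependents.get(name) ?? []) visit(next);
    postOrder.push(name);
  };
  visit(changed);

  // Drop the changed module itself: it was already built and published.
  return postOrder.reverse().filter((name) => name !== changed);
}

// Made-up example graph.
const exampleGraph: DependencyGraph = {
  "core-utils": [],
  "http-client": ["core-utils"],
  "billing-service": ["http-client", "core-utils"],
  "site-frontend": ["http-client"],
};

// Builds to trigger after a push to core-utils.
const buildQueue = dependentsInBuildOrder(exampleGraph, "core-utils");
console.log(buildQueue);
```

In this example a push to core-utils queues http-client before billing-service and site-frontend, so each build runs against freshly published versions of everything beneath it.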
This approach works very well for us, but it comes with many responsibilities:
- In order to keep the system sane, builds must be fast, stable and reliable. This is always true for CI, but when a single push can trigger dozens of dependent builds, it is even more critical.
- Every change must be backward compatible. You don’t want to surprise other teams by pushing a library with a modified API. We do have deprecated APIs which people are supposed to migrate away from in a timely manner, so when we finally remove them, there’s no surprise.
- We rarely bump the major version of a library. This is never done without a very serious discussion in which we fail to find any better alternative. When we do bump a major version, we have to go over dozens of modules and migrate them to the new version, which takes a lot of time and effort (and obviously gives us very slow feedback about the new API, which means it might be done in multiple iterations).
- It is incredibly important for all developers to have a good integration test suite in which they do not mock external dependencies. Those tests are one of the only things that keep this approach from descending into complete chaos.
- Finally, and most importantly for this article: it is crucial that people do not depend on specific versions of internal modules and do not use lockfiles or shrinkwraps. Otherwise we don’t get immediate feedback when something is wrong in the latest library commit, which means bugs reach the production environment and development iterations become slow and complex.
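The last rule is easy to check automatically. The sketch below shows one way a CI step could flag internal modules that are pinned to an exact version in package.json. It assumes internal modules live under a scope, here the made-up @internal/ scope, and the function and constant names are hypothetical. It only looks at the version range in the manifest; it does not detect a lockfile.

```typescript
// Hypothetical scope under which internal modules are published.
const INTERNAL_SCOPE = "@internal/";

interface PackageManifest {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// An exact version such as "1.4.2" (or "=1.4.2", "v1.4.2") pins the dependency;
// ranges and tags such as "^1.4.2", "~1.4.2", "1.x", "*" or "latest" do not.
const EXACT_VERSION = /^[=v]*\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$/;

// Returns the names of internal dependencies that are pinned to one exact version.
function findPinnedInternalDependencies(manifest: PackageManifest): string[] {
  const all: Record<string, string> = {
    ...manifest.devDependencies,
    ...manifest.dependencies,
  };
  return Object.keys(all)
    .filter((name) => name.startsWith(INTERNAL_SCOPE))
    .filter((name) => EXACT_VERSION.test(all[name].trim()))
    .sort();
}

// Made-up manifest: one internal module floats, one is pinned.
const exampleManifest: PackageManifest = {
  dependencies: {
    "@internal/core-utils": "^2.0.0",
    "@internal/http-client": "2.3.1",
    "lodash": "4.17.21",
  },
  devDependencies: {
    "@internal/test-kit": "latest",
  },
};

const pinned = findPinnedInternalDependencies(exampleManifest);
console.log(pinned);
```

Here only @internal/http-client is reported. lodash is also pinned, but it is an open source package, so pinning it is fine.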
Continuous integration and version locking
When people lock the version of an internal module, it is always a ticking time bomb. For example, we push a bug fix to an internal module and only later discover that it didn’t reach some applications because they had a lockfile.
When we finally realize this and want to urgently deploy the fix, we find out that it is too difficult to migrate to the latest version of our internal module, since too many things have changed since the version was locked. The only way to fix the issue quickly is to create a branch from the locked version, fix the bug there, publish a “hot fix” version and use it until we have time to migrate to the latest version (read: never).
It doesn’t have to be a bug fix that triggers this nightmare. It can be a feature that some application needs urgently and that was added after the internal module was locked. It can be some conflict catastrophe where two applications stop communicating properly because they use two very different versions of the same library.
To some this may sound like something that only happens in theory, but trust me, I’m not making this up: these are things that actually happened to us. And it doesn’t matter whether we are talking about Node applications or frontend applications, the same problems exist everywhere. Locking a version of an internal module will always get you into trouble.
Locking versions of open source modules makes a lot of sense. It is fine that, as a developer of an open source library, I don’t necessarily run my build with the latest dependencies. It is also true that the advantages of a consistent build outweigh the disadvantages of getting feedback about issues with the latest dependencies from your users.
I would just add that, as an open source developer, you should run yarn upgrade often so that you are still the first to see issues. Even if you have a lot of users, a bug might be hiding in some corner that few of them ever reach, and it can take quite a long time until some diligent user runs yarn upgrade, sees the bug, understands that it is related to your package, and takes the time to report it.
While all of this is true, I believe it is still critical that people do not lock versions of their internal modules and always use the latest code of those dependencies. Currently I’m not aware of a way to lock only part of your dependencies, so sadly I fall back to not locking anything. Like everything, it is a matter of tradeoffs: the advantages of not locking internal modules outweigh the disadvantages of not locking at all.
I’d love to hear ideas on how to create a lockfile only for open source dependencies, and I plan to look into solving this myself as well.
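As a starting point for that discussion, here is one rough idea, not a tested solution: post-process yarn.lock and delete the entries that belong to internal modules, so that only open source packages stay locked. The sketch assumes the yarn.lock v1 text layout (entries separated by blank lines, each starting with a header line of comma-separated patterns ending in a colon), Unix line endings, and a made-up @internal/ scope for internal modules. It also assumes that Yarn re-resolves entries missing from the lockfile on the next install, which I have not verified for every setup. The function name is hypothetical.

```typescript
// Removes every yarn.lock (v1 layout) entry whose patterns all belong to the
// given scope, leaving comments and entries for other packages untouched.
function stripScopedEntries(lockfile: string, internalScope: string): string {
  // Entries are separated by one or more blank lines.
  const blocks = lockfile.split(/\n{2,}/);
  const kept = blocks.filter((block) => {
    const header = block.split("\n")[0];
    // Keep comment blocks and anything that does not look like an entry header.
    if (header.startsWith("#") || !header.endsWith(":")) return true;
    // A header lists one or more patterns: "@scope/name@^1.0.0", name@~2.0.0:
    const patterns = header
      .slice(0, -1)
      .split(",")
      .map((pattern) => pattern.trim().replace(/^"|"$/g, ""));
    // Drop the entry only if every pattern refers to an internal module.
    return !patterns.every((pattern) => pattern.startsWith(internalScope));
  });
  return kept.join("\n\n");
}

// Made-up lockfile content for illustration.
const sampleLock = [
  "# yarn lockfile v1",
  "",
  "",
  '"@internal/http-client@^2.0.0":',
  '  version "2.3.1"',
  "  dependencies:",
  '    lodash "^4.17.0"',
  "",
  "lodash@^4.17.0:",
  '  version "4.17.21"',
  "",
].join("\n");

const openSourceOnlyLock = stripScopedEntries(sampleLock, "@internal/");
console.log(openSourceOnlyLock);
```

In the sample, the @internal/http-client entry is removed while the lodash entry and the header comment survive. Whether the installed result then behaves the way I want is exactly the part that still needs to be tried out.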