The Tradeoff of Multiple Repositories
More often than I expect, I come across software projects that consist of multiple source control repositories. The reasons vary. Perhaps it’s thought that the web frontend and backend aren’t tightly coupled and don’t need to be in the same repository. Perhaps there’s code that’s meant to be used throughout an entire organization. Regardless, there are real costs involved in the decision to have a development team work in distinct, yet related, repositories. I believe these costs are often overlooked.
Double (or n Times) the Gruntwork
The most obvious cost involved is additional gruntwork. Let’s imagine a project with a mobile app and web service, each having its own Git repository. When it’s time to start a new feature, the feature branch will need to be created twice. When the work is finished, two pull requests will need to be made. When it’s appropriate to make a commit, it might need to be done twice. When it’s time to push, it might need to be done twice. To help manage all of this, an extra terminal might be appropriate.
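To make the repetition concrete, here is a minimal sketch that only lists the Git commands needed to start and publish one feature branch across several repositories. The repository paths and the branch name are made up for illustration, and the function builds the commands without running them.

```python
from typing import List


def feature_branch_commands(repos: List[str], branch: str) -> List[List[str]]:
    """List the git commands needed to start and publish one feature branch
    in every repository. Nothing is executed here."""
    commands: List[List[str]] = []
    for repo in repos:
        # `git -C <path>` runs the command as if git had been started in <path>.
        commands.append(["git", "-C", repo, "checkout", "-b", branch])
        commands.append(["git", "-C", repo, "push", "--set-upstream", "origin", branch])
    return commands


if __name__ == "__main__":
    # Hypothetical repository directories and branch name.
    for command in feature_branch_commands(["mobile-app", "web-service"], "feature/login"):
        print(" ".join(command))
```

Every repository added to the project adds another pair of commands, and the same multiplication applies to commits and pull requests.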
Individually, none of these costs is very significant. Collectively, they represent a moderate inconvenience and cognitive burden. I’ve seen developers weigh this and decide it’s worth the cost, because they are trying to achieve some other ideal.
Ultimately, these inconveniences are just symptoms of a more fundamental — and easily overlooked — tradeoff.
Context: Not Version-Controlled
A repository is essentially a set of snapshots in time. For any commit, it’s easy to see not only what changes were made, but also precisely what other files existed and what they contained at that point in time. This is pretty obvious, after all. It’s one of the biggest selling points of version control.
With a project consisting of one single repository, that snapshot encapsulates everything there is to know about the source code. Once there are multiple repositories involved in a single project, this context is fragmented.
This fragmentation manifests in various ways. Let’s look at some examples:
- When moving code between repositories, neither one has knowledge of the other. Information about where the code came from or went is lost.
- If a branch in your frontend repo depends on your server running a corresponding branch, there’s no native or reasonable way to express that relationship. Information is lost.
The Real Tradeoff of Multiple Repositories
Breaking a project into multiple repositories involves a fundamental tradeoff. By doing so, information about the broader context of the application is pushed entirely outside of version control.
Although it’s possible to work to counteract this, for example, by establishing team practices, using Git submodules, or building custom machinery, it will require work. That’s work spent to regain what you get for free by using a single repository.
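As one illustration of what "custom machinery" might look like, here is a rough sketch that checks whether each repository is checked out at a commit recorded in a shared manifest. The manifest format (a mapping from repository name to commit hash) and both function names are assumptions of mine, not an established convention or an existing tool.

```python
import subprocess
from typing import Dict, List


def current_head(repo_path: str) -> str:
    """Return the commit hash currently checked out in repo_path."""
    result = subprocess.run(
        ["git", "-C", repo_path, "rev-parse", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def find_mismatches(pinned: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Compare the commits a (hypothetical) manifest expects with what is checked out."""
    problems: List[str] = []
    for repo, expected in sorted(pinned.items()):
        found = actual.get(repo)
        if found is None:
            problems.append(f"{repo}: no checkout found")
        elif found != expected:
            problems.append(f"{repo}: expected {expected}, found {found}")
    return problems
```

In practice, `actual` would be built by calling `current_head` on each checkout, and the manifest itself would still need a home in some repository. Even this small check is code that someone has to write, document, and keep in sync.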
Without that work, the most likely place for this information to end up is the culture and individual minds of the team. This is a much more ephemeral and unreliable place than a source repository. It makes it harder to onboard new developers and coordinate things like continuous integration.
It’s up to your unique situation whether it’s a win or loss to split your code into multiple repositories, but the costs are both real and easily overlooked. I’d strongly suggest weighing these tradeoffs thoughtfully. And, if you find yourself on a project where these costs are bringing you down, I’ve written a blog post on how to super-collide your repositories together.
Originally published at spin.atomicobject.com on August 22, 2016.