Could symlinks save us all?
The first thing that every developer will notice about npm is that it creates this weird node_modules directory in every single repository that you use it in.
This node_modules directory is strange coming from other package managers that use global directories for storing packages. In fact, it might seem like a flawed design because it duplicates packages across directories.
However, it is this localization of dependencies that encourages code sharing to an extreme. It’s why npm has to pay for storage of nearly 300,000 packages with over 4 billion downloads a month.
With so many packages, a single developer might have dozens of repos for npm packages on their machine, many of which depend on one another. These cross-dependencies introduce an annoying problem.
How do you develop on multiple packages with cross-dependencies?
The overly simplified solution to this has been npm link. Basically, what npm link does is create a symlink between a package that you have on your machine and another package’s node_modules folder.
├── package-1
└── package-2
    └── node_modules
        └── package-1 -> /package-1 (Symlink)
This works for simple dependency trees, but the symlink creates a rather annoying problem in more complex scenarios.
Imagine you have 3 packages: package-1, package-2, and package-3.
- package-2 depends on package-1
- package-3 depends on package-1 and package-2
├── package-1
├── package-2
│   └── package-1
└── package-3
    ├── package-1
    └── package-2
Take note of how package-3/node_modules/package-2 does not have a sub-directory of node_modules/package-1. This is because package-2 and package-3 depend on compatible versions of package-1, so npm only installs one copy.
Now inside package-3 you want to npm link package-2:
├── package-1
├── package-2
│   └── package-1
└── package-3
    ├── package-1
    └── package-2 -> /package-2
        ╚══[package-1]
Now you have a problem: You have multiple copies of package-1 inside package-3.
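To make the duplication concrete, here is a small TypeScript sketch that models the two layouts above in memory rather than on disk. The types and helpers (`Entry`, `Fs`, `mkdirp`, `symlink`, `realpath`, `resolvePackage`, `loadTree`) are hypothetical and only approximate node's behaviour, under two assumptions: package lookup tries `node_modules` in the requiring package's real directory and then in each ancestor, and each real directory is loaded at most once. Under those assumptions, a normal install loads package-1 once inside package-3, while the `npm link`ed layout loads it twice.

```typescript
// Toy in-memory file system (hypothetical): each path is a real directory
// or a symlink with an absolute target.
type Entry = { kind: "dir" } | { kind: "link"; target: string };
type Fs = Record<string, Entry>;

// Register a real directory and any missing parent directories.
function mkdirp(fs: Fs, path: string): void {
  let prefix = "";
  for (const part of path.split("/").filter((p) => p.length > 0)) {
    prefix += "/" + part;
    if (!(prefix in fs)) fs[prefix] = { kind: "dir" };
  }
}

// Create a symlink at `path` pointing to `target` (like `ln -s target path`).
function symlink(fs: Fs, path: string, target: string): void {
  mkdirp(fs, path.slice(0, path.lastIndexOf("/")));
  fs[path] = { kind: "link", target };
}

// Follow every symlink along `path`, like realpath(3).
function realpath(fs: Fs, path: string, depth = 0): string {
  if (depth > 32) throw new Error("ELOOP: " + path);
  let resolved = "";
  for (const part of path.split("/").filter((p) => p.length > 0)) {
    const candidate = resolved + "/" + part;
    const entry = fs[candidate];
    if (entry === undefined) throw new Error("ENOENT: " + candidate);
    resolved = entry.kind === "link" ? realpath(fs, entry.target, depth + 1) : candidate;
  }
  return resolved || "/";
}

// Simplified node-style lookup: try <dir>/node_modules/<name> in the requiring
// package's real directory, then in each ancestor directory.
function resolvePackage(fs: Fs, fromRealDir: string, name: string): string {
  let dir = fromRealDir;
  while (true) {
    const candidate = (dir === "/" ? "" : dir) + "/node_modules/" + name;
    if (candidate in fs) return realpath(fs, candidate);
    if (dir === "/") throw new Error("Cannot find module '" + name + "'");
    dir = dir.slice(0, dir.lastIndexOf("/")) || "/";
  }
}

// What each package requires, per the example in the text.
const requires: Record<string, string[]> = {
  "package-1": [],
  "package-2": ["package-1"],
  "package-3": ["package-1", "package-2"],
};

// Load a tree with a cache keyed by real path: a directory is loaded at most once.
function loadTree(fs: Fs, rootDir: string, rootName: string): Record<string, string> {
  const cache: Record<string, string> = {}; // real path -> package name
  const visit = (realDir: string, pkg: string): void => {
    if (realDir in cache) return;
    cache[realDir] = pkg;
    for (const dep of requires[pkg]) visit(resolvePackage(fs, realDir, dep), dep);
  };
  visit(realpath(fs, rootDir), rootName);
  return cache;
}

function copiesOf(cache: Record<string, string>, pkg: string): number {
  return Object.keys(cache).filter((p) => cache[p] === pkg).length;
}

// Normal install: package-1 is deduped, so package-2 has no nested copy.
const installed: Fs = {};
mkdirp(installed, "/package-3/node_modules/package-1");
mkdirp(installed, "/package-3/node_modules/package-2");

// After `npm link`: package-3's package-2 points at the local checkout,
// which carries its own node_modules/package-1.
const linked: Fs = {};
mkdirp(linked, "/package-2/node_modules/package-1");
mkdirp(linked, "/package-3/node_modules/package-1");
symlink(linked, "/package-3/node_modules/package-2", "/package-2");

console.log(copiesOf(loadTree(installed, "/package-3", "package-3"), "package-1")); // 1
console.log(copiesOf(loadTree(linked, "/package-3", "package-3"), "package-1")); // 2
```

The linked layout ends up with two real directories named package-1 because package-2's real location is its own checkout, so its lookup never reaches package-3's copy.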
Many times this is harmless and it won’t affect your development at all, which is why the node community has gotten as far as it has with this model.
However, as npm gets used more and more for front-end development, this is creating lots of problems. Pulling in multiple copies of a library in front-end development can cause things to break completely. Many libraries rely on the assumption that there will only ever be one copy of them. React (and possibly others) explicitly does not allow multiple copies to be loaded.
This makes for an absolutely terrible development experience: you’re forced to manually copy changes from one repo into another repo’s node_modules directory.
We need a better solution.
A Promising Solution
A promising solution to this would be for npm to symlink everything to a global_node_modules directory.
Imagine what that would look like:
├── global_node_modules
│   ├── package-1@1.0.0
│   ├── package-2@1.0.0
│   │   └── package-1 -> /global_node_modules/package-1@1.0.0
│   └── package-3@1.0.0
│       ├── package-1 -> /global_node_modules/package-1@1.0.0
│       └── package-2 -> /global_node_modules/package-2@1.0.0
│
├── package-1
├── package-2
│   └── package-1 -> /global_node_modules/package-1@1.0.0
└── package-3
    ├── package-1 -> /global_node_modules/package-1@1.0.0
    └── package-2 -> /global_node_modules/package-2@1.0.0
Notice that every package in every node_modules is symlinked through the global_node_modules directory; even the sub-dependencies within global_node_modules are symlinked to other packages within global_node_modules.
Now when you want to npm link a dependency it looks like this:
├── global_node_modules
│   ├── package-1@1.0.0
│   ├── package-2@1.0.0
│   │   └── package-1 -> /global_node_modules/package-1@1.0.0
│   ├── package-2@local -> /package-2
│   └── package-3@1.0.0
│       ├── package-1 -> /global_node_modules/package-1@1.0.0
│       └── package-2 -> /global_node_modules/package-2@1.0.0
│
├── package-1
├── package-2
│   └── package-1 -> /global_node_modules/package-1@1.0.0
└── package-3
    ├── package-1 -> /global_node_modules/package-1@1.0.0
    └── package-2 -> /global_node_modules/package-2@local
Here we’re creating a package-2@local symlink inside global_node_modules to the package-2 directory on our machine.
Then inside package-3 we are symlinking the package-2 dependency to global_node_modules/package-2@local instead of the non-local @1.0.0 dependency.
Now it’s important to note that the resolved dependency tree for package-3 looks like this:
package-3
└── node_modules
    ├── package-1 -> /global_node_modules/package-1@1.0.0
    └── package-2 -> /global_node_modules/package-2@local
        ╚══[node_modules]
            ╚══[package-1 -> /global_node_modules/package-1@1.0.0]
This may look like we have the same problem as before; however, node’s require algorithm resolves every path to its “realpath” (following all symlinks) before loading any required files.
Basically, to node, these dependencies look like:
- /global_node_modules/package-1@1.0.0 (not package-3/node_modules/package-1)
- /package-2, reached through /global_node_modules/package-2@local (not package-3/node_modules/package-2)
- /global_node_modules/package-1@1.0.0 (not package-3/node_modules/package-2/node_modules/package-1)
Rather than loading /global_node_modules/package-1@1.0.0 twice, node caches the result the first time it is loaded and reuses it.
This means that no matter how many times package-1@1.0.0 is required in our tree, it will only ever be loaded once.
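The realpath argument can be checked with a small in-memory sketch of the linked global_node_modules layout above (only the entries package-3 touches). The helpers `mkdirp`, `symlink`, `realpath` and `requireAt` are hypothetical, and the cache is a simplified stand-in for node's module cache, which is keyed by the fully resolved filename rather than by package directory. Following package-2@local all the way through lands on /package-2, and both routes to package-1 land on the same global directory.

```typescript
// Toy in-memory file system (hypothetical): each path is a real directory
// or a symlink with an absolute target.
type Entry = { kind: "dir" } | { kind: "link"; target: string };
type Fs = Record<string, Entry>;

// Register a real directory and any missing parent directories.
function mkdirp(fs: Fs, path: string): void {
  let prefix = "";
  for (const part of path.split("/").filter((p) => p.length > 0)) {
    prefix += "/" + part;
    if (!(prefix in fs)) fs[prefix] = { kind: "dir" };
  }
}

// Create a symlink at `path` pointing to `target` (like `ln -s target path`).
function symlink(fs: Fs, path: string, target: string): void {
  mkdirp(fs, path.slice(0, path.lastIndexOf("/")));
  fs[path] = { kind: "link", target };
}

// Follow every symlink along `path`, like realpath(3).
function realpath(fs: Fs, path: string, depth = 0): string {
  if (depth > 32) throw new Error("ELOOP: " + path);
  let resolved = "";
  for (const part of path.split("/").filter((p) => p.length > 0)) {
    const candidate = resolved + "/" + part;
    const entry = fs[candidate];
    if (entry === undefined) throw new Error("ENOENT: " + candidate);
    resolved = entry.kind === "link" ? realpath(fs, entry.target, depth + 1) : candidate;
  }
  return resolved || "/";
}

// The parts of the linked global_node_modules layout that package-3 touches.
const fs: Fs = {};
mkdirp(fs, "/global_node_modules/package-1@1.0.0");
symlink(fs, "/global_node_modules/package-2@local", "/package-2");
symlink(fs, "/package-2/node_modules/package-1", "/global_node_modules/package-1@1.0.0");
symlink(fs, "/package-3/node_modules/package-1", "/global_node_modules/package-1@1.0.0");
symlink(fs, "/package-3/node_modules/package-2", "/global_node_modules/package-2@local");

// Simplified module cache keyed by real path.
const moduleCache: Record<string, string> = {};
let evaluations = 0;

function requireAt(apparentPath: string, pkg: string): string {
  const real = realpath(fs, apparentPath);
  if (!(real in moduleCache)) {
    moduleCache[real] = pkg; // first time: "run" the package
    evaluations++;
  }
  return real; // later requires reuse the cached entry
}

const fromPackage3 = requireAt("/package-3/node_modules/package-1", "package-1");
const package2Real = requireAt("/package-3/node_modules/package-2", "package-2");
const fromPackage2 = requireAt("/package-3/node_modules/package-2/node_modules/package-1", "package-1");

console.log(fromPackage3 === fromPackage2); // true
console.log(package2Real); // /package-2
console.log(evaluations); // 2 (package-1 once, package-2 once)
```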
Okay, but this breaks a ton of things, right?
Yeah, this wouldn’t happen for free, and it’s not without its downsides.
For starters, tons of packages depend on the current structure of node_modules (arguably incorrectly so), so this would be a big breaking change in the community.
But even if all those packages got fixed, this approach still has downsides. The big one is that it forces npm to be deterministic about dependency versions in ways that it didn’t need to be before.
To demonstrate this, let’s come up with a new dependency tree.
Imagine we have 4 packages.
- package-2 depends on package-1 at either 1.0.0 or 2.0.0
- package-3 depends on package-1 at 1.0.0
- package-4 depends on package-1 at 2.0.0
├── package-1
├── package-2
│   └── package-1@1.0.0–2.0.0
├── package-3
│   ├── package-1@1.0.0
│   └── package-2
└── package-4
    ├── package-1@2.0.0
    └── package-2
Without the global symlinks this works fine. package-3 only has package-1@1.0.0 in its tree because package-2 accepts that version. The same goes for package-4 with package-1@2.0.0.
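The nested behaviour can be sketched with a deliberately tiny model. It assumes that a requirement is a list of exact versions rather than a real semver range, and that when a nested copy is needed npm picks the last accepted version; both are simplifications for illustration, and `package1CopiesUnder` is a hypothetical name.

```typescript
// Simplified model (assumption): a requirement is a list of exact versions
// instead of a semver range.
const package2AcceptsPackage1: string[] = ["1.0.0", "2.0.0"];

// Versions of package-1 installed under a parent that wants `parentWants`.
// The parent's own copy sits at <parent>/node_modules/package-1; if package-2
// accepts that version, no nested copy is needed because node's lookup from
// package-2 walks up and finds the parent's copy.
function package1CopiesUnder(parentWants: string): string[] {
  const copies = [parentWants];
  if (package2AcceptsPackage1.indexOf(parentWants) === -1) {
    // Otherwise package-2 gets its own nested copy (assumed: the last accepted version).
    copies.push(package2AcceptsPackage1[package2AcceptsPackage1.length - 1]);
  }
  return copies;
}

console.log("package-3:", package1CopiesUnder("1.0.0").join(", ")); // package-3: 1.0.0
console.log("package-4:", package1CopiesUnder("2.0.0").join(", ")); // package-4: 2.0.0
```

Because the choice is made per parent, each parent keeps a single copy of package-1 even though the two parents disagree with each other.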
However, with the global symlinks, package-2 can only ever depend on one version of package-1, because it does this:
└── global_node_modules
    ├── package-1@1.0.0
    ├── package-1@2.0.0
    └── package-2@1.0.0
        └── package-1 -> /global_node_modules/package-1@2.0.0
With the global symlinks, package-2@1.0.0 will always depend on the single package-1 it was linked to (package-1@2.0.0 above), even if a parent dependency wants something else. Here package-3 wants 1.0.0; linking package-1@1.0.0 instead would just move the problem to package-4, which wants 2.0.0.
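Under the same simplified model (exact version lists instead of semver ranges, hypothetical names throughout), the global layout can be sketched by fixing package-2@1.0.0's single package-1 link up front and counting how many copies of package-1 each parent ends up loading.

```typescript
// Simplified model (assumption): exact version lists instead of semver ranges.
const package2Accepts: string[] = ["1.0.0", "2.0.0"];
const parentsWant: Record<string, string> = { "package-3": "1.0.0", "package-4": "2.0.0" };

// In the proposed layout, global_node_modules/package-2@1.0.0 holds exactly one
// node_modules/package-1 symlink, so its package-1 version is fixed once for every
// project that uses package-2@1.0.0. A parent links its own package-1 as well, so it
// loads two copies whenever the two links point at different global directories.
function copiesForParent(pinned: string, parentWants: string): number {
  return pinned === parentWants ? 1 : 2;
}

// Try each version package-2@1.0.0 could be pinned to.
for (const pinned of package2Accepts) {
  const report = Object.keys(parentsWant)
    .map((p) => p + ": " + copiesForParent(pinned, parentsWant[p]))
    .join(", ");
  console.log("package-2@1.0.0 -> package-1@" + pinned + " => " + report);
}
// Either way, one of the two parents ends up loading package-1 twice.
```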
So what is the solution?
I’m not sure that there is one. At least not without changing how node resolves dependencies, which would be worth discussing.
So you just wasted my time…
No! Or I hope not… I hope that this prompts people to think outside the box about how we can improve the “DX” (Developer Experience) of working with npm and node_modules without breaking the code-sharing side of things.