Understanding npm dependency resolution

In the front-end world, almost everything is used and delivered as a package. Packages depend on other packages, and they express the versions they need using a notation called semantic versioning (semver). Managing all of this calls for an efficient package management system, and several are available, such as Bower, Yarn and npm. Among them, npm seems to have won the competition. We won't go into much detail about why, but it is really efficient; check out this article for more details.

npm reads the package.json file (usually placed in the project's root directory) to install the dependencies, and installs them under a subdirectory named “node_modules”. npm understands the semver syntax used to specify the required versions of the dependencies. These packages are then used by the source code and live in the same context. But how are they made available, and how does npm manage things when multiple packages depend on the same package, or require the same package in different versions? In both cases, redundancy and install size are the main concerns when resolving dependencies, and this is where npm works its magic.
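To make the semver notation a bit more concrete, here is a minimal sketch of how a caret range such as ^1.2.0 could be checked against a concrete version. The helpers parseVersion, compareVersions and satisfiesCaret are made up for this illustration and only handle plain three-part versions; npm itself relies on the semver package, which covers many more cases (tilde ranges, pre-release tags and so on).

```javascript
// Hypothetical helpers for illustration only; not npm's real implementation.

// Turn "1.2.3" into [1, 2, 3]. Assumes a plain major.minor.patch version.
function parseVersion(version) {
  return version.split(".").map(Number);
}

// Negative if a < b, 0 if equal, positive if a > b.
function compareVersions(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// A caret range allows changes that do not modify the left-most non-zero part:
// "^1.2.0" accepts 1.2.0 up to (but not including) 2.0.0.
function satisfiesCaret(version, range) {
  const min = parseVersion(range.slice(1)); // drop the leading "^"
  const candidate = parseVersion(version);

  // Upper bound: bump the left-most non-zero part and zero everything after it.
  const firstNonZero = min.findIndex((part) => part !== 0);
  const bumpAt = firstNonZero === -1 ? 2 : firstNonZero;
  const max = min.map((part, i) => {
    if (i < bumpAt) return part;
    return i === bumpAt ? part + 1 : 0;
  });

  return (
    compareVersions(candidate, min) >= 0 &&
    compareVersions(candidate, max) < 0
  );
}

console.log(satisfiesCaret("1.4.2", "^1.2.0")); // true
console.log(satisfiesCaret("2.0.0", "^1.2.0")); // false
console.log(satisfiesCaret("0.2.9", "^0.2.3")); // true
console.log(satisfiesCaret("0.3.0", "^0.2.3")); // false
```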

The command used to install packages is “npm install”. It looks for package.json and builds a tree from the dependencies listed in it and the dependencies of those dependencies, while trying to keep the tree as flat as possible. Flat here means the fewest nested branches, or the smallest tree depth, because that means fewer duplicated packages and a smaller install. But this cannot be achieved at the cost of compatibility, right? So let's understand how npm manages this balance with the following example:

Dependency Resolution

I am taking the exact example provided in the npm documentation.

Note: The dependency resolution algorithm changed in npm v3, so this example applies to npm v3 and above.

Let's consider the following example:

  • Module A depends on Module B v1.0.
  • Module C depends on Module B v2.0.

Note the order in which the modules are listed, because it plays a significant role in dependency resolution.

Module A comes first in the sequence and depends on Module B v1.0, so npm installs both Module A and its dependency, Module B, flat inside the /node_modules directory.

Next in the sequence is Module C, which also depends on Module B, but on a different version. npm handles this by nesting the new, different version of Module B under the module that requires it.
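The placement rule for Module A and Module C can be sketched in a few lines. This is only a toy model, not npm's real algorithm: installModule is a made-up helper, the tree is a plain Map from install path to version, and the versions of Module A and Module C themselves are assumed to be 1.0.

```javascript
// A toy model of the placement rule; not npm's real algorithm.
// The tree is a Map from an install path to the version stored there.

function installModule(tree, name, version, deps) {
  // The module itself is assumed to go at the top level.
  tree.set(`node_modules/${name}`, version);

  for (const [depName, depVersion] of Object.entries(deps)) {
    const topLevelPath = `node_modules/${depName}`;
    if (!tree.has(topLevelPath)) {
      // The top-level slot is free: install the dependency flat.
      tree.set(topLevelPath, depVersion);
    } else if (tree.get(topLevelPath) !== depVersion) {
      // The slot holds another version: nest under the dependent module.
      tree.set(`node_modules/${name}/node_modules/${depName}`, depVersion);
    }
    // Otherwise the top-level copy already matches and is simply shared.
  }
}

const tree = new Map();
installModule(tree, "A", "1.0", { B: "1.0" }); // first in the sequence
installModule(tree, "C", "1.0", { B: "2.0" }); // second in the sequence

for (const [path, version] of tree) {
  console.log(`${path} -> v${version}`);
}
// node_modules/A -> v1.0
// node_modules/B -> v1.0
// node_modules/C -> v1.0
// node_modules/C/node_modules/B -> v2.0
```

Swapping the two installModule calls would put Module B v2.0 at the top level and nest v1.0 under Module A instead, which is exactly why the order matters.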

Now, what happens if we install another module that depends on Module B v1.0, or on Module B v2.0?

So let's say:

  • Module D depends on Module B v2.0.
  • Module E depends on Module B v1.0.

Because B v1.0 is already a top-level dependency, we cannot install B v2.0 as a top-level dependency. Therefore Module B v2.0 is installed as a nested dependency of Module D, even though we already have a copy installed, nested beneath Module C. For Module E, Module B v1.0 is already a top-level dependency, so we do not need to duplicate and nest it. We simply install Module E, and it shares Module B v1.0 with Module A.

Now the interesting part: what happens if we update Module A to v2.0, which depends on Module B v2.0 instead of Module B v1.0?
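The sharing works because Node.js looks for a required module in the requiring module's own node_modules first and then walks up through the parent directories. Below is a rough sketch of that lookup for the tree we have so far. findDependency is a hypothetical helper, the tree is modelled as a Map of paths rather than a real file system, and the versions of Modules C, D and E are assumed.

```javascript
// Toy model of the lookup; a Map of install paths stands in for the file system.
// The versions of C, D and E are assumed for the example.
const installed = new Map([
  ["node_modules/A", "1.0"],
  ["node_modules/B", "1.0"],
  ["node_modules/C", "1.0"],
  ["node_modules/C/node_modules/B", "2.0"],
  ["node_modules/D", "1.0"],
  ["node_modules/D/node_modules/B", "2.0"],
  ["node_modules/E", "1.0"],
]);

// Hypothetical helper: which copy of `depName` does the module at `fromPath` see?
function findDependency(installed, fromPath, depName) {
  let dir = fromPath;
  while (true) {
    // Look in the node_modules folder directly under the current module.
    const candidate = `${dir}/node_modules/${depName}`;
    if (installed.has(candidate)) return candidate;

    // Step up one module level: strip the trailing "/node_modules/<name>".
    const cut = dir.lastIndexOf("/node_modules/");
    if (cut === -1) break;
    dir = dir.slice(0, cut);
  }
  // Finally try the project root's node_modules.
  const rootCandidate = `node_modules/${depName}`;
  return installed.has(rootCandidate) ? rootCandidate : null;
}

console.log(findDependency(installed, "node_modules/A", "B")); // node_modules/B
console.log(findDependency(installed, "node_modules/E", "B")); // node_modules/B
console.log(findDependency(installed, "node_modules/D", "B")); // node_modules/D/node_modules/B
```

Modules A and E end up at the same top-level copy of Module B v1.0, while Module D finds its own nested v2.0 before the lookup ever reaches the top level.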

The key is to remember that install order matters.

Even though Module A was installed first (as v1.0) via our package.json, updating it with the npm install command means that Module A v2.0 is the last package installed.

As a result, npm does the following when we install Module A v2.0:

  • It removes Module A v1.0.
  • It installs Module A v2.0.
  • It leaves Module B v1.0 because Module E v1.0 still depends on it.
  • It installs Module B v2.0 as a nested dependency under Module A v2.0, since Module B v1.0 is already occupying the top level in the directory hierarchy.

Finally, let’s also update Module E to v2.0, which also depends on Module B v2.0 instead of Module B v1.0, just like the Module A update.

npm does the following:

  • It removes Module E v1.0.
  • It installs Module E v2.0.
  • It removes Module B v1.0 because nothing depends on it anymore.
  • It installs Module B v2.0 in the top level of the directory because there is no other version of Module B there.
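The two updates (Module A, then Module E) can be replayed with a small simulation. As before, this is a toy model under stated assumptions rather than npm's actual algorithm: updateModule is a made-up helper, the tree is a Map of install paths, and the versions of Modules C and D are assumed to be 1.0.

```javascript
// Toy model of the two updates; not npm's real algorithm.
// The versions of C and D are assumed for the example.
const tree = new Map([
  ["node_modules/A", "1.0"],
  ["node_modules/B", "1.0"],
  ["node_modules/C", "1.0"],
  ["node_modules/C/node_modules/B", "2.0"],
  ["node_modules/D", "1.0"],
  ["node_modules/D/node_modules/B", "2.0"],
  ["node_modules/E", "1.0"],
]);

// What each module listed in package.json depends on.
const needs = {
  A: { B: "1.0" },
  C: { B: "2.0" },
  D: { B: "2.0" },
  E: { B: "1.0" },
};

function updateModule(tree, needs, name, newVersion, newDeps) {
  const modulePath = `node_modules/${name}`;

  // Remove whatever was nested under the old version, then put the
  // new version in its place.
  for (const path of [...tree.keys()]) {
    if (path.startsWith(`${modulePath}/`)) tree.delete(path);
  }
  tree.set(modulePath, newVersion);
  needs[name] = newDeps;

  // Drop top-level dependencies that no module resolves to any more.
  for (const [path, version] of [...tree]) {
    const depName = path.slice("node_modules/".length);
    if (depName.includes("/") || depName in needs) continue; // nested or direct
    const stillUsed = Object.entries(needs).some(
      ([user, deps]) =>
        deps[depName] === version &&
        !tree.has(`node_modules/${user}/node_modules/${depName}`)
    );
    if (!stillUsed) tree.delete(path);
  }

  // Place the new dependencies: flat if the slot is free, nested otherwise.
  for (const [depName, depVersion] of Object.entries(newDeps)) {
    const topLevelPath = `node_modules/${depName}`;
    if (!tree.has(topLevelPath)) {
      tree.set(topLevelPath, depVersion);
    } else if (tree.get(topLevelPath) !== depVersion) {
      tree.set(`${modulePath}/node_modules/${depName}`, depVersion);
    }
  }
}

updateModule(tree, needs, "A", "2.0", { B: "2.0" }); // B v1.0 stays: E needs it
updateModule(tree, needs, "E", "2.0", { B: "2.0" }); // B v1.0 goes, B v2.0 hoisted

for (const path of [...tree.keys()].sort()) {
  console.log(`${path} -> v${tree.get(path)}`);
}
// node_modules/A -> v2.0
// node_modules/A/node_modules/B -> v2.0
// node_modules/B -> v2.0
// node_modules/C -> v1.0
// node_modules/C/node_modules/B -> v2.0
// node_modules/D -> v1.0
// node_modules/D/node_modules/B -> v2.0
// node_modules/E -> v2.0
```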

Now, this is clearly not ideal. We have Module B v2.0 in nearly every directory. To get rid of duplication, we can run:

npm dedupe

This command resolves all of the packages' dependencies on Module B v2.0 by redirecting them to the top-level copy of Module B v2.0, and it removes all the nested copies.
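Conceptually, the cleanup looks something like the sketch below. It is a deliberately simplified model: the dedupe function here is a made-up helper operating on a Map of install paths, it only considers one level of nesting, and the real npm dedupe command does considerably more. The versions of Modules C and D are assumed.

```javascript
// Toy model of deduplication; the real `npm dedupe` does considerably more.
// The versions of C and D are assumed for the example.
const tree = new Map([
  ["node_modules/A", "2.0"],
  ["node_modules/A/node_modules/B", "2.0"],
  ["node_modules/B", "2.0"],
  ["node_modules/C", "1.0"],
  ["node_modules/C/node_modules/B", "2.0"],
  ["node_modules/D", "1.0"],
  ["node_modules/D/node_modules/B", "2.0"],
  ["node_modules/E", "2.0"],
]);

// Hypothetical helper: remove nested copies that duplicate the top-level one.
// Simplification: assumes packages are nested at most one level deep.
function dedupe(tree) {
  for (const [path, version] of [...tree]) {
    const parts = path.split("/node_modules/");
    if (parts.length < 2) continue; // top-level entry, nothing to do

    const depName = parts[parts.length - 1];
    // A nested copy is redundant when the top level holds the same version,
    // because the lookup would fall through to that copy anyway.
    if (tree.get(`node_modules/${depName}`) === version) {
      tree.delete(path);
    }
  }
  return tree;
}

dedupe(tree);
console.log([...tree.keys()]);
// [ 'node_modules/A', 'node_modules/B', 'node_modules/C',
//   'node_modules/D', 'node_modules/E' ]
```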

Conclusion

So the key takeaway from this example is that installation order matters, and a consistent order can be ensured only by using npm commands when adding or updating any package in the project. The dependency tree generated by npm may differ between local development machines, but that won't affect the behavior of your application. Even though the trees are different, both sufficiently install and point all your dependencies at all their dependencies, and so on, down the tree. You still have everything you need; it just happens to be in a different configuration.

If you want your node_modules directory to be the same everywhere, note that the npm install command, when used exclusively to install packages from a package.json, will always produce the same tree. This is because install order from a package.json is always alphabetical. The same install order means that you will get the same tree.
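A tiny sketch shows why alphabetical order makes the result repeatable. The package names and version ranges below are hypothetical, and installOrder is a made-up helper that only models the ordering step, nothing else npm does.

```javascript
// Hypothetical package.json contents, written in two different orders.
const packageJsonOne = {
  dependencies: { "module-c": "^1.0.0", "module-a": "^1.0.0" },
};
const packageJsonTwo = {
  dependencies: { "module-a": "^1.0.0", "module-c": "^1.0.0" },
};

// Made-up helper: the order in which the dependencies would be installed,
// assuming they are always processed alphabetically by name.
function installOrder(packageJson) {
  return Object.keys(packageJson.dependencies).sort();
}

console.log(installOrder(packageJsonOne)); // [ 'module-a', 'module-c' ]
console.log(installOrder(packageJsonTwo)); // [ 'module-a', 'module-c' ]
```

However the file is written, the install order comes out the same, and with it the tree.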

You can reliably get the same dependency tree by removing your node_modules directory and running npm install whenever you make a change to your package.json.

The scope of this post was limited to dependency resolution, but there is more to explore: Where does npm install dependencies from? Is it possible to make npm install from a specific source? What are peer dependencies and package-locks? I will cover all of these in my next post soon.

Hope this was helpful ..!!

Keep Learning !! Keep Sharing !!