A lot of tools try to analyze projects with quantitative metrics like the number of contributors, the number of stars on GitHub, the number of downloads, etc. The QualityOfJSPackages report takes a different approach, with metrics that focus on quality.
Quality has nothing to do with quantity!
All figures in this report are based on an analysis of the top 100 most popular packages (taken from the list of most depended-upon packages).
1. The average weight of JS packages
Every package has a weight, which mainly corresponds to the size the project takes on your disk once installed. This weight can be broken down into 2 parts:
1. the size of all project files
The number of files does not directly correlate with the size on disk. It’s better to have many small files than a few large files. However, it’s still an interesting metric when we analyze modules that should be as efficient and effective as possible (which is the case here, because we are focusing on the most depended-upon packages). The fewer files a project has, the more likely it is to respect the single-responsibility principle.
There are 2 different metrics: size (on registry) and unpacked size. The first corresponds to the size of the module as published on the NPM registry: a gzipped tarball, compressed to reduce download time. The second metric is the actual size on disk, once installed. Both are interesting to analyze, but the most important point is the difference between the two. As we can see, once installed, the size of a module can be substantially bigger than the size given by NPM. And these 2 metrics become really important when we look at the 90th percentile (or higher). The most worrying point is that this doesn’t even take into account the size of the dependencies …
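To make the gap between the two metrics concrete, here is a minimal sketch that computes how many times larger a package is on disk than on the registry. The numbers are purely illustrative, not real registry data; in practice they would come from the registry metadata (e.g. the unpacked size reported by `npm view <pkg> --json`) and the tarball’s download size.

```javascript
// Sketch: compare a package's published (gzipped tarball) size with
// its unpacked size on disk. Both figures below are hypothetical.
function sizeInflation(packedBytes, unpackedBytes) {
  return unpackedBytes / packedBytes; // how many times bigger on disk
}

// Hypothetical package: 50 kB download, 280 kB once installed.
const ratio = sizeInflation(50_000, 280_000);
console.log(ratio.toFixed(1)); // 5.6
```

The interesting signal is the ratio itself: a high value means the download size shown by NPM badly underestimates the real footprint on disk.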
2. the size of all module dependencies
Here we are, the black hole of the Node.js ecosystem: the node_modules folder, which contains the dependencies and all sub-dependencies of a project.
If we look at the third quartile, for example, we can see that only 6 direct dependencies already lead to an average of 28 sub-dependencies, and an average size of 2.18MB. So every dependency should be chosen carefully and analyzed in terms of performance.
If you are wondering how the average (blue column) can be so high, it’s because the 99th percentile (so the worst of all) is really high. I will not throw anyone under the bus by naming names here, but for this project, the value is around 300MB.
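The gap between the average and the high percentiles described above is a classic effect of skewed data. A naive sketch with made-up sizes (using a simple nearest-rank percentile, not the report’s actual method) shows how one extreme outlier drags the mean far above the median:

```javascript
// Naive nearest-rank percentile over a list of values.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.floor((p / 100) * sorted.length));
  return sorted[idx];
}

// Hypothetical node_modules sizes in MB, with one extreme outlier.
const sizesMb = [1, 2, 2, 3, 3, 4, 5, 6, 8, 300];
const mean = sizesMb.reduce((a, b) => a + b, 0) / sizesMb.length;

console.log(percentile(sizesMb, 50)); // 4
console.log(mean);                    // 33.4
```

A single pathological project is enough to make the average look alarming even when most projects are reasonable, which is exactly why the percentiles are the more honest metric here.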
2. The hidden cost of JS dependencies
Dependencies can be seen as a way to make your code lighter. This is partially true, but at the same time, all dependencies (and their sub-dependencies) will greatly burden your pages and projects.
There is no magic behind external dependencies, each of them will increase your project’s size.
Some metrics to consider when we want to keep dependencies under control are:
- the number of direct dependencies and number of total dependencies
- the dependency tree’s depth
- the total weight of dependencies (basically the size of node_modules folder)
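Two of the metrics above, the total dependency count and the tree depth, can be computed by a simple walk over the dependency tree. Here is a minimal sketch; the tree shape is assumed (similar to what `npm ls --json` returns) and the example tree is hypothetical:

```javascript
// Walk an in-memory dependency tree and compute the total number of
// dependencies and the maximum depth of the tree.
function treeStats(node, depth = 0) {
  let count = 0;
  let maxDepth = depth;
  for (const child of Object.values(node.dependencies ?? {})) {
    const sub = treeStats(child, depth + 1);
    count += 1 + sub.count; // the child itself plus its own subtree
    maxDepth = Math.max(maxDepth, sub.maxDepth);
  }
  return { count, maxDepth };
}

// Hypothetical project: 2 direct dependencies, 5 total, depth 3.
const tree = {
  dependencies: {
    a: { dependencies: { b: {}, c: { dependencies: { d: {} } } } },
    e: {},
  },
};

console.log(treeStats(tree)); // { count: 5, maxDepth: 3 }
```

Note how quickly the total count diverges from the direct count: here 2 direct dependencies already hide 5 installed packages.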
Every dependency is installed in order to be imported sooner or later by your program. Every module has a different loading time, and sometimes one project has to load more than 10 dependencies to be able to start. The boot time of your project can be slowed down by accumulating too many dependencies that can take a long time to load.
On this chart, we can see the average loading time for each module. The 90th percentile is very interesting as it shows a huge difference from the median value. It means that your server (or app) startup time depends enormously not only on the number of dependencies that you use, but it also correlates with their “quality”. Most of them have a loading time of around 3ms, which is almost instant, but using the worst of them can quickly lead to a total loading time of more than 1 second. The difference between instant loading time and a few seconds can completely change the behavior of your deployments, scaling system, etc.
For frontend apps, this is even more catastrophic as every second added to your loading time will decrease user satisfaction.
An extra 0.5s in each search page generation would cause traffic to drop by 20% —Google
Another big concern is security. This is a given, as every dependency can bring its own security vulnerabilities.
The state of security is not that bad (measured as the number of security vulnerabilities per project)… but that’s not the case for dev dependencies. Of course, you are not supposed to install or use them in production, but mistakes happen quickly.
Keeping your own dependencies up to date has a lot of advantages, such as security patches, performance optimization, etc. So it’s interesting to know if well-known libraries try to do the same on their own codebase.
This chart only shows the number of direct dependencies that are not up to date. We can reasonably think that the number would be much higher if we had taken all items in the dependency tree into account.
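A crude version of the “outdated direct dependencies” metric can be sketched as follows. Both version maps are hypothetical; in practice `npm outdated --json` gives this information directly:

```javascript
// Count direct dependencies whose installed version differs from the
// latest published one (a naive string comparison, for illustration).
function countOutdated(installed, latest) {
  return Object.keys(installed)
    .filter((name) => installed[name] !== latest[name])
    .length;
}

// Hypothetical project state:
const installed = { lodash: '4.17.20', express: '4.18.2', chalk: '4.1.2' };
const latest = { lodash: '4.17.21', express: '4.18.2', chalk: '5.3.0' };

console.log(countOutdated(installed, latest)); // 2
```

Running the same check recursively over the whole tree, rather than direct dependencies only, is what would push the numbers much higher.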
One of the main focuses of software quality is code duplication. Duplicated code can lead to a lot of issues:
- Decrease in maintainability
- Decrease in code readability
- Increase in security risks
- Increase in codebase size
The average code duplication of a package is a good metric when evaluating code quality.
As we can see, many packages still have a lot of duplicated code. A lot of quality issues can be solved by keeping your codebase as simple and clean as possible. It seems that the most popular packages are, once again, not equal in terms of code duplication. Some of them have a really healthy code base, but some others seem to have significant technical debt.
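To give an idea of how duplication detection works, here is a naive sketch in the spirit of tools like JSCPD: hash every window of N consecutive lines and flag windows seen more than once. Real tools tokenize the source first; this line-based version is only illustrative.

```javascript
// Count line windows of size `windowSize` that appear more than once.
function duplicatedWindows(source, windowSize = 3) {
  const lines = source.split('\n').map((l) => l.trim());
  const seen = new Map();
  let duplicates = 0;
  for (let i = 0; i + windowSize <= lines.length; i++) {
    const key = lines.slice(i, i + windowSize).join('\u0000');
    seen.set(key, (seen.get(key) ?? 0) + 1);
    if (seen.get(key) === 2) duplicates++; // count each repeated block once
  }
  return duplicates;
}

// Hypothetical source with one copy-pasted block:
const code = [
  'const a = 1;',
  'const b = 2;',
  'return a + b;',
  '// elsewhere',
  'const a = 1;',
  'const b = 2;',
  'return a + b;',
].join('\n');

console.log(duplicatedWindows(code)); // 1
```

Dividing the duplicated lines by the total line count yields the duplication percentage that this kind of report tracks.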
7. Exact version of dependencies
Another good practice is to use exact versions of dependencies, to ensure that your application behaves the same after each deployment. Version ranges, conversely, can introduce weird behavior and unexpected results.
You have 2 possibilities on how to handle this:
- pin your dependency version
- use a lock file for dependencies (like package-lock.json)
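To see why ranges are less deterministic than pinned versions, here is a simplified check for npm’s caret ranges ("^x.y.z" accepts any version with the same major component and at least the stated minimum; real semver has more rules, e.g. for 0.x versions, so this is only a sketch):

```javascript
// Naive check of whether `version` satisfies a caret range `range`.
function satisfiesCaret(version, range) {
  const parse = (v) => v.replace('^', '').split('.').map(Number);
  const [maj, min, pat] = parse(version);
  const [rMaj, rMin, rPat] = parse(range);
  if (maj !== rMaj) return false;      // different major: never matches
  if (min !== rMin) return min > rMin; // higher minor is acceptable
  return pat >= rPat;                  // same minor: need >= patch
}

// "^4.17.0" can resolve to many versions, so two installs may differ:
console.log(satisfiesCaret('4.17.21', '^4.17.0')); // true
console.log(satisfiesCaret('4.18.0', '^4.17.0'));  // true
console.log(satisfiesCaret('5.0.0', '^4.17.0'));   // false
```

Because several versions satisfy the same range, two installs run at different times can resolve differently, which is exactly what pinning or a lock file prevents.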
8. Global quality score of JS modules
Now that we have collected several metrics, we can calculate a global quality score for every module we analyzed. Every metric can be considered good or bad according to a threshold. For example, code duplication is only considered good if you have 0% of duplicated code. Of course all the thresholds can be considered subjective, but it’s still interesting to analyze the result.
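The scoring principle described above can be sketched as follows. The actual thresholds live in the qualscan project; the metric names and limits below are made up for illustration:

```javascript
// Threshold-based score: each metric passes or fails its threshold,
// and the score is the share of passed checks, as a percentage.
function qualityScore(metrics, thresholds) {
  const names = Object.keys(thresholds);
  const passed = names.filter((n) => metrics[n] <= thresholds[n]).length;
  return Math.round((passed / names.length) * 100);
}

// Hypothetical module: no duplication, but outdated deps and a deep tree.
const metrics = { duplicationPct: 0, outdatedDeps: 3, treeDepth: 6 };
const thresholds = { duplicationPct: 0, outdatedDeps: 0, treeDepth: 4 };

console.log(qualityScore(metrics, thresholds)); // 33
```

The all-or-nothing nature of each threshold is what makes the final score so sensitive to the choice of limits, hence the caveat about subjectivity.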
Not many modules score above 80%. The average and the median are in fact really close, at around 55%. What does that mean?
There are a lot of possible improvements, even for well-known frameworks and modules.
Of course the rating system should be improved and refined in order to increase accuracy. But the trend is still there and quality should become a higher priority.
If you want to see all the thresholds used to generate this report, you can refer to this project: https://github.com/wallet77/qualscan, or simply run the command qualscan -h to output the default values.
9. Impact on the environment
And what about our impact on the environment? Everything we do in our industry is virtual, so it’s a common mistake to think that software engineering pollutes less. Technically that’s not true, as we use more and more power to run even simple commands like npm install. Just imagine what happens when every dependency is downloaded, then unzipped, then installed. Just take a look at the result, and keep in mind that the average consumption of this machine, when idle, is around 0.3W.
We went from almost nothing to 10W on average, sustained throughout the entire installation process, which can take a while depending on the number of dependencies. And as you can see, this command alone represents 75% of the host’s power consumption.
This impact should be multiplied by the number of installations whether made by devs (manually) or by CIs (automatically) every day.
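That multiplication is easy to make concrete with a back-of-the-envelope sketch. The install duration and the number of daily runs below are hypothetical; only the ≈10W figure comes from the measurement above:

```javascript
// Energy drawn by repeated `npm install` runs, in watt-hours per day.
function installEnergyWh(watts, seconds, runsPerDay) {
  return (watts * seconds * runsPerDay) / 3600;
}

// Assumption: 10 W for 60 s per install, 50 installs a day (devs + CI).
console.log(installEnergyWh(10, 60, 50)); // ≈8.3 Wh per day
```

Small per-command numbers, multiplied across a whole team and its CI, are exactly why the point is worth taking seriously.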
There is no magic: everything we do has an impact on the environment.
So any optimization can benefit the planet, let’s not underestimate this point ;)
Set of useful tools to analyze your projects
- Qualscan: find quality issues
- JSCPD: find code duplication
- NPM graph: display your dependency tree
- Bundle phobia: analyze module’s size
- Snyk advisor: analyze module’s quality and security
- npms.io: analyze module’s quality
Quality is a vast subject, and we have yet to touch upon every aspect. The report itself should be improved, but the trends are not nearly as positive as I suspected before starting this project.
There are nonetheless some good surprises, especially around security and regular updates.
Keep in mind that all modules scanned are open-source projects which need a lot of work in order to stay on the right track. Do not hesitate to help development teams, they always need some great minds to detect and fix issues, especially to improve quality!