Securing the JS Ecosystem

I wonder if this guy is a contributor to any node_modules in my project. (Image credit: Netflix, Black Mirror s3e3.)

Note: these are my personal opinions and not those of Netflix.

The event-stream incident, eslint-scope incident, and David Gilbertson’s I’m harvesting credit card numbers from your site reveal fundamental weaknesses in the JS package ecosystem. Because of JS’ limited standard library and other cultural reasons, modern JS apps have massive dependency trees, placing a lot of trust in third parties. We’re now learning that this trust may be misplaced, and we may be recklessly endangering our users, business, and reputations.

To be clear, I’m a huge proponent of the JS ecosystem. Like capitalism, I believe it’s “the worst [package management ecosystem], except all the others.” I’ll be the first to plead guilty to blithely installing a third party dependency to save myself from writing another fifteen lines of code. I’ve vociferously argued against an expanded standard library over the years, loving the flexibility and rapid innovation that a minimalist language core provides. But it’s clear that things need to change. If an attacker has remote code execution in your system, they have vast power against you. And in the current ecosystem, gaining remote code execution in a wide variety of popular projects is almost comically easy.

(If you’d like to see who has the power to run code on your machine when you run yarn install, check out a tool I made: list-maintainers. Fun fact: a newly instantiated create-react-app project has 549 maintainers in its node_modules tree.)

Expecting developers to read every line of every dependency they import (including transitive dependencies) is impractical. When time is tight, as it often is, even the most fastidious among us will likely not do much more than skim. And as we saw with flatmap-stream, the lines of code needed to implement an attack can be subtle and may be hard to distinguish within a hematoma of boilerplate. (And if the code is minified, you can forget about any sort of meaningful analysis by most engineers.)

By our nature, humans are often not motivated to protect against low-likelihood but high-impact risks. The vast majority of npm installs do not give a malicious attacker remote code execution on your system, so you can be forgiven for doing it without remorse or hesitation.

And finally, junior engineers may not be qualified to evaluate a package’s security implications. It would massively raise the barrier to entry of the JS ecosystem if npm were an advanced tool (because juniors couldn’t use it safely) as opposed to something you run in your first tutorial.

So, I won’t call for an ineffective morality campaign where people are encouraged to abstain from installing most npm modules or apply extreme vetting to their entire dependency tree. Instead, I’ll make recommendations that fit better with human nature, and center around reducing the amount of trust we need, and providing a better basis for the trust we do give.

Some of these ideas call for npm, Inc to take a more interventionist approach to managing the community. I like the benefits of a decentralized model, but the reality of situations like event-stream and left-pad shows that people can’t always be trusted to make good decisions, and benevolent overlords can be helpful. If these changes put too much burden on npm, Inc then it may be reasonable to ask companies who benefit from the JS ecosystem to contribute time or money to enact these policies and keep the ecosystem healthy.

If npm, Inc does not take a more forceful stance, we may see the JS ecosystem balkanize into a set of internal registries run by big companies who are able to apply more safeguards. This would not be good for anyone.

Security is a tradeoff between safety and convenience. Many of my proposals will make things harder, at least in the short term. But these tradeoffs are worth considering. If we don’t have security, we can’t have an ecosystem.

Reduce the amount of trust we need

Many of these ideas may be more complicated to implement than they appear on the surface. A detailed spec for all of them is outside the scope of this post.

Sandboxing and Fine-Grained Permissions

Left-pad, which I’m happy to see still gets 1.9M downloads/week, does not need access to the file system or network requests to fulfill its stated purpose. What if we had an OAuth-style permissions granting process, analogous to how you can give an app access to your Facebook avatar but not your history of liking old classmates’ photos at 2am? For instance, we could grant left-pad access to execute its own code, without allowing it to access the file system.

The syntax for safely importing load-json-file could look something like:

import loadJsonFile from 'load-json-file' using('fs');

This would grant load-json-file access to require('fs'), but not make network requests, for instance. And this constraint would apply to the entire dependency subtree of which load-json-file is the root.
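To make the idea concrete, here is a minimal sketch of what a grant like using('fs') might hand to a package. It assumes a hypothetical helper, makeRestrictedRequire, that wraps Node's require with an allow-list of built-in modules. It is only an illustration of the permission model: a wrapper like this is easy to bypass, and real enforcement would have to live in the runtime and cover the whole dependency subtree.

```javascript
const { builtinModules } = require('module');

// Hypothetical helper: returns a require-like function that refuses to load
// any Node built-in module that is not on the allow-list.
// Third-party modules are passed through to the normal require untouched.
function makeRestrictedRequire(allowedBuiltins) {
  const allowed = new Set(allowedBuiltins);
  return function restrictedRequire(id) {
    const name = id.startsWith('node:') ? id.slice(5) : id;
    if (builtinModules.includes(name) && !allowed.has(name)) {
      throw new Error(`Access to built-in module "${name}" was not granted`);
    }
    return require(id);
  };
}

// Roughly what using('fs') might give load-json-file:
const restrictedRequire = makeRestrictedRequire(['fs']);
const fs = restrictedRequire('fs'); // granted

let denied = false;
try {
  restrictedRequire('http'); // not granted, so this throws
} catch (err) {
  denied = true;
}
console.log({ fsLoaded: typeof fs.readFileSync === 'function', denied });
```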

If this is too hard to implement, then with the benefit of statically-analyzable import statements, a tool could at least reveal what imports a package will make. For instance, if I see that load-json-file is importing the os module, I may be suspicious.
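A rough sketch of such a tool follows. It assumes the package source is available as a string and uses regular expressions to find statically written require and import specifiers; listImports and listBuiltinImports are hypothetical names. A serious version would parse the code into an AST, since regexes miss computed requires and can be fooled by strings and comments.

```javascript
const { builtinModules } = require('module');

// Patterns for: require('x'), import ... from 'x' / import 'x', import('x')
const IMPORT_PATTERNS = [
  /\brequire\(\s*['"]([^'"]+)['"]\s*\)/g,
  /\bimport\s+(?:[^'"]*?\s+from\s+)?['"]([^'"]+)['"]/g,
  /\bimport\(\s*['"]([^'"]+)['"]\s*\)/g,
];

// Returns the sorted, de-duplicated module specifiers found in the source.
function listImports(source) {
  const found = new Set();
  for (const pattern of IMPORT_PATTERNS) {
    for (const match of source.matchAll(pattern)) {
      found.add(match[1].replace(/^node:/, ''));
    }
  }
  return [...found].sort();
}

// Keeps only the specifiers that name Node built-in modules.
function listBuiltinImports(source) {
  return listImports(source).filter((name) => builtinModules.includes(name));
}

const sample = `
  import path from 'path';
  const os = require('os');
  const parse = require('parse-json');
`;
console.log(listBuiltinImports(sample));
```

For the sample above, the built-ins reported are os and path, which is the kind of list a reviewer could skim for surprises.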

Additionally, post-install scripts are an attack vector. You can’t even npm install a package to inspect its contents without providing it the ability to execute arbitrary code on your machine. Today, any package can run a post-install script, but perhaps it should be limited to explicitly permitted packages. If left-pad needs to run make, for instance, then I’m just going to bust out my Art of Computer Programming and write the functionality myself.

Permitting a post-install step could be declared in the root-level package.json:

"allowPostInstall": ["node-sass"]

This would have the added benefit of discouraging developers from doing compilation as a post-install step, which is an anti-pattern.
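As a sketch of how an installer might enforce that allow-list: the function below takes the root manifest and the manifests of the dependencies, and reports every package that declares an install-time lifecycle script (preinstall, install, or postinstall) without being listed. The allowPostInstall field is the proposal from this post, not an existing npm feature, and findBlockedInstallScripts and sketchy-pkg are made-up names.

```javascript
// npm lifecycle scripts that run automatically during installation.
const INSTALL_HOOKS = ['preinstall', 'install', 'postinstall'];

// Returns the names of dependencies that want to run install-time scripts
// but are not listed in the root manifest's (hypothetical) allowPostInstall.
function findBlockedInstallScripts(rootManifest, dependencyManifests) {
  const allowed = new Set(rootManifest.allowPostInstall || []);
  return dependencyManifests
    .filter((manifest) => {
      const scripts = manifest.scripts || {};
      return INSTALL_HOOKS.some((hook) => hook in scripts);
    })
    .filter((manifest) => !allowed.has(manifest.name))
    .map((manifest) => manifest.name);
}

const root = { name: 'my-app', allowPostInstall: ['node-sass'] };
const deps = [
  { name: 'node-sass', scripts: { postinstall: 'node scripts/build.js' } },
  { name: 'left-pad', scripts: { test: 'node test' } },
  { name: 'sketchy-pkg', scripts: { postinstall: 'node collect.js' } },
];
const blocked = findBlockedInstallScripts(root, deps);
console.log(blocked);
```

npm already offers an all-or-nothing version of this with the --ignore-scripts flag; the allow-list is the finer-grained variant.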

Make it Easier to Inspect Packages

If someone did feel extra cautious and wanted to inspect a package’s source before installing it, they would likely go to GitHub. But of course, there’s no guarantee that what is in GitHub matches what’s in npm. And the developer can’t install the package locally to inspect it without allowing potentially malicious post-install scripts to run.

A savvy developer would know they could find the tarball for the version of the package they want, and untar it manually. (Even this may expose the user to exploits — I am not sure.)

To reduce the friction associated with safely inspecting a package, npmjs.com could provide UNPKG-style functionality (or perhaps just link to UNPKG directly). It would also be nice to highlight the main file, so packages can’t have a red-herring index.js with a malicious other main file hiding elsewhere.

People often assume that what’s in GH is equivalent to what’s in npm. Instead of shaming people for thinking this way, we could reward packages that maintain identical code in GH and npm. npm could have some sort of verification badge for packages that pass this check, and developers could use that as a marker of trustworthiness.
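The check behind such a badge could be sketched roughly as follows, assuming both the published tarball and the tagged repository commit have already been read into plain objects mapping file path to file content. diffPublishedAgainstRepo is a hypothetical name. Packages that publish build output would legitimately fail a naive comparison like this, so a real check would need to account for build steps.

```javascript
const crypto = require('crypto');

function sha256(content) {
  return crypto.createHash('sha256').update(content).digest('hex');
}

// Both arguments map a relative file path to that file's content.
// Reports every published file that is absent from, or differs from, the repo.
function diffPublishedAgainstRepo(publishedFiles, repoFiles) {
  const mismatches = [];
  for (const [filePath, content] of Object.entries(publishedFiles)) {
    if (!Object.prototype.hasOwnProperty.call(repoFiles, filePath)) {
      mismatches.push({ path: filePath, reason: 'missing from repository' });
    } else if (sha256(content) !== sha256(repoFiles[filePath])) {
      mismatches.push({ path: filePath, reason: 'content differs' });
    }
  }
  return mismatches;
}

const published = {
  'package.json': '{}',
  'index.js': 'module.exports = 1;',
  'lib/extra.js': 'doSomethingUnexpected();',
};
const repo = {
  'package.json': '{}',
  'index.js': 'module.exports = 1;',
};
const mismatches = diffPublishedAgainstRepo(published, repo);
console.log(mismatches);
```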

JS engineers hard at work creating node modules. (Image credit: Netflix, Black Mirror s1e2.)

Provide better basis for the trust we give

This section proposes surfacing package quality and trustworthiness feedback to users via the npm web UI. I’m envisioning something like the badges that maintainers currently add on their own:

The npm trustworthiness indicators would have a similar mental model, but would be provided by the platform instead of users.

Guide Developers to Safer Behavior

On web and CLI, npm can do more to guide users towards the “pit of success” of picking safe packages. In response to the event-stream incident, Zach Schneider writes:

Think before you install. This isn’t a panacea — as demonstrated above, it’s easy for attackers to slip malicious code into minified output, which is hard to find even if you knew it was there. But you will drastically reduce your exposure if you stick to popular, well-maintained packages. Before you install a new dependency, first, ask yourself if you really need it. If you already know how to write the code, and it won’t take you more than a few dozen lines, just do it yourself. If you do need the dependency, scope it out before you install. Does it have high download numbers on npm? Does the GitHub repo appear well-maintained and active? Has the package been updated recently?

Instead of relying on users to think about this themselves, npm can surface “trustworthiness” information in the UI. npm could release a tool that runs against a dependency tree and flags suspicious packages. Or it could output an aggregate score for the entire tree, which could be checked as part of the code review process for a project. If a developer sees that a PR drastically reduces the dependency trust score, it could prompt them to investigate further. The score wouldn’t have to be perfect; the developer could often add a flag to say “I’m fine with these packages” and move on. But it would make it harder to slip random packages into popular projects.

Minified code could be explicitly discouraged in published packages. Most of the time, the root level app will do its own minification, so there’s no point to a library minifying itself as well. Minified code makes it easier to hide malicious logic.

The trustworthiness score could be computed based on a variety of metrics, including the ones mentioned by Zach and the others mentioned in this post. Open question: is it better to attempt to produce an aggregate score, or to present users with the individual information (last update on GH, downloads/week, etc) and let them decide for themselves how to weigh each factor?
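One way to sidestep that question is to compute both. The sketch below exposes the individual signals and derives an aggregate from them. Every threshold and field name here (weeklyDownloads, lastPublished, containsMinifiedCode, the 100,000 downloads cutoff, the one-year freshness window) is an assumption picked for illustration, not something npm provides or recommends.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Individual yes/no signals for one package. All thresholds are arbitrary.
function trustSignals(pkg, now = Date.now()) {
  const daysSincePublish = (now - Date.parse(pkg.lastPublished)) / MS_PER_DAY;
  return {
    popular: pkg.weeklyDownloads >= 100000,
    recentlyUpdated: daysSincePublish <= 365,
    hasRepository: Boolean(pkg.repository),
    notMinified: !pkg.containsMinifiedCode,
  };
}

// Aggregate score: the fraction of signals that passed, from 0 to 1.
function trustScore(pkg, now) {
  const signals = Object.values(trustSignals(pkg, now));
  return signals.filter(Boolean).length / signals.length;
}

// Score for a whole tree: the weakest package sets the score.
function treeScore(packages, now) {
  if (packages.length === 0) return 1;
  return Math.min(...packages.map((pkg) => trustScore(pkg, now)));
}

const now = Date.parse('2018-12-01T00:00:00Z');
const established = {
  weeklyDownloads: 17800000,
  lastPublished: '2018-09-01T00:00:00Z',
  repository: 'github:example/established',
  containsMinifiedCode: false,
};
const unknown = {
  weeklyDownloads: 26,
  lastPublished: '2018-11-20T00:00:00Z',
  repository: '',
  containsMinifiedCode: true,
};
console.log(trustScore(established, now), trustScore(unknown, now));
console.log(treeScore([established, unknown], now));
```

Taking the minimum for the tree reflects the weakest-link nature of the problem: one bad package is enough. An average would hide exactly the package a reviewer needs to see.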

Benevolent developers whose packages are wrongly flagged as suspicious may protest a trustworthiness score, particularly if npm aggressively steers users towards more trustworthy packages. Like Google and Facebook’s ranking algorithms, there would be anxiety over the winners and losers of a proposed change. These are regrettable situations, but I think it’s a tradeoff worth considering for enhanced security. Having a central authority make judgements about content quality will always have drawbacks, but ecosystems like Twitter and Facebook have found that an unpoliced environment becomes unhealthy.

Many of the signals around trustworthiness and quality are already accessible to npm, Inc. For instance, they use them to rank npm search results. I recommend surfacing those signals via CLI and web to nudge users towards higher-quality packages.

Prevent Similar Name Squatting

lodash gets 17.8M downloads/week. My new alternative that definitely doesn’t harvest your GH auth tokens, lobash, currently gets 0, but with some fat fingers, that could change! (Update: I’m now up to 26.) I probably shouldn’t have been allowed to publish a module that has such a close name to such a popular module. If npm prohibited this, it would follow other practices, like Chrome warning you when you go to gmial.com:

npm already has some measures against typos and typo-squatting, but could expand their coverage.

Alternatively, instead of a publish-time gate, npm could warn users on the CLI and web at install time. If you install lobash instead of lodash, the CLI could ask you to confirm that you meant to install the less popular package. And when you view the package profile on the web, the UI could indicate that there is a very similarly named, much more popular package.
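Either variant needs a way to decide that two names are "close". A minimal sketch, assuming plain Levenshtein edit distance against a list of popular package names (findLookalikes and the threshold of 1 are illustrative choices, not how npm's existing checks work):

```javascript
// Classic Levenshtein distance: the number of single-character insertions,
// deletions, or substitutions needed to turn a into b.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Returns the popular names that the given name could be mistaken for.
function findLookalikes(name, popularNames, maxDistance = 1) {
  return popularNames.filter(
    (popular) => popular !== name && editDistance(name, popular) <= maxDistance
  );
}

const popular = ['lodash', 'express', 'react'];
console.log(findLookalikes('lobash', popular));
console.log(findLookalikes('lodash', popular));
```

One limitation: swapped letters, as in gmial versus gmail, count as two edits under this metric, so a threshold of 1 would miss them. A metric that treats a transposition as a single edit would catch more fat-finger typos.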

Oversee Adding New Maintainers

For the event-stream incident to happen, the original maintainer had to hand off control of his popular node module, event-stream, to an unknown third party who just happened to ask for it without any prior involvement in the project or community recognition.

Maybe, when a package is above a certain popularity threshold, maintainers should not be allowed to unilaterally add new maintainers. Diminishing maintainer autonomy would be controversial, but I would argue the ecosystem is stronger because npm reduced maintainer autonomy in the wake of the left-pad incident. (You no longer have the ability to unpublish your own packages without convincing npm, Inc that it’s a good idea.)

I don’t blame event-stream’s original maintainer for neglecting it; he wasn’t being paid, so it’s purely up to him to decide to work on it. I have several neglected open-source projects myself. What we need is someone in the loop who is paid to care about projects that are highly important to the community. This person would probably work at npm, Inc. Their salary may be paid, at least in part, by the companies that want a secure and reliable JS ecosystem.

Improve Visibility Around Owner Changes

I use the load-json-file module to read sensitive JSON data. The reason I feel safe doing this isn’t because I’ve read every line in its dependency tree. I feel safe because I trust Sindre Sorhus. If Sindre Sorhus were to hand control of the project off to someone else, I would want to know about it, instead of it being hidden behind a patch bump.

One way to do this would be requiring a major version bump for a new publisher. The npm CLI could report who the new publisher is, and I could take a minute to see if they seem legit. This requirement would obviously be an impediment to new people joining projects. However, the friction could be reduced if more projects move to a model of having GitHub push access being more freely distributed than npm publish access.

This would have some tricky edge cases. For instance, if a new maintainer joins a project and publishes version 4, what happens when a patch to 3.x needs to be released? Is the new maintainer now allowed to publish in the 3 series, or can they only publish in 4 and above? If all maintainers who were around for the 3 series leave the project, does that mean no more 3 series releases can occur? Perhaps the rules are more stringent for projects above a certain popularity level.
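Leaving those edge cases aside, the basic rule could be sketched like this. It assumes the registry keeps a history of { version, publisher } records per package; checkPublish and the record shape are hypothetical. This version takes the simplest answer to the 3.x question: once someone has published any release, they count as a known publisher everywhere.

```javascript
function majorOf(version) {
  return Number.parseInt(version.split('.')[0], 10);
}

// history: earlier releases as { version, publisher } records.
// next: the release being published.
function checkPublish(history, next) {
  const knownPublishers = new Set(history.map((release) => release.publisher));
  if (history.length === 0 || knownPublishers.has(next.publisher)) {
    return { allowed: true };
  }
  // First release by this publisher: require a new major version so that
  // consumers on semver ranges do not pick it up silently.
  const highestMajor = Math.max(...history.map((r) => majorOf(r.version)));
  if (majorOf(next.version) > highestMajor) {
    return { allowed: true, notice: `New publisher: ${next.publisher}` };
  }
  return {
    allowed: false,
    reason: `First release by ${next.publisher} must be a new major version`,
  };
}

const history = [{ version: '3.3.4', publisher: 'original-author' }];
console.log(checkPublish(history, { version: '3.3.5', publisher: 'newcomer' }));
console.log(checkPublish(history, { version: '4.0.0', publisher: 'newcomer' }));
```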

Support User Verification

Twitter improves community trust by having a central mechanism to verify identity for publicly interesting people:

npm could support similar functionality. For instance, lodash is maintained primarily by John-David Dalton. There is a GitHub and an npm profile bearing his name and picture; are they actually controlled by the same person? Does he actually work at Microsoft? How long has he had an npm account for? Authoritative answers to these questions would make it easier to decide whether to trust an author. Keybase has done work around verifying owners of public-facing accounts which may be useful here.

Verification information, or some broader indicator of author trustworthiness, could be surfaced in the npm CLI as well. (Author trustworthiness could be computed by markers like how long the account has been active and how many popular packages they have publish rights to.) If every package in my dependency tree is published by someone who verifiably works at a reputable tech company, for instance, then I feel reasonably safe.

This change could introduce a catch-22 which may discourage newcomers: no one would want to use your packages until you’re already well-known, and you can’t get well-known until people use your packages. But a dangerous ecosystem with insufficient checks on bad actors also discourages newcomers, who lack the context to make informed choices about who to trust.

npm could provide different levels of verification, all of which feed into signals about package trustworthiness. For instance, verifiably working at a legit tech company (which could be very broadly defined) is a good sign. So is having 2FA enabled. Bespoke manual identity verification could be reserved for the most influential open source contributors.

Develop a List of Trusted Packages

I’ve long enjoyed and benefited from the rapid innovation enabled by the decentralized JS model. However, these security incidents invite us to revisit the tradeoffs we’ve made thus far.

Two fundamental problems are not knowing who to trust and a tragedy-of-the-commons lack of investment in core infrastructure.

The decentralized model can also be a barrier to entry for the community: how would a newcomer know what’s considered the “best request library these days”? If a central authority designated certain packages as preferred, then the question would be easy to answer.

The process might look something like:

  1. Packages could submit themselves for approval, or the central authority could proactively vet packages that are popular enough.
  2. The central authority does a line-by-line analysis of the package and its dependencies.
  3. Once the package meets a well-documented and publicly-available set of criteria, the version of the package that was vetted is marked as approved.

Once a package is approved, that information would be surfaced in the npm website and CLI. (Perhaps approved packages would be ranked higher in npm search.) Of course, packages that lack central approval will continue to be available. But developers who want to be conservative could stick to approved packages. This could drive adoption in risk-averse environments, like the government.
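Because approval attaches to a specific vetted version rather than to a package name, a conservative team could check its resolved dependencies against the approved list. A minimal sketch, assuming the approved list is available as a Map from package name to approved versions (findUnapproved, the data shape, and the package some-new-lib are all hypothetical):

```javascript
// installed: resolved dependencies as { name, version } records.
// approvedVersions: Map of package name -> array of vetted versions.
function findUnapproved(installed, approvedVersions) {
  return installed.filter(({ name, version }) => {
    const versions = approvedVersions.get(name) || [];
    return !versions.includes(version);
  });
}

const approvedVersions = new Map([['lodash', ['4.17.10', '4.17.11']]]);
const installed = [
  { name: 'lodash', version: '4.17.11' },   // vetted version
  { name: 'lodash', version: '4.17.5' },    // approved package, unvetted version
  { name: 'some-new-lib', version: '0.1.0' }, // never submitted for approval
];
const unapproved = findUnapproved(installed, approvedVersions);
console.log(unapproved);
```

Note that the second entry is flagged even though lodash itself is on the list: a new release would need to be vetted again before it inherits the approval.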

The central authority would be staffed by knowledgeable JS engineers, paid for by npm, Inc and companies which benefit from a secure JS ecosystem. This would be a similar model to other cases of companies taking stewardship of language ecosystems they rely on, like those whose employees use business hours to sit on TC39.

JS is a massive ecosystem, with many new packages being published every day. Vetting all of them would be impractical. But there could be a good ROI on just vetting the most popular n packages.

Some people may read this and have a knee-jerk reaction to what sounds like a very enterprisey process. But the key pain with a bureaucratic process is the inability to bypass it. I’m not suggesting anything that will stop anyone from what they’re doing today; I’m just proposing adding new information so individuals can have more meaningfully informed consent.

Conclusion

As JS developers, if we want to earn our users’ trust, we need a higher standard of safety in our community. Many of these proposals will not be simple or cheap to implement, but the improved security will pay dividends over the long term. As many social media sites have painfully learned, being reactive on security and safety can permanently lose community trust. Playing catch-up on ecosystem integrity is a hard game to win.

With both eslint-scope and event-stream, the only reason the attackers were caught is that they made JS errors. Imagine what attacks may still be in progress that we don’t know about yet because their authors are better at JS.


Thanks to Tony Casparro for reading drafts of this.

Appendix / See Also