Keeping the Node.js core small

This story was originally published on Node.js @ IBM by Sam Roberts.

“Why do you hate my use-case?”

Features are wonderful. When Node.js adds a new API, we can instantly do more with it.

Wouldn’t a larger standard library be more useful for developers? Who could possibly object to Node.js getting better? And who, even more strangely, would actually remove APIs, making Node.js objectively worse?

Turns out, lots of people:

A recent proposal to add HTTP proxy environment variable support to Node.js got a generally hesitant response from Node.js maintainers, and a generally enthusiastic response from Node.js users.

What’s going on here? Why the disconnect? Is this an ideological stance?

No. For those who support a small core, it's a matter of pragmatism. The instantaneous delight of new APIs in Node.js fades with time, even as the problems of maintaining those APIs grow. The longer you work on Node.js core, supporting, bug-fixing, and documenting its existing APIs, the more you wish that Node.js had been more conservative in accepting new APIs in the past, and that those APIs had instead been implemented as npm modules that could be independently maintained and versioned.

Following are some war stories to give a sense of why so many long-time Node.js contributors are (seemingly perversely) trying to keep useful APIs out of Node.js.

“But the code is already there!”

It is particularly tempting to expose, as public API, functionality that Node.js already has. Node.js needs it internally, so we already have to maintain it; what could go wrong?

APIs published as npm packages can be versioned, but when you get Node.js, you get all the APIs in that Node.js version as one bundle, and you have no choice but to accept them. This makes it very hard for us to change them, and this is particularly annoying with APIs that were initially implemented for internal use, and that we should be free to improve.

Take util.deprecate() as an example. It has been a public API since 0.8. Note that this predates Node.js' internal module system, which allows private APIs usable only by Node.js itself.

Node.js uses this API itself, and it's tied into various CLI options (--no-deprecation, etc.), so it is not about to go away, and exposing it to users doesn't seem to cause any harm. However, we now have feedback from users that the deprecation messages aren't very helpful.

There is an easy fix: we can add to each deprecation message a link to the list of deprecated APIs, where we can describe why the API was deprecated and what to use instead.

Except it's not that easy. Such a change is backwards incompatible, and would make the deprecate function work substantially worse for users outside of Node.js itself.

We can move forward. Perhaps we can create a new internal-only deprecation API which Node.js can use and improve, and we can eventually deprecate util.deprecate() itself and encourage the use of npm alternatives, like depd. To get more uniform behaviour across out-of-core deprecation APIs, we can even expose the Node.js CLI flag values for --no-deprecation and the rest. This will take a lot of time and argument, if it happens at all, draining energy that could be spent improving Node.js' core APIs.

If the deprecation API had never been public, it would probably have been reworked within days of the suggestion to include URLs to the deprecation notice.

“But it's only a tiny change and I don't want to make a new npm module!”

fs.writeFile() and fs.readFile() are another example of what seemed like a simple feature request: adding the ability to operate on an already open numeric fd, in addition to the existing behaviour of taking a string file path.

It turns out that after fd support was added to the API, we can't agree on how it should work: should the file position be rewound to the beginning or not? If you don't rewind, a repeated read won't return the whole file. If you do rewind, it won't work when the fd is a pipe.

Why is such a seemingly trivial API addition causing such churn? Because fd support should not have been added, and writeFile() and readFile() should not be Node.js core APIs.

Reading or writing entire files is a feature built on top of the basic file APIs. Both whole-file APIs predate npm itself, but nowadays they could perfectly reasonably be implemented as an npm module, just as recursive directory removal was. As a semantically versioned npm module, a new API could be easily added, an optional mode could be introduced for an existing API, or an entirely new module could be published specifically for dealing with raw file descriptors. Users could choose their APIs, and Node.js core's issue maintainers would not be dragged into deciding what users really want when they read or write from file descriptors.

“But it’s needed for an ecosystem to form!”

Clearly, Node.js should have some features! There are things that can only practically be implemented in Node.js itself, and commonly used functionality like crypto and HTTP that needs to be implemented in C++ is a useful part of core (C++ addons are possible, but can be quite painful to install from npmjs.org).

Sometimes, core can implement basic mechanisms, creating a standard on which an inter-operable ecosystem of features can grow. However, ecosystem growth can be a difficult thing to predict. Mikeal Rogers discusses some success stories in his humorously titled Make Node.js Core Bigger but he didn’t mention some fairly prominent counter examples, including his own request module.

Node.js’ HTTP support implements a wide set of both basic and advanced HTTP protocol features, such as chunked encoding and protocol upgrade, but avoids offering a high-level “user friendly” API, which has forced an ecosystem of higher-level APIs to develop. Some of the missing features are incredibly useful, such as retrieving an HTTP resource as a single string. I can’t count the number of times I’ve reviewed code like that shown in the http.get() example, that manually builds up a response one chunk at a time, and had to point people to Mikeal’s module which does this more simply, elegantly, and maintainably.

Recently, the WHATWG has defined an HTTP client API, and new implementations such as fetch are gaining momentum. Luckily, Node.js did not ship with a high-level client API that would now be seen as “non standard”, and the ecosystem is thriving. This is a success story, driven by Node.js’ API not being made bigger.

Think how much better the Node.js HTTP API would be if it didn’t include an Agent, either. The Agent could have been implemented as an npm module cleanly layering on top of core’s HTTP API, but instead it is injected into Node.js core, in a way described as “mind boggling” in a recent attempt to document it.

Streams are less of a success story. Node.js has gone through three subtly different and incompatible versions of the API, resulting in an API so complex that stream experts like Max Ogden recommend beginners not learn it directly. Instead, you should only ever use readable-stream, or the ecosystem of modules built on it. Even then, there is great unhappiness among streams users, about error propagation in particular.

It’s awkward that while Node.js contains streams, it doesn’t have a library of streams, so it can’t use them in its examples. Even if you avoid request (why?), this example should be written with a stream module like bl or concat-stream.

The existence of streams in Node.js makes using anything but them a difficult proposition for a library (because it would be “non standard”), and at the same time their presence in core makes them difficult to update and improve. I don’t expect it to happen, but I personally believe Node.js would be better if the core API was callback-based in a way that supported back-pressure, and if the streams API was maintained out of core. We could then iterate on and improve a semantically versioned and npm-installable streams implementation that would work seamlessly across all Node.js LTS versions, and have a library of curated streams utility modules, similarly to how express and its middleware are managed.

“What now?”

Hard to say. Features keep landing, new APIs appear, and we struggle to support existing APIs (and sometimes just deprecate them; sorry, domains). Node.js is clearly successful despite all this, but the debates will continue.

My hope is that people will look more critically at new feature proposals, and also consider how many of their own personal annoyances with the Node.js API (you know you have them!) would have been easily fixable if the API had been published as a well-managed npm package.