json ⊄ js

Conventional wisdom says that JSON is a subset of JavaScript.


The thing is it isn’t.

According to the spec, JSON strings can contain any Unicode character except ", \, or a control character.

This means that the following string is perfectly valid JSON:

{"str": "own
ed"}

Try copying and pasting that text into the console and assigning it to a variable. Go on, we’ll wait.

Yeah, “SyntaxError: Unexpected token ILLEGAL”.

The problem comes down to two Unicode characters that are considered line terminators in JavaScript: the line separator \u2028 and the paragraph separator \u2029. If we were to escape the string above, it would be “own\u2028ed”.
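To make the invisible character visible, here is a small sketch that builds the same JSON text with String.fromCharCode and hands it to JSON.parse. The variable names are only for illustration; the point is that the JSON parser accepts the raw character, and that the escaped form describes the same string.

```javascript
// The raw U+2028 character sits between "own" and "ed".
// It is built with String.fromCharCode so it stays visible in this listing.
const jsonText = '{"str": "own' + String.fromCharCode(0x2028) + 'ed"}';

// JSON.parse accepts the raw character inside a string.
const parsed = JSON.parse(jsonText);

console.log(parsed.str.length);                      // 6
console.log(parsed.str.charCodeAt(3).toString(16));  // 2028

// The escaped form describes exactly the same string.
const escapedText = '{"str": "own\\u2028ed"}';
console.log(JSON.parse(escapedText).str === parsed.str); // true
```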

So why is this a problem?

JSON is now widely used as a convenient serialization format, and while most situations don’t rely on JSON being a subset of JS, there are a few cases where it matters.

In JSONP (JSON with padding) the server writes the response data along with a callback that should be executed in the scope of the calling page:

handleResponse({"status": "ok", "id": 123456});
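As a rough sketch of how such a response might be produced, assume a hypothetical server-side helper called jsonpResponse (not from any particular library) that wraps JSON.stringify output in the callback call. It shows where a raw line separator can slip into the script body.

```javascript
// Hypothetical helper: wraps serialized data in a call to the named callback.
function jsonpResponse(callbackName, data) {
  return callbackName + '(' + JSON.stringify(data) + ');';
}

console.log(jsonpResponse('handleResponse', { status: 'ok', id: 123456 }));
// handleResponse({"status":"ok","id":123456});

// JSON.stringify leaves U+2028 unescaped, so the raw character
// ends up inside the script that the calling page will execute.
const risky = jsonpResponse('handleResponse', { str: 'own\u2028ed' });
console.log(risky.includes('\u2028')); // true
```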

Some libraries implement an unsafe, but fast, JSON parse using “eval” for older browsers:

function unsafeParse(json) {
  return eval("(" + json + ")");
}

And the other common use case is to embed server-generated globals in the page to avoid an extra server request:

var GLOBALS = {
  "userid": 123456,
  "twitterName": "dpup",
  "role": "editor"
};

In each of these examples a line-separator character will break parsing, likely leaving you with a busted page.

I’m pretty sure this is a time bomb waiting to affect many sites.

What to do?

JSON is done. JSON will not be revised.
— Douglas Crockford, 2009

While the quote above is associated with a different discussion, the sentiment holds true for this issue. For better and for worse we’re stuck with JSON as it is.

So we need a workaround.

One approach would be to escape all non-ASCII characters using something like Closure’s escapeString method.
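Without pulling in Closure, the escape-everything idea can be sketched in a few lines. The function name asciiStringify is made up for this example, and this is not Closure's implementation; it simply rewrites every UTF-16 code unit above 0x7F in the JSON.stringify output as a \uXXXX escape.

```javascript
// Hypothetical helper: serializes to JSON, then escapes every
// non-ASCII UTF-16 code unit as \uXXXX so the output is pure ASCII.
function asciiStringify(obj) {
  return JSON.stringify(obj).replace(/[\u0080-\uffff]/g, function (ch) {
    return '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0');
  });
}

const asciiOut = asciiStringify({ s: '\u00e9\u2028' });
console.log(asciiOut);                               // {"s":"\u00e9\u2028"}
console.log(/^[\x00-\x7f]*$/.test(asciiOut));        // true
console.log(JSON.parse(asciiOut).s === '\u00e9\u2028'); // true
```

The trade-off is size: every non-ASCII character grows to six bytes, which adds up for text-heavy payloads.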

Or we can just handle the two characters that are a problem, and we end up with something like this:

function jsStringify(obj) {
  return JSON.stringify(obj)
      .replace(/\u2028/g, '\\u2028')
      .replace(/\u2029/g, '\\u2029');
}
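A quick check of how that behaves, with the function repeated so the snippet runs on its own. The sample object is invented for the example; the output should contain only the escaped forms and still parse back to the original strings.

```javascript
// Same function as in the article, repeated so this snippet is self-contained.
function jsStringify(obj) {
  return JSON.stringify(obj)
      .replace(/\u2028/g, '\\u2028')
      .replace(/\u2029/g, '\\u2029');
}

// Sample data containing both problem characters.
const data = { str: 'own\u2028ed', note: 'para\u2029graph' };
const out = jsStringify(data);

console.log(out);                              // {"str":"own\u2028ed","note":"para\u2029graph"}
console.log(/[\u2028\u2029]/.test(out));       // false: no raw separators remain
console.log(JSON.parse(out).str === data.str); // true: still valid JSON, same value
```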

Joy.