json ⊄ js
Conventional wisdom says that JSON is a subset of JavaScript.
According to the spec, JSON strings can contain any Unicode character except ", \, or a control character.
This means that the following string is perfectly valid JSON:
{"str": "own ed"}
Try copying and pasting that text into the console and assigning it to a variable. Go on, we’ll wait.
Yeah, “SyntaxError: Unexpected token ILLEGAL”.
The problem comes down to two Unicode characters that are considered line terminators in JavaScript: the line separator \u2028 and the paragraph separator \u2029. If we were to escape the string above, it would be “own\u2028ed”.
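To see the JSON side of this concretely, here is a minimal sketch. The variable names are only illustrative; the JSON text is built with an escape in the script's own source so that the raw U+2028 character is present at runtime. It shows that a JSON parser accepts the character inside a string, and that JSON.stringify writes it back out unescaped.

```javascript
// The '\u2028' escape below is resolved by JavaScript, so jsonText
// really contains the raw line separator character.
const jsonText = '{"str": "own\u2028ed"}';

// A JSON parser has no objection to the raw character inside a string.
const parsed = JSON.parse(jsonText);
console.log(parsed.str.length);                     // 6
console.log(parsed.str.charCodeAt(3).toString(16)); // "2028"

// JSON.stringify emits the character as-is rather than as \u2028.
const reserialized = JSON.stringify(parsed);
console.log(reserialized.includes("\u2028"));       // true
```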
So why is this a problem?
JSON is now widely used as a convenient serialization format, and while most situations don’t rely on JSON being a subset of JS, there are a few cases where it matters.
In JSONP (JSON with padding) the server writes the response data along with a callback that should be executed in the scope of the calling page:
handleResponse({"status": "ok", "id": 123456});
Some libraries implement an unsafe, but fast, JSON parse using “eval” for older browsers:
function unsafeParse(json) {
  return eval("(" + json + ")");
}
The other common use case is to embed server-generated globals in the page to avoid an extra server request:
var GLOBALS = {
  "userid": 123456,
  "twitterName": "dpup",
  "role": "editor"
};
In each of these examples a raw line separator or paragraph separator in the data will break parsing, likely leaving you with a busted page.
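One way to find out whether a payload is at risk, without relying on how a particular engine's eval behaves, is to look for the two characters in the serialized text. The sketch below assumes a hypothetical helper named hasRawLineTerminator; it is not part of any library.

```javascript
// Hypothetical helper: reports whether serialized JSON text contains a
// raw U+2028 or U+2029 that would end up inside a script.
function hasRawLineTerminator(jsonText) {
  return /[\u2028\u2029]/.test(jsonText);
}

const risky = JSON.stringify({ name: "own\u2028ed" });
const safe = JSON.stringify({ name: "owned" });

console.log(hasRawLineTerminator(risky)); // true
console.log(hasRawLineTerminator(safe));  // false
```

A check like this could run on the server before the data is written into a JSONP response or an inline script.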
I’m pretty sure this is a time bomb waiting to affect many sites.
What to do?
JSON is done. JSON will not be revised.
— Douglas Crockford, 2009
While the quote above is associated with a different discussion, the sentiment holds true for this issue. For better or for worse, we’re stuck with JSON as it is.
One approach would be to escape all non-ASCII characters using something like Closure’s escapeString method.
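As a rough illustration of the escape-everything approach, here is a minimal sketch. It is not Closure's implementation: asciiStringify is a hypothetical helper that post-processes the output of JSON.stringify and rewrites every UTF-16 code unit above ASCII as a \uXXXX escape.

```javascript
// Hypothetical helper, not Closure's escapeString: serializes obj and
// escapes every non-ASCII code unit so the result is plain ASCII.
function asciiStringify(obj) {
  return JSON.stringify(obj).replace(/[\u0080-\uffff]/g, function (ch) {
    // Left-pad the hex value to four digits, e.g. "e9" becomes "00e9".
    return "\\u" + ("0000" + ch.charCodeAt(0).toString(16)).slice(-4);
  });
}

const ascii = asciiStringify({ str: "own\u2028ed", name: "é" });
console.log(ascii); // {"str":"own\u2028ed","name":"\u00e9"}

// The escaped text is still valid JSON and parses back to the same data.
console.log(JSON.parse(ascii).name); // é
```

The trade-off is size: text that is mostly non-ASCII grows to six bytes per code unit.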
Or we can just handle the two characters that are a problem, and we end up with something like this:
function jsStringify(obj) {
  return JSON.stringify(obj)
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}
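To tie this back to the JSONP case, here is a small usage sketch. jsStringify is repeated so the snippet runs on its own; jsonpBody and the "note" field are hypothetical, and the callback name is assumed to come from a trusted source.

```javascript
function jsStringify(obj) {
  return JSON.stringify(obj)
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

// Hypothetical helper: builds the script text a JSONP endpoint returns.
function jsonpBody(callbackName, data) {
  return callbackName + "(" + jsStringify(data) + ");";
}

const body = jsonpBody("handleResponse", { status: "ok", note: "own\u2028ed" });

// No raw line terminators are left in the generated script.
console.log(/[\u2028\u2029]/.test(body)); // false

// Simulate the calling page: run the script with a handleResponse callback.
let received;
new Function("handleResponse", body)(function (payload) {
  received = payload;
});
console.log(received.note === "own\u2028ed"); // true
```

Because the escapes sit inside string literals, the evaluated value still contains the original characters; only the transport text changes.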
