Preserving JSON object key order in JavaScript, Python, and Go

JSON is the universal data exchange format

Tomas • Y • C
8 min read · Jan 30, 2018

Among data exchange formats today, compared with alternatives like XML and CSV, JSON is probably the most universal: support for it is available in almost every programming language.

Handling JSON in JavaScript, Python, and Go

(Why these three languages? Simply because all three are popular today and all are favorites of mine; I use each of them in different projects.)

In JavaScript it’s the easiest, because JSON is valid native JavaScript syntax: you can copy a JSON object into JavaScript source code, assign it to a variable, and it just works, with no explicit parsing needed. (The source code itself still has to be parsed by the interpreter, whether in a browser or in Node.js on the server side, but that is implicit.) If the JSON input comes from an external API call and is loaded as a string, like this:

var data = `{
"country" : "United States",
"countryCode" : "US",
"region" : "CA",
"regionName" : "California",
"city" : "Mountain View",
"zip" : "94043",
"lat" : 37.4192,
"lon" : -122.0574,
"timezone" : "America/Los_Angeles",
"isp" : "Google Cloud",
"org" : "Google Cloud",
"as" : "AS15169 Google Inc.",
"mobile" : true,
"proxy" : false,
"query" : "35.192.xx.xxx"
}`

it needs to be parsed into an object, which in JavaScript is just one call:

var obj = JSON.parse(data);

The best part is that JSON is globally available; you don’t even need to import anything.

Python has JSON handling in its standard library, so if you have a string of JSON input, parsing is also easy:

import json

obj = json.loads(data)

Go is a compiled, statically typed language; its compiler can generate more efficient machine code than the dynamic languages above, and execution speed is close to C/C++. Let’s see how easy it is to handle JSON:

import "encoding/json"

var obj map[string]interface{}
// json.Unmarshal takes a []byte, so convert if data is a string
err := json.Unmarshal([]byte(data), &obj)
// optionally, check err
// obj now holds the JSON object; access values by key, like obj["zip"], obj["mobile"], ...

In real-world, day-to-day JSON handling there are more valid JSON types than the object: the outer container can be an array, or the value can be just a literal bool, number, or string. I use a JSON object as the only example here because the object is still the most commonly used container type, and it relates to the problem I want to talk about today.
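
As a quick illustration of those other top-level types, here is a minimal Python sketch. The sample strings are made up for this example and are not part of the API response used elsewhere in this post.

```python
import json

# Any JSON value can be the top-level document, not only an object.
samples = ['[1, 2, 3]', 'true', '3.14', '"hello"', 'null']

parsed = [json.loads(s) for s in samples]
for text, value in zip(samples, parsed):
    # Show which Python type each JSON value maps to.
    print(text, '->', type(value).__name__)
```

An array becomes a list, and the literals become bool, float, str, and None respectively.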

Error handling differs across the three languages. JavaScript and Python raise an exception on invalid input, which you handle in try-catch style (try/except in Python); if no exception is raised, the parse succeeded. In Go you need to explicitly check the returned err value; when the returned err is nil, there was no error.
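
To make the exception style concrete, here is a minimal Python sketch. The helper name parse_or_none is hypothetical, chosen only for this illustration, and the malformed input is invented.

```python
import json

def parse_or_none(text):
    """Return the parsed JSON value, or None if text is not valid JSON.

    Hypothetical helper for illustration; note that the valid document
    'null' also parses to None.
    """
    try:
        return json.loads(text)
    except ValueError as exc:
        # json.JSONDecodeError (Python 3.5+) is a subclass of ValueError,
        # so this clause also works on older versions.
        print('invalid JSON:', exc)
        return None

good = parse_or_none('{"zip": "94043"}')
bad = parse_or_none('{"zip": }')
```

Catching ValueError rather than the more specific JSONDecodeError is a design choice that keeps the same code working on Python 2.7.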

Iterating over the parsed JSON object

For completeness, here is how to loop over the key-value pairs in each language. It doesn’t differ much among the three: it’s just a for loop over a JavaScript object, a Python dict, or a Go map:

// note: this is ES6 syntax, and Object.entries is from ES2017
for (const [key, value] of Object.entries(obj)) {
  // use key and value
}

in Python:

for key, value in obj.items():
    # use key, value ...
    pass

and Go:

for key, value := range obj {
    // use key and value ...
}

Encoding an object back to string data

The section above only covers decoding. When we save structured data to a file on disk, or send it over the network, we have to serialize it, that is, encode the JSON value to a string representation. Let’s compare the three languages here as well.

First, in JavaScript, use the globally available JSON object again:

var data = JSON.stringify(obj);
// the data is a string of this:
// '{"country":"United States","countryCode":"US","region":"CA","regionName":"California","city":"Mountain View","zip":"94043","lat":37.4192,"lon":-122.0574,"timezone":"America/Los_Angeles","isp":"Google Cloud","org":"Google Cloud","as":"AS15169 Google Inc.","mobile":true,"proxy":false,"query":"35.192.xx.xxx"}'

Then in Python, which again needs import json first:

import json

data = json.dumps(obj)
# or this, if you want the string representation to be really compact, without any spaces
data = json.dumps(obj, separators=(',', ':'))

Next, in Go:

import "encoding/json"

data, err := json.Marshal(&obj)
// optionally check err

Notice that if you run this code, Python’s default dumps (its stringify) output isn’t fully compact: it puts a space after every comma and colon, so you need to pass the extra separators parameter to get rid of them. (Python 3.4 changed the default separators, but only for the indented case discussed below; the non-indented default still includes the spaces.)
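
The difference is easy to see side by side. This is a minimal sketch using a shortened, made-up version of the sample object:

```python
import json

obj = {"city": "Mountain View", "mobile": True, "lat": 37.4192}

# Default separators are (', ', ': ') when indent is not given.
default = json.dumps(obj)
# Explicit separators without spaces give the most compact form.
compact = json.dumps(obj, separators=(',', ':'))

print(default)
print(compact)
print(len(default) - len(compact), 'characters saved')
```

The saving is one character per comma and per colon, which adds up on large payloads.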

The compact string representation is good for sending over the network, to minimize the bytes transmitted or the amount of data the browser has to retrieve, but it isn’t very human friendly. You may have heard of the indented JSON string format; how do we produce that in each programming language?

in JavaScript:

var data = JSON.stringify(obj, null, 2);
// printing the data with console.log(data) gives this:
{
  "country": "United States",
  "countryCode": "US",
  "region": "CA",
  "regionName": "California",
  "city": "Mountain View",
  "zip": "94043",
  "lat": 37.4192,
  "lon": -122.0574,
  "timezone": "America/Los_Angeles",
  "isp": "Google Cloud",
  "org": "Google Cloud",
  "as": "AS15169 Google Inc.",
  "mobile": true,
  "proxy": false,
  "query": "35.192.xx.xxx"
}

in Python:

import json
data = json.dumps(obj, separators=(',', ': '), indent=2)

You might be tempted to omit separators in the indented case. If you do, the returned data looks fine at first: it has newlines, and each key-value pair is on its own indented line. But on Python versions before 3.4 the default item separator is still ', ', so every line that ends in a comma carries a trailing space. It is invisible, but it wastes space when written to disk and bandwidth when sent over the network. So keep in mind that Python’s default json API is kind of awkward.
Python 3.4 changed the default value of separators to the usually wanted (',', ': ') when indent is provided; but in the real world Python 2.7 is still pretty common, so it’s worth mentioning here.
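
The trailing-space effect can be checked directly. This is a minimal sketch that runs on Python 3 and reproduces the old behavior by passing the pre-3.4 default separators explicitly; the helper has_trailing_space is hypothetical and exists only for this check.

```python
import json

obj = {"zip": "94043", "mobile": True}

# Forcing the old default separators reproduces the pre-3.4 behavior.
old_style = json.dumps(obj, indent=2, separators=(', ', ': '))
# These are the separators Python 3.4+ picks by default when indent is set.
new_style = json.dumps(obj, indent=2, separators=(',', ': '))

def has_trailing_space(text):
    """Return True if any line of text ends in whitespace."""
    return any(line != line.rstrip() for line in text.splitlines())

print(has_trailing_space(old_style))
print(has_trailing_space(new_style))
```

Passing separators=(',', ': ') explicitly is harmless on newer versions and keeps the output identical across Python 2.7 and 3.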

And in Go:

import "encoding/json"

data, err := json.Marshal(&obj)
// or use the indented version for pretty JSON output; the two extra
// parameters are the prefix string "" and the indent string "  " (two spaces here)
data, err = json.MarshalIndent(&obj, "", "  ")

The key order problem of a JSON object

In the encoding examples above I have shown the output only from the JavaScript code, and there is a reason for that: if you run the same JSON handling in a language other than JavaScript, you will notice that the encoded string isn’t exactly the same as the original that was parsed. The key order has changed, seemingly arbitrarily! In Python it might look like this, with the key order not matching the original string at all; running the Go code gives a similar result:

In [3]: print json.dumps(obj, indent=2, separators=(',', ': '))
{
  "city": "Mountain View",
  "countryCode": "US",
  "zip": "94043",
  "mobile": true,
  "country": "United States",
  "region": "CA",
  "isp": "Google Cloud",
  "lon": -122.0574,
  "as": "AS15169 Google Inc.",
  "query": "35.192.xx.xxx",
  "proxy": false,
  "org": "Google Cloud",
  "lat": 37.4192,
  "timezone": "America/Los_Angeles",
  "regionName": "California"
}

But is it a real problem? According to the JSON spec at https://www.json.org/ it isn’t: the spec defines an object as an unordered set of name/value pairs, so it doesn’t require the order to be kept. In other words, it’s up to each language or library implementation.

In practice, when I was handling JSON in Python, this was annoying to me, for several reasons:

  1. Although JSON is designed mainly for data exchange, some software already uses it as a human-readable interface, and it’s annoying if the key order changes randomly every time.
  2. Some legacy software already has the wrong behavior of relying on key order. For example, when calling the MongoDB API and sending JSON over the wire, some semantics change if the keys of a query appear in a different order; Microsoft also has a service that requires a special key _type to always appear first.
  3. For tools like the JQ command-line JSON processor, one of the important things when manipulating JSON is to preserve the key order. HTTPie (aitch-tee-tee-pie) is a command-line HTTP client, an advanced version of curl. I had been using it for quite a while until I was hit by this problem, https://github.com/jakubroztocil/httpie/issues/427, caused by Python’s json dumps not keeping the order. The HTTPie developers seem to have no intention of fixing it, so I have stopped using it. Someday somebody may want to make such a tool in Go, and one crucial feature as I see it is that it has to keep the JSON key order. So far I am using curl piped to JQ; JQ, written in C, really does fulfill this requirement.

I need all programming languages to have this ability (or at least most of them, including the three I care about here).

If our code needs to interact with MongoDB, Microsoft, or some proprietary system that incorrectly relies on JSON object key order, but our programming language’s JSON stringify cannot guarantee the order, what can we do? There has been a thread on the Golang forum where people argued that this is an invalid use case and should be fixed at the origin of the wrong behavior. That is of course true, but going by past experience, is it easy to push MongoDB or Microsoft to change their wrong behavior? And what if the legacy software comes from a company that is already dead, and our project has a deadline?

From a pure computer science point of view, can we implement the behavior of preserving JSON object key order? I think the answer is yes. Let’s look into it in each of my favorite languages.

First, JavaScript: preserving key order is already the default behavior, at least in the Node.js and Chrome versions I tested:

JSON.stringify(obj);

In Python it’s a little harder, but it mostly comes down to using another data structure, OrderedDict, which needs to be imported from collections and was first added in Python 2.7:

import json, collections

obj = json.loads(data, object_pairs_hook=collections.OrderedDict)
# obj is now:
OrderedDict([(u'country', u'United States'),
(u'countryCode', u'US'),
(u'region', u'CA'),
(u'regionName', u'California'),
(u'city', u'Mountain View'),
(u'zip', u'94043'),
(u'lat', 37.4192),
(u'lon', -122.0574),
(u'timezone', u'America/Los_Angeles'),
(u'isp', u'Google Cloud'),
(u'org', u'Google Cloud'),
(u'as', u'AS15169 Google Inc.'),
(u'mobile', True),
(u'proxy', False),
(u'query', u'35.192.xx.xxx')])
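
To confirm that the order survives a full round trip, here is a minimal sketch with a shortened, deliberately non-alphabetical version of the sample input; the added "mobile" key is just for illustration. On Python 3.7 and later a plain dict also preserves insertion order, so the hook mainly matters on older versions such as 2.7.

```python
import json
from collections import OrderedDict

data = '{"country": "United States", "zip": "94043", "city": "Mountain View", "as": "AS15169 Google Inc."}'

# Decode key-value pairs into an OrderedDict so the input order is kept.
obj = json.loads(data, object_pairs_hook=OrderedDict)
obj["mobile"] = True  # new keys are appended at the end

# Encoding walks the OrderedDict in order, so the output matches the input order.
round_trip = json.dumps(obj)
print(list(obj.keys()))
print(round_trip)
```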

in Go

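The standard package encoding/json doesn’t address preserving key order, and decoding into a map can’t support it: once the object is loaded into an in-memory map the original order is lost, because Go map iteration order is unspecified and json.Marshal writes map keys in sorted order. So is there no way?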

Fortunately, since Go 1.5 the designers of Go’s standard library have exposed a token-level API on the Decoder type (Decoder.Token) for handling JSON one token at a time. It was needed for performance reasons, such as handling very large arrays efficiently, but as a side effect it also makes it possible to handle a JSON object’s key-value pairs sequentially.

I made a small library that does this.

Just to mention a few highlights of this library:

  1. It maintains an easy-to-use API that handles key order efficiently, and it also provides an iteration function to walk through the object, which is efficient and convenient for the very large objects seen in data engineering; read the docs at https://godoc.org/gitlab.com/c0b/go-ordered-json
  2. Somebody already tried to push similar logic into Go’s standard library, but the change was abandoned (for nonsensical reasons, I think; because of that, please don’t ask, I won’t try to push this to the standard library):
    https://go-review.googlesource.com/c/go/+/7930
    Abandoned 7930: encoding/json: Optionally preserve the key order of JSON objects

Conclusion: JSON is the most universal data exchange format, and library support for it is available in almost all of the 100+ existing programming languages. However, the real world is imperfect, and different programming languages treat the imperfect parts differently: whether to insist on the purity of the specification or to provide convenience features to developers is a choice on which language designers differ greatly.

If you like this post, please consider giving me some claps; that would motivate me to write more (on Medium) and share more (I like GitLab). Thanks!

Tomas • Y • C

Open Source Evangelist | Early Git advocator since 2007 | Node.js user since ES2015