JSON: When SQL met NoSQL

Chinmay Bag
Jun 26, 2018 · 8 min read

Relational databases have, for the longest time, evolved into and ruled the realm of database management systems due to their dexterity in allowing users to store data in all formats and all use cases imaginable. However, in the past decade or so, many other models, previously thought of as inefficient and unusable, are again making strides into this realm previously dominated by relational databases. In keeping with their legacy of adapting to the needs of the industry, the popular relational databases have started embracing certain features from the document model. The most ubiquitous of these features is the JSON data type.

Postgres has had JSON support for a while, but it wasn’t that great due to a lack of indexing and key extractor methods. With the release of version 9.2, the Postgres team added native JSON support and operators for transforming data to JSON from other data formats.

MySQL introduced the native JSON support with version 5.7.8 citing automatic JSON validation and Optimized storage support as incentives to store JSON-format strings in JSON columns, instead of a string column.

Why would I want to store a JSON in an RDBMS?

Despite of all the convenience that the popular document-model databases provide, my go-to database management system has been MySQL for the longest time. Recently, Postgres has taken over as my preferred choice for a SQL database due to the absolute smorgasbord of features it provides that probably others don’t. The relational model comes with a set of principles and ideas perfected and optimized over the half a century that they’ve been around. Apart from certain exceptional use cases, where other database management systems shine, these relational concepts lead to an intuitive and therefore simpler way of representing our data representation and storage needs.

Initially, these database systems only supported the primitive data types. Recently they’ve been ramping up their game and going head to head with their newest competitors. As a result, SQL databases have borrowed the JSON data type from the document model databases.

There is some merit in storing your data as JSON in your favourite relational database. Some of the use cases that fit the bill are as follows.

Ubiquitous format for storing API requests and responses

A lot of third-party APIs support JSON in their request and response formats. Storing these request payloads and API responses as JSON makes more sense than storing them as Text since now, with JSON operations enabled, one can query data based on certain simple and nested properties in the JSON.

Similarly, JSON has merits in a distributed system, like the microservice architecture. Data passing between individual services can be done using JSON encoding. It is the de facto encoding inside a Node.js ecosystem. However, in polyglot systems, the efficient way for data passing would be to use something like Apache Avro or Protocol buffers.

The same can also be said for asynchronous modes of communication. JSON being just an encoding format, is also applicable for representing the payload in message queues or task queues. E.g: AWS’s SQS/SNS services, among others, support JSON as one of their payload formats. Thus, JSON is well-supported in both asynchronous as well as synchronous communication.

Simpler database designs

Storing data as JSON can also aid in simpler schema designs. Sometimes, we don’t know beforehand, in addition to the mandatory basic attributes, what optional attributes we might need to store regarding our data entities. In such scenarios, Entity-attribute-value (EAV) tables are used. EAV tables often result in cumbersome queries, especially in case of complex joins. These EAV tables can easily be replaced by JSON, which can be queried, indexed and joined conveniently and even effect marked performance improvements, as demonstrated in the following images.

Relation user_vitals represented using an Entity-Attribute-Value Table
Relation user_vitals represented using JSON

Client side configuration for an individual user

Javascript is rightly hailed as the Lingua Franca of The Web, making JSON as the Encoding of the Web. Local storage and other Web APIs in the browsers allow storage and retrieval of data using a JSON-like dot operation.

User configuration data for the client application can be stored as a JSON instead of etching out a separate column for each new column, such as a link or number of followers for yet another social media platform. For certain applications, one may be required to store some sort of information regarding how the application may be rendered and how the components in the UI may be positioned.

The data being transferred between the client application and the back end is nowadays encoded as a nested JSON structure. The notoriously abstruse XML format is losing ground to JSON as the latter is closer to the data models used by client side application code.

Just like all spreadsheet and data analysis applications, regardless of the platform must support the CSV format, all server-side languages, either via their standard library or via a popular third-party library, support efficient operations on data serialization in JSON; e.g. Python has Pickle and Java has Java.IO.Serializable.

This reduces the data transformation needed to pass front-end configuration or the ever expanding social media information for the user.

Data migration and backward/forward compatibility

During active development of an application, data format and schema changes may happen too frequently to be comfortable. Some of these may include breaking changes. You may want to execute a rolling upgrade where changes are introduced to different versions at different points of time. This necessitates multiple versions of the server-side application to exist simultaneously, accessing different data formats.

In order for the system to continue running smoothly, compatibility needs to be maintained in both directions. Backward compatibility can be easier to achieve as the newer application code can ignore the old data format or newer code can be written to explicitly handle the discrepancies. However, forward compatibility can be trickier as older application code needs to handle the new data format.

However, schema evolution becomes much more manageable using JSON. Suppose you are building the next hottest music library on the streets and want to track all your favorite songs from the major music streaming platforms. So, you make tables for all the basic entities in your system - user accounts, albums, artists, songs and playlists.

-- Table Definition ----------------------------------------------

CREATE TABLE song (
id SERIAL PRIMARY KEY,
name text NOT NULL,
artist_id integer REFERENCES artist(id),
duration time without time zone DEFAULT '00:00:00'::time without time zone,
album__id integer REFERENCES album(id)
);

-- Indices -------------------------------------------------------

CREATE UNIQUE INDEX song_pkey ON song(id int4_ops);

After a while, you decide you want to let your users store the Soundcloud links to the tracks they tag to their accounts. After some discussion, your project manager decides he wants you to allow users to save Spotify and Mixcloud links as well. Three weeks down the line, a new requirement is thrown in the mix that users must also be allowed to store their favourite Youtube music video link for the song. An amateur database admin may create a new table with a new column for storing links for each of the music streaming service. A good database admin, in an attempt to attain some normalized form, may suggest storing these links in a EAV table where you store such attributes and their values (the links) in a many-to-one relationship between the links table and the song table.

-- Table Definition ----------------------------------------------

CREATE TABLE links (
link_id SERIAL PRIMARY KEY,
song_id integer REFERENCES song(id),
platform character varying(50),
link text
);

-- Indices -------------------------------------------------------

CREATE UNIQUE INDEX links_pkey ON links(link_id int4_ops);

These many-to-one relationships are known to cause troubles while querying if not done carefully, such as multiple rows being returned for a single row during joins. I have fallen into this pit way more times than I should have. Since these links do not have any bearing on any other relationship apart from with the table song, we can simply store these links in the song table by just adding a column called links with the data type JSON.

ALTER TABLE "public"."song" ADD COLUMN "links" JSON DEFAULT '{}';

Now, we can store links as key value pairs against the songs in the same row. Even, if your project managers comes with a requirement to store links for yet another music streaming platform, you can just club it in with the rest of the entries. In fact, these can also be queried to return application-code-friendly results, in whole or selecting only certain keys in the JSON, by using the JSON operators available in your choice of SQL database management tool. We will explore this aspect in a later post.

But wait…

Once you decide to store data as JSON in your preferred flavor of SQL, you are bound to run into the following issues sooner or later.

Missing compatible data types in the JSON spec

The JSON spec doesn’t allow you to store values of type Date or Function. You need to use libraries such as json-fn in Javascript on read and write. Such libraries use their own representations to serialize the JSON object to a string and later deserialize the original contents back.

A common workaround for storing dates in JSON is to store the datetime as a string. The JSON.stringify() function in Javascript does this by default. Another possible solution is to store timestamps in the form of a UNIX timestamp, which is basically the number of milliseconds that have elapsed since the UNIX Epoch. In Javascript, you can get the current UNIX timestamp by calling the Date.now() function. The UNIX Timestamp for an already defined datetime value can be derived by calling the Date.prototype.getTime() prototype function on the assigned date variable.

Another issue with the JSON representation of data is that it cannot accurately hold Long Integers greater than 253. As a result, Twitter’s APIs return Tweet Ids twice in the response, once as the original Long Integer form (id) and once as a String (id_str), as can be seen in the following image, borrowed from a blog called What When How.

Twitter’s API response for a tweet

Bulky storage

Storing a large number of huge JSON objects can also be a problem as it is known to take up a lot of space. To tackle this issues, many database vendors on both sides of the fence have come up with their own versions of Binary encodings of JSON (or the document).

Postgres has JSONB and MongoDB has Binary JSON (BSON). Here’s a Stack Overflow link discussing all the differences between the JSON and JSONB datatypes in Postgres.

Missing default schema validation

Since the JSON spec itself does not store schema information, it does not enforce verification of data structure being stored as JSON, by default. It is possible to stored values against misspelled keys or storing invalid or unused keys in these JSON. These issues can be resolved by maintaining a Schema-on-write validation before storing data as JSON in your database. With regards to this, the Postgres documentation has the following to say:

Having a predictable structure makes it easier to write queries that usefully summarize a set of "documents" (datums) in a table.

In the interest of respecting our readers’ attention span and preventing this post from becoming too lengthy, we will discuss the various JSON (and JSONB) operations now possible in SQL in a later post.


Praemineo

Praemineo

Thanks to Rohan Gada

Chinmay Bag

Written by

Software Engineer at @praemineo.

Praemineo

Praemineo

Praemineo

More From Medium

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade