How to Query the Full History of Your JSON Database

Johannes Lichtenberger
Oct 7 · 6 min read
Image for post
Image for post
Image for post
Image for post
Evolution of states through modifications and commits in SirixDB

Contribute to the evolutionary, accumulate-only database system called SirixDB and to the Svelte based Frontend

Introduction

First and foremost, you probably don’t need a database system to handle small JSON files, ranging to a few megabytes.

However, if you have to manage and query GBs of data, you should use a database system.

Usually, database systems, however, are not designed to keep the full history of your data. Often, the system overwrites data during a change or keeps the data for a short time. The latter usually happens due to transactions, which currently read slightly outdated data. Thus, a garbage collector has to wait until all reading transactions finish. Then, it’s able to delete the old data.

Instead, SirixDB makes a huge persistent tree, durable during commits. It only ever appends data. Every revision is indexed, whereas the revisions share unchanged page-fragments. Think of it like Git, but on a sub-file level. Persistent trees are also common in functional languages like Haskell and Closure. A transaction commit, which serializes pages in a postorder traversal, is depicted in the following image:

Usually, mapping these persistent in-memory structures to a durable device adds write amplification, as the full leaf-to-root path has to be adapted. We, however, use a keyed trie for our main document store. Thus no structural changes occur as in B-trees. Furthermore, we developed a new sliding snapshot algorithm to version the data pages and store variable-sized page fragments. Therefore only a few changed records are written to a new page-fragment. Reconstructing an in-memory page involves reading a sliding window of page-fragments from random locations in parallel.

Due to the requirement for fast, random, fine granular reads, modern hardware as Byte Addressable NVM will be essential to good performance.

SirixDB stores the JSON data in a binary format, whereas each transaction is bound to a specific revision. Currently, N read-only transactions on a resource can coexist with one read-write transaction on a resource (represents the JSON data).

The encoding is as follows, except that we lately also introduced a pointer from the parent to the last child node.

Image for post
Image for post

Query Your Data

The system uses and extends an XQuery 3.0 processor to process both XML and JSON data.

We’re able to import JSON files from a specific directory via:

It’ll create a database named mycol.jn and several resources.

We can also store a bunch of JSON-strings within several resources in the database (mycol.jn):

Note, that by when JSON-strings we have to replace true and false with true() and false() as well as null with jn:null().

We can then query the database collection via:

Or, if we want to open a specific resource in the database, we can use:

Furthermore, we can update the resource. Let’s say we have the following small JSON file:

We can insert a JSON object with:

We dereference object field names with the => operator. It even can drill down into arrays and find the value for the field name.

The update-operation will create a new revision with the field name and the added value. For instance

will select the first object in the array value of the field name tada.

We implicitly query revision 2. If we instead query revision 1, we’ll still retrieve the old JSON object.

Note that we specified one as the third argument of the function. The output is:

Instead of providing a numeric value to the third parameter we can also open a specific revision by a timestamp (here: to check how a specifc resource looked like in April 2018):

With the function open-revisions we’re able to load all revisions of a resource between two points in time:

The following query will insert an object into the array value dereferenced by =>foo as the second item:

Using

we get the output:

Likewise, we can replace a JSON value with

or delete a value:

If you submit the queries through the REST-API, you first have to be authorized. Thus, a revision, as in Git, also stores the author’s name and a UUID.

To get the author name, who committed a specific revision:

To get the UUID:

You can also project fields:

This will output:

Time travel functions

Besides, we can use a bunch of time travel queries. The following functions retrieve different versions of a JSON item:

Function for selecting the json-item in the future or the future-or-self. The first parameter is the context item. The second parameter denotes if the current item should be included in the result or not.

Function for selecting the json-item in the past or the past-or-self. The first parameter is the context item. Second parameter denotes if the current item should be included in the result or not.

Function for selecting the json-item in all revisions.

Function for selecting the json-item in the first revision.

Function for selecting the json-item in the last / most-recent revision.

Function for selecting the json-item in the previous revision.

Function for selecting the json-item in the next revision.

To get an item in all revisions, in which it has been changed (or inserted):

Diffing

We’re of course also able to retrieve diffs:

This function compares revision one and three of a resource called myresource2.

The output format is a JSON-string:

For instance, this format is also used by our , which we currently develop based on Svelte.

It denotes that SirixDB compared revisions 1 and 3. Furthermore, the different diff types are insert, delete, update and replace (the latter not shown). The insertPositionNodeKey is the context node, where an insert occurs, the insertPosition denotes if it's inserted as a first child, as the right sibling, or the left sibling.

We can also diff a subtree and specify two additional parameters. The root node, which is going to be compared, and the depth:

This compares also revision 1 and 3 of the myresource2 resource in the mydocs.col database. However, this time, the diff starts at the node, denoted by its unique nodeKey 7453. Furthermore the diffing should skip descendants, which are deeper than 2 levels.

In order to get the nodeKey of a specific item, we can use:

Conclusion

SirixDB offers powerful ways to query the full history of your data.

We omitted the creation of index structures, but you can also create secondary index structures with XQuery, which are automatically versioned as well.

Consider the following document (serialized SirixDB resource with two revisions):

We can create a Content-And-Structure (CAS) index as follows:

In order to answer the query

Furthermore, we didn’t mention FLOWR expressions, which are the cornerstone of XQuery. A simple example with an implicit join:

JavaScript In Plain English

New JavaScript + Web Development articles every day.

Johannes Lichtenberger

Written by

I'm working on a persistent, durable document store called SirixDB written in Java in my spare time. It's especially well suited for novel byte-addressable NVM.

JavaScript In Plain English

New JavaScript + Web Development articles every day.

Johannes Lichtenberger

Written by

I'm working on a persistent, durable document store called SirixDB written in Java in my spare time. It's especially well suited for novel byte-addressable NVM.

JavaScript In Plain English

New JavaScript + Web Development articles every day.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store