BSON, JSONB, OSON…
A deep dive into Binary JSON formats

Loïc Lefèvre
db-one
Published in
4 min readSep 16, 2024

If you’ve used JSON to store and manage data, you’ve likely faced situations where the format seemed to fall short of your expectations. What if some of these disappointments weren’t random but were actually inherent to the design of the format you’re using? Imagine if subtle changes in your development practices or a deeper understanding of the underlying binary JSON format could resolve these issues.

Let’s embark on a journey where we’ll explore BSON, JSONB, and OSON, uncovering how each format addresses common challenges and how you can leverage their unique features to enhance your data management practices — think of it as entering the Matrix of JSON data handling!

Chapter 1 — History

Here is a brief list of important dates that relate to storing hierarchical data (e.g. Tree structure):

  • 1960s–1970s, also known as the “Pre-Relational Era”: data were mostly hierarchical (remember IBM IMS…); also E. F. Codd let us an incredible legacy
  • 1980s–1990s also known as the Relational Era: RDBMSs become popular, the relational model to solve “Hierarchical data problems”
  • December 1999: JavaScript standard, includes JSON premise
  • February 2009: first release of MongoDB with BSON (binary)
  • September 2012: PostgreSQL 9.2 released with JSON (text format)
  • October 2013: ECMA-404 standard (1st edition)
  • August 2014: Oracle 12.1.0.2 released with JSON (text)
  • December 2014: PostgreSQL 9.4 released with JSONB (binary)
  • August 2015: MySQL 5.7.8 released with JSON (binary)
  • February 2017: Extended JSON v1 (MongoDB)
  • March 2017: Oracle 12.2.0.1 released with OSON (binary) for Columnar Processing/SIMD CPU instructions
  • July 2017: Extended JSON v2 (MongoDB)
  • December 2017: ECMA-404 standard (2nd edition)
  • August 2020: Oracle Autonomous JSON Database, a managed cloud service that includes OSON (binary) in BLOB columns
  • August 2021: Oracle 21c released with JSON data type (OSON binary)
  • February 2022: Oracle Database API for MongoDB available for Autonomous JSON Database
  • June 2022: MongoDB Atlas released
  • April 2023: Oracle 23c Free Developer release contains the JSON Relational Duality Views feature…
  • January 2024: SQLite 3.45 now supports JSONB (binary)
  • May 2024: Oracle 23ai Generally Available (Cloud, FREE, on-premises with Engineered Systems)

What’s interesting is that Binary JSON has been added to several major databases during the last 15 years. MongoDB, with its BSON format, was the precursor. Then followed PostgreSQL, MySQL, Oracle, and, more recently, SQLite.

But what is a Binary JSON format? what is its purpose? and why are there so many distinct formats?

Chapter 2 — What is a Binary JSON format?

The JSON (JavaScript Object Notation) standard is generally well understood. Basically, a JSON document is written using UTF-8 encoded text. It can have fields with values. Field names (aka keys) should not be duplicated inside a JSON document but when written as UTF-8 text, we Humans, can duplicate field names :) In the very same spirit, the standard doesn’t say anything about field name ordering (more on that later). Finally, we can say that the JSON standard supports 4 distinct datatypes for the scalar values (e.g. not speaking about objects nor arrays):

  • null
  • booleans
  • numbers
  • strings

And that’s pretty much all to know.

One of the problems when dealing with JSON data in databases is that the JSON documents were persisted as text, and thus, they had to be parsed over and over when they were accessed, say for filtering or during the projection of a subset of the fields…

Parsing can consume a lot of CPU, but it can also require a lot of RAM! Think about the field names; usually, they are just metadata but when loading numerous JSON documents, they can represent 20% to 45% of data overhead: they don’t represent meaningful data at all, but they are required and repeated in RAM.

Field names (in purple) consumes 46.2% of the document above. One billion of such documents would require 85 GiB solely for storing the field names as text!

Binary JSON formats have been invented to pre-parse the UTF-8 text JSON documents and, by doing so, to improve database and application performance. Pre-parsing also means that the JSON documents are first validated hence when documents are persisted with a Binary format, it also means these documents have been validated, hence data quality is also improved!

But why are we mainly discussing BSON, JSONB, and OSON? Because the formats are really different.

In the next chapters, we’ll focus on each of these formats in detail, revealing their strengths and sometimes their limitations…

--

--