The pros and cons of different data formats: key-values vs tuples
How data is formatted under the hood
Working on Vasern (a client database for React Native) has given me an opportunity to try and test different data formats which include key-value, column-oriented, document, and tuples. Each format was designed to suit different scenarios.
These tests focus on three criteria: performance, the ability to look up values, and space efficiency. Keys and indices do not need to be sorted on disk, because they are loaded into memory for fast lookup.
In this post, I will recap the pros and cons of two common formats: key-value and tuples. I’ll also introduce tagged key-values, an extension of the key-value format that adds index lookup, a benefit borrowed from the tuples format.
Key-Value Store
A key-value store holds a collection of key-and-value pairs. Sometimes a single value represents more than one piece of data, separated by a delimiter (e.g. a comma). The pairs are organized into fixed-length blocks, which allows fast traversal between records.
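As a rough illustration of the fixed-length block idea (not Vasern's actual on-disk layout), the sketch below pads each `key=value` pair to a hypothetical `BLOCK_SIZE` of 32 bytes, so the n-th record always starts at byte `n * BLOCK_SIZE`. The names `encode_record` and `read_record`, the `=` separator, and the zero-byte padding are all assumptions made for this example.

```python
from typing import Tuple

BLOCK_SIZE = 32  # hypothetical fixed block length, in bytes


def encode_record(key: str, value: str) -> bytes:
    """Encode one key-value pair into a single zero-padded block."""
    raw = f"{key}={value}".encode("utf-8")
    if len(raw) > BLOCK_SIZE:
        raise ValueError("record does not fit in one block")
    return raw.ljust(BLOCK_SIZE, b"\x00")


def read_record(data: bytes, n: int) -> Tuple[str, str]:
    """Jump straight to the n-th block without reading the ones before it."""
    block = data[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE]
    key, _, value = block.rstrip(b"\x00").decode("utf-8").partition("=")
    return key, value


# A value may pack several fields separated by a delimiter (here, a comma).
store = b"".join([
    encode_record("user:1", "Ann,ann@example.com"),
    encode_record("user:2", "Bob,bob@example.com"),
])
```

Calling `read_record(store, 1)` slices out the second block directly. The trade-off is wasted padding for short records and a hard size limit for long ones.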
Advantages of the key-value store:
- Simple data format makes write and read operations fast
- Value can be anything, including JSON, flexible schemas
Disadvantages:
- Optimized only for data with a single key and value. A parser is required to store and read multiple values.
- Not optimized for lookup. Lookup requires scanning the whole collection or creating separate index values.
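To make the lookup limitation concrete, here is a small sketch that uses a plain Python dict as a stand-in for a key-value store. Finding records by a field packed inside the value means parsing and checking every record, unless a separate index is built and kept up to date. The record layout (`name,city`) and the helper names are made up for this example.

```python
from collections import defaultdict

# Stand-in key-value store: each value packs "name,city" into one string.
store = {
    "user:1": "Ann,Sydney",
    "user:2": "Bob,Hanoi",
    "user:3": "Cy,Sydney",
}


def parse(value):
    """The parser needed because one value holds several fields."""
    name, city = value.split(",")
    return {"name": name, "city": city}


def find_by_city_scan(city):
    """Without an index: parse and check every record, O(n)."""
    return [k for k, v in store.items() if parse(v)["city"] == city]


def build_city_index():
    """A separate index: city -> list of keys, built in one pass."""
    index = defaultdict(list)
    for key, value in store.items():
        index[parse(value)["city"]].append(key)
    return index


city_index = build_city_index()
```

Both `find_by_city_scan("Sydney")` and `city_index["Sydney"]` return the same keys, but the index answers without touching the values again. The cost is that it must be updated on every write.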
Tuples Data Store (RDBMS)
The tuples data format has existed for many decades. It is used in relational databases such as MySQL and PostgreSQL.
Unlike the key-value format, it relies on a predefined schema to organize records into rows and their values into fixed-length columns. Each value usually represents a single piece of information.
Advantages of tuples data store:
- Structured data format helps traverse through values of records quickly
- Optimized for lookup (common use of SQL for querying records)
Disadvantages:
- Constrained by schema structure
- Changing the schema usually requires rewriting existing data, in the worst case the whole database
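A minimal sketch of a fixed-length row layout, using Python's standard `struct` module. The schema here (a 4-byte integer id, a 16-byte name, a 4-byte integer age) is invented for illustration and does not reflect how any particular RDBMS stores rows. Because every column has a fixed width, a single column can be read at a known offset without parsing the rest of the row.

```python
import struct

# Hypothetical schema: id (uint32), name (16 bytes), age (uint32).
# "<" means little-endian with no alignment padding.
ROW = struct.Struct("<I16sI")
AGE_OFFSET = 4 + 16  # the age column starts after id and name


def pack_row(row_id, name, age):
    # struct pads the name with zero bytes up to 16 bytes
    # (and truncates anything longer).
    return ROW.pack(row_id, name.encode("utf-8"), age)


def read_age(table, n):
    """Read only the age column of the n-th row via its fixed offset."""
    (age,) = struct.unpack_from("<I", table, n * ROW.size + AGE_OFFSET)
    return age


def read_row(table, n):
    """Decode the whole n-th row according to the schema."""
    row_id, name, age = ROW.unpack_from(table, n * ROW.size)
    return row_id, name.rstrip(b"\x00").decode("utf-8"), age


table = pack_row(1, "Ann", 31) + pack_row(2, "Bob", 27)
```

In this toy layout, adding a column changes `ROW.size`, so every existing row would have to be rewritten. That is the schema-change cost listed among the disadvantages.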
Tagged Key-Value Store
Tagged key-value is an extended version of key-value storage: it has more than one key for a single value. In other words, each record has a key, indexes (or tags), and a body value, where:
- The key and indexes are loaded into memory on startup
- The body value can be anything: a plain string, BSON/JSON, or comma-separated values
Advantages of Tagged Key-Value store:
- Semi-structured, which helps traverse through records and indexes fast
- Optimized for lookup (through keys and indexes)
- A record body can be anything, ideal for flexible schemas
- Space efficiency (keys and indices are organized in tight columns)
Disadvantages:
- A schema change that involves indices might need a data migration
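The following sketch shows one way the idea could look in memory. It is not Vasern's implementation: the class `TaggedStore`, its methods, and the use of a Python list as a stand-in for on-disk bodies are all assumptions made for this example. Keys and tags live in small in-memory maps, while each body stays an opaque string that is only parsed when a record is actually read.

```python
import json
from collections import defaultdict


class TaggedStore:
    """Toy tagged key-value store: key + tags in memory, opaque body."""

    def __init__(self):
        self._bodies = []                # stand-in for on-disk record bodies
        self._by_key = {}                # key -> position of the body
        self._by_tag = defaultdict(set)  # (tag name, tag value) -> keys

    def put(self, key, tags, body):
        # Overwriting an existing key is not handled in this sketch.
        self._by_key[key] = len(self._bodies)
        self._bodies.append(body)
        for name, value in tags.items():
            self._by_tag[(name, value)].add(key)

    def get(self, key):
        return self._bodies[self._by_key[key]]

    def find(self, **tags):
        """Return keys matching every given tag, without touching bodies."""
        matches = [self._by_tag.get(item, set()) for item in tags.items()]
        return sorted(set.intersection(*matches)) if matches else []


db = TaggedStore()
db.put("todo:1", {"done": False, "owner": "ann"}, json.dumps({"title": "Write docs"}))
db.put("todo:2", {"done": True, "owner": "ann"}, "plain text body")
db.put("todo:3", {"done": False, "owner": "bob"}, "Buy milk,2")
```

A query such as `db.find(done=False, owner="ann")` is answered from the tag maps alone, and the bodies can mix JSON, plain text, and comma-separated values. Adding a new tag to existing records would mean revisiting them, which matches the migration caveat above.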
Vasern with Tagged Key-Value Store
Vasern is a client database for React Native. The latest version was released as a beta for testing and uses key-value storage.
In the upcoming 0.3.0-RC version, Vasern is switching to a tagged key-value store layout. The focus is on its powerful lookup feature and space efficiency.
Below is a demo query. It’s beautiful, isn’t it?
Conclusion
There are many databases with different data formats to choose from for an application. Two common formats are:
- Key-value pairs: fast reads and writes, but not optimized for lookup. Often used for simple data storage in NoSQL databases.
- Tuples: support multiple typed values and indexes and are optimized for lookup, but lack schema flexibility. Commonly used in relational databases.
By combining the strengths mentioned above, the tagged key-value format stays flexible about data schema while supporting record lookup through keys and indices. This often makes it a better fit for a client-side database.
Thanks for reading!