MySQL JSON columns and virtual indexes

Mayank Agrawal
Tata 1mg Technology
9 min readMar 5, 2021

Problem - Here at 1mg, we use MySQL where we came across the task to store patient symptoms in JSON for easy access and retrieval. We needed to store multiple symptoms already present in the database along with comments in case of other symptoms.
The major issue was storing this without creating another column for comments which will in this case have remained null for most of the entries.
To solve this problem we found a unique method called virtual columns which, can extract data from the JSON values and can be used to index them.

The solution can be visualized below.

Summarized process

The base for the solution:

  1. Upgrade the MySQL from our current version 5.6 to minimal version 5.7.8 that supports JSON.
  2. Migrate the data from the old MySQL to the new one.
  3. See nothing breaks in the process. (Rails gem update in our case)

EASY PEASY?

The 1mg DevOps team handled the data migration and up-gradation of the MySQL database on AWS pretty well. While the database up-gradation was going, we parallelly started implementing JSON datatype in the development environment. We found a few important pieces of information about JSON handling and storage limits which are described below.

The references to what I will be writing here can be found in the MySQL bible.

Part 1 - MySQL JSON support

MySQL started supporting the native JSON data type defined by RFC 7159 that enables efficient access to data in JSON from version 5.7.8. So if you have an upgraded version of MySQL you can refer to this guide to do so.

Earlier MySQL provided support for storing JSON encoded stings but having JSON support in the database gives a few advantages over storing JSON encoded strings.

  • Automatic Validation of the JSON data.
  • Optimized Storage.

The space required to store the MySQL JSON type is the same as LONGTEXT or LONGBLOB type. This in fact depends upon the max_allowed_packet system variable. Which defaults to the 4MB and can be extended up to 1GB.

So many of you will know what JSON data types are. But here is a small recap of this.

JSON stands for (JavaScript Object Notation)

It was developed as a lightweight format for storing and transporting data.

There are two types of JSON values:-

  1. JSON array:- A JSON array contains a list of values separated by commas and enclosed within [ and ] characters.
    Eg. [“abc”, 10, null, true, false]
  2. JSON Object:- A JSON object contains a set of key-value pairs separated by commas and enclosed within { and } characters.
    Eg. {“k1”: “value”, “k2”: 10}

Let's start with instructions to implement JSON columns in MySQL.

  1. First, start MySQL 5.7.8 or above. (this is a crucial step don’t skip this).
  2. Create a new DB to test this. (If you want to ruin your current DB skip this step.)
  3. Create a new table by writing the following commands.
CREATE TABLE table1 (jdoc JSON);

4. Insert values in the table

INSERT INTO table1 VALUES('{"key1": "value1", "key2": "value2"}');

Here is an explanation of what happened above.

We are inserting values in the above-created (step3) of the table i.e. “table1”. “table 1” has a column “jdoc” of type JSON. In step 4 we inserted a JSON object {“key1”: “value1”, “key2”: “value2”} in the table. A question arises why we have included the JSON object in single quotes(‘’)? Well, this is due to the reason that MySQL only supports inserting strings. MySQL matches the syntax of the string with the data type of the field and parses JSON from it accordingly.

6. Check that the values inserted from above are in the correct format.

SELECT * from table1;

This will return the data in the following format

+--------------------------------------+
| jdoc |
+--------------------------------------+
| {"key1": "value1", "key2": "value2"} |
+--------------------------------------+

This looks like it's working.

KUDOS! we have finished our first step in implementing JSON columns in the MySQL database table.

Part 2 - Extracting values from fields.

Now came the second part extracting the values which we have added in the above exercise.

JSON_EXTRACT method is used to extract the data from the JSON fields.

As per the official documents of MySQL method is declared as follows

JSON_EXTRACT(json_doc, path[, path] ...)

This method returns data from the JSON document, selected from the parts of the document matched by the path arguments. Returns NULL if any argument is NULL or no paths locate a value in the document. An error occurs if the json_doc argument is not a valid JSON document or any path argument is not a valid path expression.

The return value consists of all values matched by the path arguments. If it is possible that those arguments could return multiple values, the matched values are autowrapped as an array, in the order corresponding to the paths that produced them. Otherwise, the return value is the single matched value.

MySQL supports the -> operator as shorthand for this function as used with 2 arguments where the left-hand side is a JSON column identifier (not an expression) and the right-hand side is the JSON path to be matched within the column.

Let’s start with instructions on how to extract MySQL data from the JSON.

  1. We will be using the same table as above for this.
  2. Clear the table using the following command
DELETE from table1;

3. Add the following values to the table using the given command

4. This will give the following table schema

+------------------------------+
| jdoc |
+------------------------------+
| {"id": 1, "name": "Rocco"} |
| {"id": 2, "name": "Matilda"} |
| {"id": 3, "name": "Flacko"} |
+------------------------------+

5. Now if you want to perform operations on the table you can extract and order data from the following commands

6. This will return the following values from the table.

+------------------------------+--------------+
| jdoc | jdoc->"$.id" |
+------------------------------+--------------+
| {"id": 3, "name": "Flacko"} | 3 |
| {"id": 2, "name": "Matilda"} | 2 |
+------------------------------+--------------+

7. Check the explanation given below.

Here is an explanation.

The command given in point 5 gives a crude idea of how to extract data from the rows and order them. Here we are finding all the rows with id > 1. By first extracting id from the JSON and comparing it with the greater than condition and then ordering the data extracted from the table on basis of name.

Part 3 - Indexing the JSON values

This one is a tricky part. MySQL doesn’t support indexing the JSON values. To support this, developers have found a turnaround by using the MySQL feature of virtual columns. You may be thinking about what are virtual columns here is the explanation.

What are virtual columns?

MySQL 5.7 introduced a new feature called virtual/generated column. It is called generated column because the data of this column is computed based on a predefined expression or from other columns.

Assume you want to retrieve the full name of the person from the following table usersdirectly. What will you do?

+------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| first_name | varchar(255) | YES | | NULL | |
| last_name | varchar(255) | YES | | NULL | |
+------------+--------------+------+-----+---------+-------+

You can do this by joining the first_name and last_name with a space following SQL query in the following query

This will give you the following result

+------------+-----------+-----------+
| first_name | last_name | full_name |
+------------+-----------+-----------+
| sam | sung | sam sung |
+------------+-----------+-----------+

Assume if you want to do this process thousands of times and always need the data you can leverage the virtual columns.

This can be done by generating the virtual columns using the following command.

This command will change the table and add a virtual/generated column to the users table.

What happened in the above command?

Here we are adding a generated column/virtual column by using the command alter table and specifying that the newly added column is of type varchar formed using the CONCATing the first_name and last_name.

Similarly, this method can be used to get the discount prices directly from the payments if the user has the original prices and the offered price by subtracting those values and generating the discount column to have MySQL automatically do those operations for you.

By this time most of you may have figured out what we will be doing to provide indexing the JSON entries. We will be generating virtual columns from the JSON datatypes and index them!

How? It's a good question!

Indexing JSON columns

What we will be doing is extracting the data from JSON fields and create a generated column from it and index the table on it.

So we will use the table1 for this and try to index it.

We will use the following command to generate the new column;

ALTER TABLE table1 ADD COLUMN id BIGINT GENERATED ALWAYS AS (jdoc->"$.id");

What we did in this command is alter the table1 and added a column generated from the id key present from the jdoc column of the table.

Now selecting the values from the table we will get the following

+------------------------------+------+
| jdoc | id |
+------------------------------+------+
| {"id": 1, "name": "Rocco"} | 1 |
| {"id": 2, "name": "Matilda"} | 2 |
| {"id": 3, "name": "Flacko"} | 3 |
+------------------------------+------+

Now we will be indexing the id column.

CREATE UNIQUE INDEX TABLE1_ID_INDEX ON table1 (id);

Voila!!!! we have our index on JSON;

To check the current indexes on the table use the following command.

SHOW INDEXES FROM table1 \G  #\G returns data formatted in expanded mode.

This will show the result in the following way

Table: table1
Non_unique: 0
Key_name: TABLE1_ID_INDEX
Seq_in_index: 1
Column_name: id
Collation: A
Cardinality: 3
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Index_comment:

Showing you have the correct index.

This concludes our tutorial on how to add an index on JSON values.

Where this can be helpful?

  • When storing email addresses and phone numbers for a contact, where storing them as values in a JSON array is much easier to manage than multiple separate tables
  • Saving arbitrary key/value user preferences
  • Storing configuration data that has no defined schema

Why it’s not viable?

  • Complex queries
  • If you really want to be able to add as many fields as you want with no limitation (other than an arbitrary document size limit), consider a NoSQL solution such as MongoDB.

Problems with applying indexes on JSON values

Like everything in life, there is no free lunch. This thing also has some cost/problems stated below.

  • Only secondary indexes can be created on the virtual generated columns. These are referenced as virtual indexes.
  • When a secondary index is created on a virtual generated column, generated column values are materialized in the records of the index
  • If the index is a covering index (one that includes all the columns retrieved by a query), generated column values are retrieved from materialized values in the index structure instead of computed “on the fly”.
  • There are additional write costs to consider when using a secondary index on a virtual column due to computation performed when materializing virtual column values in secondary index records during INSERT and UPDATE operations.
  • Even with additional write costs, secondary indexes on virtual columns may be preferable to, generated stored columns, which are materialized in the clustered index, resulting in larger tables that require more disk space and memory.

Conclusion

So, in conclusion, this is a pretty good technique to use JSON data in MySQL if you don’t want to migrate to other NoSQL DBS and Postgres which provide pretty good support for the JSON values. Generated columns are also a great feature to implement automatic generation of columns for easy calculations and small calculations. These columns as result provide a good way to add an index to them.

--

--

Mayank Agrawal
Tata 1mg Technology

Ruby On Rails engineer, IITan, Mechanical Engineer turned software!!