MySQL as smart JSON storage
I’ve been using this semi-relational, or SemiSQL, if you’d like, approach to SQL since around 2009. At first, I’d store the data as text
or longtext
types using PHP’s serialize()
, but then I quickly moved to JSON
.
In late 2015, with MySQL 5.7.8, our prayers were answered and the native support for JSON
datatype was introduced. 🙏
Let’s have a look at the best practices using JSON in MySQL 8.
SemiSQL vs NoSQL
For most cases, I’d recommend Mongo DB as NoSQL storage for JSON data. It’s a great and scalable database system featuring plenty of filtering, sorting, indexing, projecting and even aggregating functions, which can supply for basic relations.
However, if your project uses heavy relational logic, MySQL with SemiSQL approach may suit you better. Or maybe your project already uses MySQL and switching databases is simply too much work.
With MySQL’s JSON
datatype and plenty of JSON functions
, you gain advantage using both non-relational (NoSQL) and relational (SQL) database features where needed. All without the need to maintain the data integrity manually 🙏. At least for 1:1 relations.
Creating JSON storage
Creating a key-value table for JSON storage is simple
CREATE TABLE listings (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
data JSON NOT NULL
);INSERT INTO listings (data) VALUES
('{
"title": "First Listing",
"public": false,
"text": "Lorem ipsum",
"address": "Klimentská 1215/25, 110 00, Prague 1"
}'),
('{
"title": "Second Listing",
"public": true,
"text": "Dolor sit amet",
"address": "Rybná 716/24, 110 00, Prague 1"
}');
Read the full data
SELECT id, data FROM listings;1 {"text": "Lorem ipsum", "title": "First Listing", "public": false, "address": "Klimentská 1215/25, 110 00, Prague 1"}2 {"text": "Dolor sit amet", "title": "Second Listing", "public": true, "address": "Rybná 716/24, 110 00, Prague 1"}
Read partial data, preserving JSON types
SELECT id, data->'$.title' as 'title', data->'$.public' as 'public'
FROM listings;1 "First Listing" false (bool)
2 "Second Listing" true (bool)
Read partial data, unquoting JSON types
SELECT
id,
data->>'$.title' 'title',
data->>'$.public' 'public'
FROM listings;1 First Listing false (string)
2 Second Listing true (string)
These ->
and ->>
are shortcuts for JSON_EXTRACT(column, path)
and JSON_UNQUOTE(JSON_EXTRACT(column, path))
functions respectively.
I do not recommend to unquote other types than strings, especially booleans, as this would break logical operations.
SELECT
id,
data->>'$.title' 'title',
data->'$.public' 'public'
FROM listings;1 First Listing false (bool)
2 Second Listing true (bool)
Updating data
You can update the whole data object simply via
UPDATE listings SET data = '{ ... }' WHERE id = 1;
Or update single keys via
UPDATE listings
SET data = JSON_SET(data, '$.address', CAST('{
"street": "Klimentská 1215/25",
"city": "Prague 1",
"zip": "CZ-11000",
"gps": [50.0920, 14.4304]
}' AS JSON))
WHERE id = 1;
You can also use JSON data for filters, like this:
UPDATE listings
SET data = JSON_SET(data, '$.address', CAST('{
"street": "Rybná 716/24",
"city": "Prague 1",
"zip": "CZ-11000",
"gps": [50.0903, 14.4260]
}' AS JSON))
WHERE data->>'$.title' = 'Second Listing';
For multi–row update, I prefer to
# for updating the whole data packagesINSERT INTO listings (id, data) VALUES
(1, '{ ... }'),
(2, '{ ... }')
ON DUPLICATE KEY UPDATE data = VALUES(data);# for updating only specified keys
# unfortunately, this breaks on ACID checks, as described belowINSERT INTO listings (id, data) VALUES
(1, '{ "public": false }'),
(2, '{ "public": true, "reserved": false }')
ON DUPLICATE KEY UPDATE data = JSON_MERGE_PATCH(data, VALUES(data));
Indexes and virtual (generated) columns
The true beauty of MySQL’s JSON functions lies with generated columns.
ALTER TABLE listingsADD title VARCHAR(255)
AS (data->>'$.title') STORED
AFTER id,ADD public BIT
AS (data->'$.public' = true) STORED
AFTER title,ADD country VARCHAR(2)
AS (SUBSTRING(data->>'$.address.zip', 1, 2)) STORED
NOT NULL
AFTER public,ADD city VARCHAR(15)
AS (data->>'$.address.city') STORED
NOT NULL
AFTER country,ADD gps POINT
AS (POINT(data->'$.address.gps[0]', data->'$.address.gps[1]')) STORED
NOT NULL
AFTER city;
NOT NULL columns will also ACID–check JSON’s integrity 👍
Now you can select data like this
SELECT id, title, data->>'$.text' 'text', country, city
FROM listings
WHERE public AND title LIKE 'Second%';2 Second Listing Dolor sit amet CZ Prague 1
Indexes
Always a good idea to add indexes, even redundant ones. Especially if your table hits considerable more reads than writes, what, in most cases, happen.
ALTER TABLE listings
ADD INDEX (public DESC,country),
ADD INDEX (public DESC),
ADD INDEX (country),
ADD INDEX (city),
ADD INDEX (title),
ADD SPATIAL (gps);
All indexes require GENERATED
colums to be STORED
, as in not VIRTUAL
. Also, SPATIAL
index only supports NON NULL
columns.
Projection columns
List–view projection
Let’s say your JSON
object in data
column is heavier than that. Let’s say:
UPDATE listings
SET data = JSON_SET(data,
'$.contact', CAST('{
"name": "Lorem ipsum",
"phone": "+000000000000",
"email": "lorem@ipsum.com"
}' AS JSON),
'$.photos', CAST('[
{"url": "...", "width": 3240, "height": 1620},
{"url": "...", "width": 2880, "height": 1620},
{"url": "...", "width": 1620, "height": 3240}
]' AS JSON)
# plus plenty of other data
)
WHERE id = 2;
You’d want to load the data
column for viewing the entry’s detail, but for list views, it’s better not to load the whole big packages.
Let’s make a projection for what you need only, using a VIRTUAL
column.
ALTER TABLE listingsADD listing JSON AS
(JSON_OBJECT(
'title', data->'$.title',
'photo', data->'$.photos[0]',
'address', CONCAT_WS(', ',
data->>'$.address.street',
data->>'$.address.city',
SUBSTRING(data->>'$.address.zip', 1, 2)
),
'gps', data->'$.address.gps'
))
VIRTUAL AFTER gps;
Virtual columns are not stored, obviously, and are only generated when selected.
SELECT listing
FROM listings
WHERE public;2 {
"gps": [50.0903, 14.426],
"photo": {"url": "...", "width": 3240, "height": 1620},
"title": "Second Listing",
"address": "Klimentská 1215/25, Prague 1, CZ"
}
Fulltext projections
Let’s extract a fulltext string from data
and store them for fulltext index.
ALTER TABLE listingsADD ft VARCHAR(255) AS
(IF(public, CONCAT_WS(' ',
data->'$.title',
data->'$.text',
JSON_EXTRACT(data->'$.address', '$.*'),
JSON_EXTRACT(data->'$.contact', '$.*')
), NULL))
STORED AFTER listing,ADD FULLTEXT (ft);
Now you can perform fulltext–optimized search like this:
SELECT id, listing
FROM listings
WHERE MATCH(ft) AGAINST('Kliment*' IN BOOLEAN MODE);2 {
"gps": [50.0903, 14.426],
"photo": {"url": "...", "width": 3240, "height": 1620},
"title": "Second Listing",
"address": "Klimentská 1215/25, Prague 1, CZ"
}
Datetime and timestamp
These are not generated columns
in true sense of the phrase, but I take them for good praxis anyways.
ALTER TABLE listingsADD dt DATETIME
DEFAULT CURRENT_TIMESTAMP
AFTER data,ADD INDEX (dt DESC), # optionally ADD ts TIMESTAMP
DEFAULT CURRENT_TIMESTAMP
ON UPDATE CURRENT_TIMESTAMP
AFTER dt;
Column dt
keeps the row’s insert datetime
, while ts
shows the row’s last update timestamp
.
Overall table structure
If TLDR, the full structure of a smart JSON-based table for semi-relational purposes would look like this
CREATE TABLE listings (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
title VARCHAR(255) AS
(data->>'$.title') STORED,
public BIT AS
(data->'$.public' = true) STORED,
country VARCHAR(2) AS
(SUBSTRING(data->>'$.address.zip', 1, 2)) STORED NOT NULL,
city VARCHAR(15) AS
(data->>'$.address.city') STORED NOT NULL,
gps POINT AS
(POINT(data->'$.address.gps[0]', data->'$.address.gps[1]'))
STORED NOT NULL,
listing JSON AS
(JSON_OBJECT(
'title', data->'$.title',
'photo', data->'$.photos[0]',
'address', CONCAT_WS(', ',
data->>'$.address.street',
data->>'$.address.city',
SUBSTRING(data->>'$.address.zip', 1, 2)
),
'gps', data->'$.address.gps'
)) VIRTUAL,
ft VARCHAR(255) AS
(IF(public, CONCAT_WS(' ',
data->'$.title',
data->'$.text',
JSON_EXTRACT(data->'$.address', '$.*'),
JSON_EXTRACT(data->'$.contact', '$.*')
), NULL)) STORED,
data JSON NOT NULL,
dt DATETIME
DEFAULT CURRENT_TIMESTAMP NOT NULL,
ts TIMESTAMP
DEFAULT CURRENT_TIMESTAMP
ON UPDATE CURRENT_TIMESTAMP NOT NULL,
KEY (public DESC, country),
KEY (public DESC),
KEY (country),
KEY (city),
KEY (title),
SPATIAL KEY (gps),
KEY (dt DESC),
FULLTEXT KEY (ft)
);
Quirks to watch for
Retyping
Casting JSON types into MySQL types can be tricky from time to time. Especially using ->> (JSON_UNQUOTE)
. In example, booleans go like this
{"public": true} SELECT data->'$.public' true
{"public": "true"} SELECT data->'$.public' "true"
{"public": "true"} SELECT data->>'$.public' true
{"public": true} SELECT (data->'$.public' = true) 1
{"public": true} SELECT (data->>'$.public' = true) 0
{"public": "true"} SELECT (data->>'$.public' = true) 0
Combined use of bits
(1, 0)
and booleans(true, false)
for logical operations might lead to unexpected results.
Order inside JSON objects
Will not be preserved.{ "title": ... , "public": ... , "text": ... }
will become { "text": ... , "title": ... , "public": ... }
If I got you worried now, order in arrays will be preserved, of course [5, 2, 1]
will be [5, 2, 1]
no matter what 😇
Not a big deal, but better to keep this in mind.