How to improve searches on JSON data inside SQL Server

Gonçalo Melo
Jun 5, 2020 · 4 min read

Query JSON objects inside SQL Server using JSON_VALUE function can have significant performance improvements. Find how in this step by step guide.

Starting SQL Server 2016 Microsoft deployed a set of functions that allow us to work with JSON data in a structured way inside SQL Server.

I will introduce a small usage sample for the JSON_VALUE function in combination with indexes to improve information retrieval from a table containing one JSON object. For our tests, we have a UserDetailTest table that has more than 500k rows with 2 columns: an UserId and a nvarchar(max) to hold a small JSON details string like the following:

SELECT *, LEN(DetailsJSON) AS [Len(DetailsJSON)] FROM UserDetailTest
Image for post
Image for post

I will activate time statistics and clean SQL Server cache between each query to have some consistency across the execution times using the following SQL statements:

CHECKPOINT
DBCC DROPCLEANBUFFERS
SET STATISTICS TIME ON

JSON_VALUE Function intro

The JSON_VALUE function extracts a value from a JSON string. This functions receives 2 arguments, the first being an expression for the JSON value and the second a path for the value we want to obtain. A simple sample with an inline JSON object would be the following:

SELECT JSON_VALUE(‘{“PostalCode”:”376–3765",”PhoneNumber”:”351003765718"}’, ‘$.PhoneNumber’)
Image for post
Image for post

The return is a scalar value (nvarchar(4000)) for the PhoneNumber element.

Query Scenarios

Starting with one of the really worst case scenario (yes, this can happen…):

Image for post
Image for post

More than 17 seconds really seems like a worst scenario! Next, we will use the JSON_VALUE function to get the PhoneNumber from the JSON string and use it in our where clause:

Image for post
Image for post

This still takes more than 3 seconds. A good improvement, but yet a high cost if we need to search information in this way.

Index Creation

Let’s add a new virtual column to table that displays the result from the JSON_VALUE function — this will allow us to create an index and simplify the SELECT queries.

ALTER TABLE UserDetailTest 
ADD vPhoneNumber AS
CAST(JSON_VALUE(DetailsJSON, ‘$.PhoneNumber’) AS NVARCHAR(255));

The cast truncates the output from the JSON_VALUE to ensure that it does not exceed the maximum lenght for the index key. Now, if we search using the column, the result is still more or less the same 3 seconds has before:

Image for post
Image for post

Let’s create an index and perform the search again:

CREATE INDEX idx_vPhoneNumber ON [UserDetailTest] (vPhoneNumber);
Image for post
Image for post

Consequently the improvement is huge, 13 milliseconds is more interesting! But what’s the cost?

The Cost

When we create an index, we’re basically trading space for time — more occupied space versus faster operations. Therefore let’s see what the increase is using sp_spaceused function before and after index creation:

Image for post
Image for post

Probably, an interesting comparison would be if the PhoneNumber column was an explicit column in that table containing the value. Let’s do that!

ALTER TABLE UserDetailTest 
ADD PhoneNumberCopy NVARCHAR(255)

UPDATE UserDetailTest
SET PhoneNumberCopy = vPhoneNumber

The sp_spaceused remained the same — SQL Server internal black magic regarding space allocation (for another time!). But performance wise, for this approach in my machine this query still took more than 2 seconds to complete:

Image for post
Image for post

Slightly better than the query with the JSON function but the advantage is that in this scenario SQL can better optimize the searches. The following runs of the same query without clearing the cache returns results much faster — each took around 220 miliseconds to complete:

Image for post
Image for post

Summary Results

Just to sum up all the results for the different types of searches comparing the first execution after cache clean up and the following runs. I did several iterations for each step just to make sure the results were consistent although the goal was just to have a baseline. Certainly, the use of this will depend on each case.

Image for post
Image for post

The Startup

Medium's largest active publication, followed by +755K people. Follow to join our community.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store