How to improve searches on JSON data inside SQL Server
Query JSON objects inside SQL Server using JSON_VALUE function can have significant performance improvements. Find how in this step by step guide.
Starting SQL Server 2016 Microsoft deployed a set of functions that allow us to work with JSON data in a structured way inside SQL Server.
I will introduce a small usage sample for the JSON_VALUE function in combination with indexes to improve information retrieval from a table containing one JSON object. For our tests, we have a UserDetailTest table that has more than 500k rows with 2 columns: an UserId and a nvarchar(max) to hold a small JSON details string like the following:
SELECT *, LEN(DetailsJSON) AS [Len(DetailsJSON)] FROM UserDetailTest

I will activate time statistics and clean SQL Server cache between each query to have some consistency across the execution times using the following SQL statements:
CHECKPOINT
DBCC DROPCLEANBUFFERS
SET STATISTICS TIME ON
JSON_VALUE Function intro
The JSON_VALUE function extracts a value from a JSON string. This functions receives 2 arguments, the first being an expression for the JSON value and the second a path for the value we want to obtain. A simple sample with an inline JSON object would be the following:
SELECT JSON_VALUE(‘{“PostalCode”:”376–3765",”PhoneNumber”:”351003765718"}’, ‘$.PhoneNumber’)

The return is a scalar value (nvarchar(4000)) for the PhoneNumber element.
Query Scenarios
Starting with one of the really worst case scenario (yes, this can happen…):

More than 17 seconds really seems like a worst scenario! Next, we will use the JSON_VALUE function to get the PhoneNumber from the JSON string and use it in our where clause:

This still takes more than 3 seconds. A good improvement, but yet a high cost if we need to search information in this way.
Index Creation
Let’s add a new virtual column to table that displays the result from the JSON_VALUE function — this will allow us to create an index and simplify the SELECT queries.
ALTER TABLE UserDetailTest
ADD vPhoneNumber AS
CAST(JSON_VALUE(DetailsJSON, ‘$.PhoneNumber’) AS NVARCHAR(255));
The cast truncates the output from the JSON_VALUE to ensure that it does not exceed the maximum lenght for the index key. Now, if we search using the column, the result is still more or less the same 3 seconds has before:

Let’s create an index and perform the search again:
CREATE INDEX idx_vPhoneNumber ON [UserDetailTest] (vPhoneNumber);

Consequently the improvement is huge, 13 milliseconds is more interesting! But what’s the cost?
The Cost
When we create an index, we’re basically trading space for time — more occupied space versus faster operations. Therefore let’s see what the increase is using sp_spaceused function before and after index creation:

Probably, an interesting comparison would be if the PhoneNumber column was an explicit column in that table containing the value. Let’s do that!
ALTER TABLE UserDetailTest
ADD PhoneNumberCopy NVARCHAR(255)
UPDATE UserDetailTest
SET PhoneNumberCopy = vPhoneNumber
The sp_spaceused remained the same — SQL Server internal black magic regarding space allocation (for another time!). But performance wise, for this approach in my machine this query still took more than 2 seconds to complete:

Slightly better than the query with the JSON function but the advantage is that in this scenario SQL can better optimize the searches. The following runs of the same query without clearing the cache returns results much faster — each took around 220 miliseconds to complete:

Summary Results
Just to sum up all the results for the different types of searches comparing the first execution after cache clean up and the following runs. I did several iterations for each step just to make sure the results were consistent although the goal was just to have a baseline. Certainly, the use of this will depend on each case.
