Quick tip: SingleStoreDB’s EUCLIDEAN_DISTANCE and JSON_ARRAY_PACK functions

Akmal Chaudhri
3 min readApr 3, 2022

Abstract

SingleStoreDB supports many useful vector functions. These functions provide considerable power and functionality and can be executed directly in the database system, which we saw first-hand in a previous article. In that previous article, we used the DOT_PRODUCT function as part of a solution for building a movie recommender system. In this article, we’ll use the EUCLIDEAN_DISTANCE and JSON_ARRAY_PACK functions to help identify different species of flowers using the Iris flower data set.

The SQL code used in this article is available in a GitHub Gist.

Introduction

Euclidean distance can be used to measure the similarity of objects. This could be particularly useful in image recognition, for example. However, let’s start with the more simple problem of identifying different species of flowers. We’ll create a SingleStoreDB database and table to store our flower data set. We’ll store the data using JSON_ARRAY_PACK and then formulate several queries to test the EUCLIDEAN_DISTANCE.

Create a SingleStore Cloud Account

First, we’ll create a free Cloud account on the SingleStore website. At the time of writing, the Cloud account from SingleStore comes with $500 of Credits. A previous article showed how to create a SingleStoreDB Cloud cluster. We’ll use those instructions for this use case.

Create Database Table

In our SingleStore Cloud account, let’s use the SQL Editor to create a new database. Call this iris_db, as follows:

CREATE DATABASE IF NOT EXISTS iris_db;

We’ll also create the iris table, as follows:

USE iris_db;

CREATE TABLE IF NOT EXISTS iris (
vector BLOB,
species VARCHAR(20)
);

The Iris data set contains 150 rows of data for three different species of flowers. For each flower, there are four columns of data: sepal_length, sepal_width, petal_length and petal_width. We can store these four column values together using JSON_ARRAY_PACK in a BLOB format, as follows:

INSERT INTO iris VALUES
(JSON_ARRAY_PACK('[5.1,3.5,1.4,0.2]'),'Iris-setosa'),
(JSON_ARRAY_PACK('[4.9,3,1.4,0.2]'),'Iris-setosa'),
(JSON_ARRAY_PACK('[4.7,3.2,1.3,0.2]'),'Iris-setosa'),
(JSON_ARRAY_PACK('[4.6,3.1,1.5,0.2]'),'Iris-setosa'),
(JSON_ARRAY_PACK('[5,3.6,1.4,0.2]'),'Iris-setosa'),
...
(JSON_ARRAY_PACK('[6.7,3,5.2,2.3]'),'Iris-virginica'),
(JSON_ARRAY_PACK('[6.3,2.5,5,1.9]'),'Iris-virginica'),
(JSON_ARRAY_PACK('[6.5,3,5.2,2]'),'Iris-virginica'),
(JSON_ARRAY_PACK('[6.2,3.4,5.4,2.3]'),'Iris-virginica'),
(JSON_ARRAY_PACK('[5.9,3,5.1,1.8]'),'Iris-virginica');

Only the first five and last five rows are shown above. The complete code listing is available in a GitHub Gist.

Example Queries

Query 1

First, let’s try a query where we want to find the name of the flower species using an exact match for the sepal_length, sepal_width, petal_length and petal_width. We’ll use the values from the very last row input into the iris table:

SELECT species
FROM iris
WHERE EUCLIDEAN_DISTANCE(vector, JSON_ARRAY_PACK('[5.9,3,5.1,1.8]')) = 0;

The result should be:

+----------------+
| species |
+----------------+
| Iris-virginica |
+----------------+

Query 2

Next, let’s also find other nearby flowers:

SELECT EUCLIDEAN_DISTANCE(vector, JSON_ARRAY_PACK('[5.9,3,5.1,1.8]')) AS euclidean_distance, species
FROM iris
ORDER BY euclidean_distance
LIMIT 5;

The result should be:

+---------------------+----------------+
| euclidean_distance | species |
+---------------------+----------------+
| 0 | Iris-virginica |
| 0.28284244589567653 | Iris-virginica |
| 0.31622746208231284 | Iris-virginica |
| 0.3316624219760969 | Iris-virginica |
| 0.3316624219760969 | Iris-virginica |
+---------------------+----------------+

Query 3

Now let’s use some fictitious data values to make a prediction:

SELECT EUCLIDEAN_DISTANCE(vector, JSON_ARRAY_PACK('[5.2,3.6,1.5,0.3]')) AS euclidean_distance, species
FROM iris
ORDER BY euclidean_distance
LIMIT 5;

The result should be:

+---------------------+-------------+
| euclidean_distance | species |
+---------------------+-------------+
| 0.14142129538778386 | Iris-setosa |
| 0.1732049874122573 | Iris-setosa |
| 0.17320510570613526 | Iris-setosa |
| 0.17320538530952567 | Iris-setosa |
| 0.19999992325900512 | Iris-setosa |
+---------------------+-------------+

Query 4

We could also narrow the result, as follows:

SELECT species
FROM iris
ORDER BY EUCLIDEAN_DISTANCE(vector, JSON_ARRAY_PACK('[5.2,3.6,1.5,0.3]'))
LIMIT 1;

The result should be:

+-------------+
| species |
+-------------+
| Iris-setosa |
+-------------+

The species name agrees with the result obtained using the same data in a previous article.

Summary

This article quickly showed two more of SingleStoreDB’s vector functions. Although our example was quite simple, it showed the power and functionality built into SingleStoreDB.

--

--

Akmal Chaudhri

I help build global developer communities and raise awareness of technology through presentations and technical writing.