Oracle 23ai: Vector Search | It’s all about semantics
Oracle just unveiled Oracle Database 23ai which integrates AI capabilities directly into the database engine. AI Vector Search being the standout feature that enables the creation of AI Models, LLMs and RAG Workflows.
Built from the basis of Vector Embeddings, Oracle AI Vector Search allows users to dig deeper into the semantics or meaning of data beyond traditional keyword-based searches, the necessity for vector databases has become more prevalent due to the requirements of new AI workflows to efficiently store, index, and extract meaning from intricate datasets. The true power of Oracle 23ai Vector Search comes from the ability to combine advanced keyword-based data search with Vectors semantic search.
Vector Search in Oracle23ai includes a large array of capabilities to enable fast and precise semantic search and support for the Full Generative AI pipeline.
- New VECTOR data type for storing vectors and vector embeddings.
- Vector indexes for running approximate searches over huge vector spaces.
- SQL Operators to generate vector embeddings.
- SQL Operators and syntax to easily express similarity search.
Buzzwords debuzzed
Before we delve deeper into Oracle Vector Search first we need to explore and understand all these new buzzwords.
- Vectors
A Vector is a multidimensional list of numbers usually generated by deep learning models from Text, Documents and Images. Vectors encode the semantics of the data and much like a fingerprint each vector is unique and contains lots of details of each piece of data.
- Vector Embeddings
Vector embeddings are a way of turning complex objects into numerical vectors that capture their characteristics, and vector databases use these embeddings to efficiently search and retrieve similar or relevant objects based on their positions in the vector space.
- Vector Search
Vector Search is a powerful new technology that performs searches based on the similarity or distance of data represented as multidimensional vectors. Vector search is often considered more effective than keyword-based search, as vector search is based on the meaning and context behind the words and not the actual words themselves.
In this article, we’ll explore Vector Search and introduce the new SQL functions involving vectors.
Creating the Oracle Database 23ai Environment
You can follow my latest article on creating your free Autonomous Database 23ai on OCI, it’s a comprehensive guide on creating an Autonomous Database and accessing it with the SQL Workbench directly on the OCI Console or SQL Developer from your PC.
Another great way to try Oracle Database 23ai is with Oracle LiveLabs, which allows you to create a limited-time workshop reservation completely for free. Each LiveLabs reservation gives you a self-contained environment to run your workshop.
This LiveLabs workshop focuses on Vector Search basics and best of all when provisioned it throws you directly into a remote desktop with your SQL Developer connection already configured.
Log in and click on start and Run on LiveLabs Sandbox.
Wait a few minutes while your environment is being provisioned then click on the top-right menu and My Reservations.
Click on the NoVCN Link to access your new lab environment.
Your Remote Desktop noVNC opens with the workshop notes, this is an Oracle Linux 8 environment with all the necessary tools and configuration to access the Oracle Database 23ai.
Press on the top-left Activities button and enter sql developer in the search box.
SQL Developer will open, after that just right-click on the already created connection and press Connect.
Storing VECTOR type Data
The new vector data type has 2 arguments: Number of dimensions and Dimension formats.
The possible dimension formats are:
- INT8 (8-bit integers)
- FLOAT32 (32-bit IEEE floating-point numbers)
- FLOAT64 (64-bit IEEE floating-point numbers)
Vectors can have an arbitrary number of dimensions or you can create a vector the number of dimensions as a constraint.
CREATE TABLE MY_VECTOR (
v1 VECTOR(10, INT8),
v2 VECTOR(*, FLOAT32),
v3 VECTOR(5, FLOAT64),
v4 VECTOR(*, *),
v5 VECTOR
);
Table created.
DESC MY_VECTOR;
Name Null? Type
--------------------------- -------- ----------------------------
V1 VECTOR(10 , INT8)
V2 VECTOR(* , FLOAT32)
V3 VECTOR(5 , FLOAT64)
V4 VECTOR(* , *)
V5 VECTOR(* , *)
Let’s create a table where we can store the location of all vehicles in a parking lot with the name and the location represented as a Vector.
CREATE TABLE PARKING_LOT (
VEHICLE VARCHAR2(10),
LOCATION VECTOR
);
Now we can insert into the table information of 15 randomly parked vehicles.
INSERT INTO PARKING_LOT VALUES('CAR1','[7,4]');
INSERT INTO PARKING_LOT VALUES('CAR2','[3,5]');
INSERT INTO PARKING_LOT VALUES('CAR3','[6,2]');
INSERT INTO PARKING_LOT VALUES('TRUCK1','[10,7]');
INSERT INTO PARKING_LOT VALUES('TRUCK2','[4,7]');
INSERT INTO PARKING_LOT VALUES('TRUCK3','[2,3]');
INSERT INTO PARKING_LOT VALUES('TRUCK4','[5,6]');
INSERT INTO PARKING_LOT VALUES('BIKE1','[4,1]');
INSERT INTO PARKING_LOT VALUES('BIKE2','[6,5]');
INSERT INTO PARKING_LOT VALUES('BIKE3','[2,6]');
INSERT INTO PARKING_LOT VALUES('BIKE4','[5,8]');
INSERT INTO PARKING_LOT VALUES('SUV1','[8,2]');
INSERT INTO PARKING_LOT VALUES('SUV2','[9,5]');
INSERT INTO PARKING_LOT VALUES('SUV3','[1,2]');
INSERT INTO PARKING_LOT VALUES('SUV4','[5,4]');
The data should look something like this.
Nearest Neighbor Vector Search
Now we can use the VECTOR_DISTANCE function to calculate the closest vehicles parked to a certain vehicle using the Euclidean metric. Euclidean distance reflects the distance between each of the vector’s coordinates being compared.
This query returns the 6 closest vehicles to CAR1.
select
v1.vehicle as vehicle_1,
v2.vehicle as vehicle_2,
VECTOR_DISTANCE(v1.location,v2.location,EUCLIDEAN) as distance
from parking_lot v1, parking_lot v2
where v1.vehicle='CAR1'
order by distance asc
FETCH FIRST 6 ROWS ONLY;
This function returns the 6 closest vehicles to TRUCK4.
select
v1.vehicle as vehicle_1,
v2.vehicle as vehicle_2,
VECTOR_DISTANCE(v1.location,v2.location,EUCLIDEAN) as distance
from parking_lot v1, parking_lot v2
where v1.vehicle='TRUCK4'
order by distance asc
FETCH FIRST 6 ROWS ONLY;
Vector distance metrics in Oracle 23ai
When working with vectors, there are several ways you can calculate distances to determine how similar, or dissimilar, two vectors are. These are the available metrics: COSINE, DOT, EUCLIDEAN, EUCLIDEAN_SQUARED, HAMMING, MANHATTAN. The default distance metric is COSINE.
You can read more on the details of each of these metrics here.
select
v1.vehicle as vehicle_1,
v2.vehicle as vehicle_2,
VECTOR_DISTANCE(v1.location,v2.location,DOT) as distance
from parking_lot v1, parking_lot v2
where v1.VEHICLE='TRUCK4'
order by distance asc
FETCH FIRST 6 ROWS ONLY;
There are also new funky SQL Operators for distances, I really like the notation and it’s simple but I can wrap my head around new SQL syntax in 2024.
- <-> for Euclidian distance.
- <=> for Cosine distance.
- <#> for negative dot product operator.
select
v1.vehicle as vehicle_1,
v2.vehicle as vehicle_2,
v1.location <-> v2.location as distance
from parking_lot v1, parking_lot v2
where v1.vehicle='TRUCK4'
order by distance asc
fetch first 6 rows only;
Resources
Be sure to check out the vector search User’s Guide documentation: Oracle AI Vector Search User’s Guide.
Oracle University LiveLabs: https://apexapps.oracle.com/pls/apex/r/dbpm/livelabs/view-workshop?wid=1070&clear=RR,180&session=114562356497783
How to transform real data into vectors
The vectors explored in this article simply represent the coordinates of entities in a cartesian plane and not real data, in order to transform contextual data to vectors we have to use Vector Embeddings models to transform data into large multidimensional vector embeddings which contains semantics and characteristics of the data.
In my next article we’ll explore how to create vector embeddings directly from the database or using Oracle Generative AI API.