Simplifying the Vector Database

Ryan Siegler
KX Systems
3 min read · Jan 5, 2024

A passion for emerging technologies in generative AI means being part of an exciting, fast-moving community. Outside of that community, though, most people do not have time to learn the ins and outs of these technologies, and so may not know the powerful use-cases they can solve. Vector databases are no exception: outside of the vector database community, people often don’t know what a vector database is or what it is used for. Let’s simplify it:

A vector database holds vectors.

A vector is a mathematical object with a direction and a magnitude. Imagine an arrow pointing through space. Vectors can be high-dimensional, which means they hold many numbers to represent direction and magnitude, sometimes even thousands!

2D Vector = (1,1)
3D Vector = (1,1,1)
High Dimensional Vector = (-4,3,5,4,…,10)
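
To make "direction and magnitude" concrete, here is a minimal Python sketch. The function names `magnitude` and `direction` are made up for this illustration. It treats a vector as a plain list of numbers, computes its length, and scales it down to a unit-length arrow pointing the same way.

```python
import math

def magnitude(v):
    """Length of the arrow: square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in v))

def direction(v):
    """Unit vector: same direction as v, scaled to length 1."""
    m = magnitude(v)
    if m == 0:
        raise ValueError("the zero vector has no direction")
    return [x / m for x in v]

print(magnitude([1, 1]))     # length of the 2D vector (1,1)
print(magnitude([1, 1, 1]))  # length of the 3D vector (1,1,1)
print(direction([3, 4]))     # [0.6, 0.8]
```

The same two functions work unchanged for a vector with thousands of numbers; only the length of the list grows.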

Why does this matter?

Because vectors can hold many properties (many dimensions), they can serve as complex numerical representations of different types of data, including text, images, video, and audio.

To illustrate, a piece of text can be represented as a vector of numerical values by a process called embedding, which translates the words to numbers. This vector, representing the text’s meaning, is stored in the vector database. For instance, “The Cat in the Hat” becomes something like (-4,3,5,4,…,10).
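
The sketch below shows only the shape of the embedding step: text goes in, a fixed-length list of numbers comes out. The function `toy_embed` is a hypothetical stand-in, not a real embedding model. It just counts words into hashed buckets, so it reflects word overlap rather than meaning. Real embedding models are trained so that texts with similar meanings end up close together.

```python
import hashlib

def toy_embed(text, dims=8):
    """Hypothetical stand-in for an embedding model (assumption for illustration).

    Each word is hashed to one of `dims` buckets and counted there.
    """
    vec = [0.0] * dims
    for word in text.lower().split():
        # md5 is used only as a stable hash so results are repeatable across runs.
        digest = hashlib.md5(word.encode("utf-8")).digest()
        bucket = digest[0] % dims
        vec[bucket] += 1.0
    return vec

vector = toy_embed("The Cat in the Hat")
print(len(vector), vector)  # always 8 numbers, however long the text is
```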

All vectors are attached to an ‘index’ within the vector database, which is a roadmap showing where the vectors are located. The index helps make searching the vector database more efficient.
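
As one illustration of what such a roadmap might look like, the sketch below groups vectors into buckets using random hyperplanes (a form of locality-sensitive hashing). This is an assumed, simplified technique chosen for the example and not a description of how any particular vector database builds its index. The class `ToyIndex` and its methods are hypothetical names.

```python
import random
from collections import defaultdict

class ToyIndex:
    """Toy bucket index (illustrative assumption, not a real database index)."""

    def __init__(self, dims, n_planes=4, seed=0):
        rng = random.Random(seed)  # fixed seed so the buckets are repeatable
        self.planes = [[rng.gauss(0, 1) for _ in range(dims)]
                       for _ in range(n_planes)]
        self.buckets = defaultdict(list)

    def _key(self, v):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0
                     for p in self.planes)

    def add(self, item_id, v):
        self.buckets[self._key(v)].append((item_id, v))

    def candidates(self, query):
        # Only the query's own bucket is examined, not the whole collection.
        return [item_id for item_id, _ in self.buckets.get(self._key(query), [])]

index = ToyIndex(dims=3)
index.add("a", [1.0, 2.0, 3.0])
print(index.candidates([2.0, 4.0, 6.0]))     # same direction as "a", same bucket
print(index.candidates([-1.0, -2.0, -3.0]))  # opposite direction, different bucket
```

The trade-off in this kind of scheme is that the search becomes approximate: two vectors that are close can still fall on opposite sides of a plane and land in different buckets.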

Key Point 🗝: Vectors represent the original data in meaning and context. They are rich in information and are numerical. We can use this to our advantage to understand how different data points (vectors) relate to one another.

What can I do with a Vector Database?

Similarity Search: Because your data is represented as mathematical vectors, you can find the most similar vectors by comparing them with well-established, fast, and efficient similarity measures (Euclidean distance, cosine similarity, dot product). This can even be done across different data types (multi-modal), such as finding the image most similar to a given phrase of text.

Enterprise Search: Quickly find semantically relevant documents and articles within your organization.

Image Search: Identify similar images, useful in healthcare diagnosis & image recognition.

Recommendation Systems: Find relevant interests, products, movies/shows, advertisements.

Pattern Matching: Find similar patterns to support analysis and prediction of time-series data such as stock market data, sensor data, or weather data.

Sentiment Analysis: Understand the relationship between users, products, and reviews.

Anomaly Detection: Find dissimilar items, identify unexpected deviations, and understand when fraud could be happening.

Retrieval Augmented Generation (RAG): Integrate relevant data in your vector database with large language models to take advantage of the power of generative AI on your own data.
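
Most of the use-cases above come down to the same operation: score stored vectors against a query vector and keep the closest ones. The sketch below writes out the three similarity measures named under Similarity Search, plus a brute-force "top k" search. The stored vectors, their ids, and the function names are made up for illustration. A real system would use embeddings from a model and an index rather than scanning every vector.

```python
import math

def dot(a, b):
    """Dot product: sum of component-wise products."""
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Straight-line distance between the two points (smaller = more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between the vectors (closer to 1 = more similar)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def top_k(query, vectors, k=2):
    """Brute-force search: score every stored vector, return the k most similar ids."""
    scored = sorted(vectors.items(),
                    key=lambda kv: cosine_similarity(query, kv[1]),
                    reverse=True)
    return [item_id for item_id, _ in scored[:k]]

# Made-up 3-D vectors; real embeddings have many more dimensions.
store = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.3, 0.1],
    "doc_stocks": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]
print(top_k(query, store, k=2))  # ['doc_cats', 'doc_dogs']
```

In a RAG setup, the items returned by a search like `top_k` would be the documents handed to the language model as context. Reversing the sort order would instead surface the least similar items, which is the idea behind anomaly detection.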

Thank you for reading! Hopefully this gave you a greater understanding of vector databases and their associated use-cases.

Please connect with me on LinkedIn if you are interested in learning more!


GenAI | Vector Databases | Data Science | Emerging Technology Advocate