# Vector Embeddings in RAG Applications


So far we have covered data preprocessing, data chunking techniques, and what a vector database is. Now, let’s talk about semantic search. But first, we need to understand vector embeddings, the key to making it work.

Vector embeddings may sound complex, but they’re simply numeric representations of data that capture important features and relationships. Let’s dive into the world of vector embeddings to understand how they work and why they’re essential.

## Understanding Vector Embeddings

Vector embeddings are like secret codes that machines use to understand data better. They are made up of numbers and act as a sort of language that machines speak. For example, imagine we want to teach a computer what a bunny and a rabbit are. Even though the words look different, they mean something similar. Vector embeddings help the computer understand this similarity by representing bunny and rabbit as numbers.

To create vector embeddings, we use mathematical techniques and machine learning. These tools take words or phrases and turn them into long lists of numbers. Each number in the list represents a different aspect of the word’s meaning. So, for “bunny” and “rabbit”, even though they’re different words, their lists of numbers will look quite similar because they mean similar things. In effect, the model maps their features or properties to numerical values.
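To make this concrete, here is a minimal sketch with hand-made, hypothetical 4-dimensional vectors (real embedding models learn vectors with hundreds of dimensions). Cosine similarity is a standard way to compare how close two embeddings point in the same direction:

```python
import math

# Hypothetical, hand-picked 4-dimensional embeddings for illustration only;
# real models learn these numbers from data.
embeddings = {
    "bunny":  [0.91, 0.80, 0.12, 0.05],
    "rabbit": [0.89, 0.82, 0.10, 0.07],
    "car":    [0.02, 0.11, 0.95, 0.88],
}

def cosine_similarity(a, b):
    """Compare the direction of two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["bunny"], embeddings["rabbit"]))  # close to 1.0
print(cosine_similarity(embeddings["bunny"], embeddings["car"]))     # much lower
```

Because “bunny” and “rabbit” mean similar things, their vectors are nearly parallel and score close to 1.0, while “car” scores much lower.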

## Understanding with an Example

A few years ago, there was a big deal about how words could be represented by numbers in a way that let us do math with them.

“king − man + woman ≈ queen”

If you take the vector for “king”, subtract “man”, and add “woman”, you get something close to “queen.” It’s as if there is some “royalty” component that makes up the difference between “king” and “man”, and adding “woman” to that same component gives you “queen.”
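This arithmetic can be sketched with toy vectors. The 3-dimensional vectors below are hand-crafted for illustration (real Word2Vec vectors are learned, and the analogy is only approximate in practice):

```python
# Toy vectors: dimensions loosely stand for (royalty, male, female).
words = {
    "king":  [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
}

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def nearest(vec, vocab):
    """Return the word whose vector has the smallest Euclidean distance to vec."""
    return min(vocab, key=lambda w: sum((x - y) ** 2 for x, y in zip(vocab[w], vec)))

# king - man + woman ...
result = add(sub(words["king"], words["man"]), words["woman"])
print(nearest(result, words))  # → "queen" (with these toy vectors)
```

With these hand-picked numbers the result lands exactly on “queen”; with learned embeddings it lands *near* “queen”, which is why the relation is written with ≈ rather than =.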

Visualizations of these vectors show that words for people share a common component (often rendered as the same color band in heatmap plots), while “water” looks very different because it’s not a person. Also, “girl” and “boy” appear more similar to each other than to “king” and “queen”, while “king” and “queen” look similar to each other.

So these numeric representations of words match up with what we already know about their meanings. And it’s not just words that can be represented this way: the same idea works for lots of things, like pictures, sounds, even 3D models or molecules.

Embeddings can represent many different kinds of data (text, images, videos, users, music, and more) as points in a space where the locations of those points are semantically meaningful.

The best way to intuitively understand what this means is by example, so let’s take a look at one of the most famous embeddings, Word2Vec.

Word2Vec (short for “word to vector”) is a technique for embedding words, published by researchers at Google in 2013. It takes a word as input and spits out an n-dimensional coordinate (a “vector”), so that when you plot these word vectors in space, synonyms cluster together. Here’s a visual: