king – man + woman = queen

Tim J
Apr 11, 2024

Have you ever wondered how the internet can find exactly what you’re looking for, even when you don’t use the exact words? It’s like asking for a “movie about space wars,” and you get results for “Star Wars.” This magic happens thanks to something called Vector Databases.

Imagine you’re playing a game of charades and you have to guess the word “earth.” Your friend might say words like “planet,” “globe,” or “world.” Your brain quickly understands that all these words are related. Vector Databases do something similar with words and phrases: they store not just the word itself, but what it means and how it relates to other words and ideas.

Embeddings: Words Turned Numbers

The key to this lies in something called embeddings. Imagine we’re trying to represent the essence of words like “happy,” “joyful,” and “sad.” In a simplified 3-dimensional model, we might consider dimensions like emotional valence (positive to negative), intensity (calm to intense), and sociality (solitary to social). Here’s a breakdown:

- Happy might translate to [1, 0.5, 0.8], signaling it’s positive, somewhat calm, and fairly social.
- Joyful steps up the intensity and sociality, landing at [1, 0.8, 1].
- Sad, on the other hand, flips the script: [-1, -0.7, -0.5] marks it as negative, subdued, and solitary.

This numerical representation helps computers grasp the relationships between these words, understanding that “happy” and “joyful” lie much closer to each other than either does to “sad.”
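To see this in code, here is a minimal sketch that measures how close the toy 3-dimensional vectors above are using cosine similarity. The numbers are illustrative, not the output of a real embedding model, which would use hundreds of dimensions.

```python
import numpy as np

# Toy 3-dimensional embeddings from above: [valence, intensity, sociality].
# Real embedding models use hundreds of dimensions; these numbers are made up.
embeddings = {
    "happy":  np.array([ 1.0,  0.5,  0.8]),
    "joyful": np.array([ 1.0,  0.8,  1.0]),
    "sad":    np.array([-1.0, -0.7, -0.5]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1 = same direction, -1 = opposite."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["happy"], embeddings["joyful"]))  # roughly 0.98, very similar
print(cosine_similarity(embeddings["happy"], embeddings["sad"]))     # roughly -0.96, opposites
```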

The Impact of Vector Databases

By mapping words into a numeric space, Vector Databases make our digital searches smarter and more intuitive. They’re the reason a search for “space wars” might lead you to “Star Wars,” or why looking up “female ruler” can bring you insights about queens and princesses without needing those exact terms.
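As a rough sketch of what happens under the hood, here is a brute-force “semantic search” over a handful of hypothetical title embeddings. The titles, vectors, and query vector are made up purely for illustration; in practice an embedding model produces the vectors and a vector database performs the nearest-neighbour lookup efficiently at scale.

```python
import numpy as np

# Hypothetical pre-computed embeddings for a few titles. In a real system these
# would come from an embedding model and be stored in a vector database; the
# 4-dimensional numbers here are made up to illustrate the search step only.
index = {
    "Star Wars":             np.array([0.9, 0.8, 0.1, 0.0]),
    "A documentary on bees": np.array([0.1, 0.0, 0.9, 0.2]),
    "Moon landing footage":  np.array([0.7, 0.2, 0.3, 0.1]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend this is what the model produces for the query "movie about space wars".
query = np.array([0.85, 0.75, 0.05, 0.05])

# A vector database answers this nearest-neighbour question at scale;
# here we simply brute-force it over three entries.
best = max(index, key=lambda title: cosine(query, index[title]))
print(best)  # -> "Star Wars"
```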

Imagine this technology in action: using a tool like GeoGebra, you can visualize the relationships between seemingly disparate concepts such as “Queen,” “Female Ruler,” “Princess,” “Middle Ages,” “ChatGPT,” and “Plasma.” Each concept occupies its own position in a shared numeric landscape, showing how it relates to or differs from the others.

Link: An example of embedding vectors

Arithmetic

In the fascinating world of vector semantics, there’s an intriguing concept that can be summarized by the equation

king – man + woman ≈ queen

This demonstrates how the embeddings stored in a vector database capture relationships and attributes in a mathematical space. If you take the vector for “king,” subtract the vector for “man,” and add the vector for “woman,” the result lands very close to the vector for “queen.” The operation shows that embeddings encode not just the meanings of individual words but also the relationships between them, allowing a kind of algebra with words: semantic differences such as gender and title become directions in the vector space. It offers a glimpse into how machines can grasp complex human concepts through mathematics.
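You can try the analogy yourself with pretrained word vectors, for instance via the gensim library. This is a small sketch, assuming gensim is installed and its downloader can fetch the “glove-wiki-gigaword-50” model; any word2vec-style vectors would work the same way.

```python
# Requires: pip install gensim
import gensim.downloader as api

# Downloads 50-dimensional GloVe vectors trained on Wikipedia + Gigaword.
vectors = api.load("glove-wiki-gigaword-50")

# king - man + woman: gensim adds the "positive" vectors, subtracts the
# "negative" ones, and returns the words nearest to the result.
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # typically something like [('queen', 0.85...)]
```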

Example Databases

Well-known vector databases include Pinecone, Weaviate, Milvus, Qdrant, and Chroma, along with pgvector, an extension that adds vector search to PostgreSQL.
