Data Science Collective

Advice, insights, and ideas from the Medium data science community

Vector Databases and Search By Similarity for NLP

Learn about vector databases and how they can help your data science projects.

Vector Databases | Image generated by AI. Gemini 3. Google, 2025. https://gemini.google.com

Introduction

When working with Natural Language Processing (NLP), you will almost certainly come across vector databases. Since I started studying LLMs and how they work under the hood, vector DBs keep popping up on my screen.

Stepping back a little, let’s agree that there are other types of databases, such as relational DBs. These store structured data in rows and columns (also known as rectangular form) and are manipulated with a query language such as SQL, using exact matches, logical conditions, and aggregations to return results.
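To make the contrast concrete, here is a minimal sketch of how a relational database answers questions, using Python's built-in sqlite3 module. The table, column names, and rows are hypothetical and invented purely for illustration.

```python
import sqlite3

# In-memory database; the schema and rows below are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        ("Book One", "Author A", 2018),
        ("Book Two", "Author B", 2021),
        ("Book Three", "Author A", 2023),
    ],
)

# Exact match: only rows whose author is exactly 'Author A' are returned.
exact = conn.execute(
    "SELECT title FROM books WHERE author = ? ORDER BY title", ("Author A",)
).fetchall()

# Logical condition plus aggregation: count the books published since 2020.
recent_count = conn.execute(
    "SELECT COUNT(*) FROM books WHERE year >= 2020"
).fetchone()[0]

print(exact)
print(recent_count)
```

Notice that a row either satisfies the condition or it does not. There is no notion of a row being "close" to what you asked for.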

There is also the NoSQL type, which stores semi-structured or unstructured data. Depending on the system, it can be optimized for documents such as JSON, where each observation is a document, or for graphs.

Now, getting back to our point: vector databases.

Imagine a library where books are organized by their meaning, not just by title or author. That’s essentially how vector databases work. In these databases, each piece of data is stored as a numerical representation, a vector, that captures the essence of the information. These vectors are called embeddings, and the database organizes them so that items with similar meaning sit close together and can be retrieved by similarity.
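The library analogy can be sketched in a few lines of plain Python. This is only a toy illustration under stated assumptions: the three-dimensional vectors are made up by hand (real embeddings come from a model and usually have hundreds of dimensions), cosine similarity is just one common similarity measure, and the search function is a brute-force scan rather than the indexing a real vector database would use.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (closer to 1.0 = more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical hand-made "embeddings" for three book titles.
library = {
    "A guide to training neural networks": [0.9, 0.1, 0.0],
    "Deep learning for beginners": [0.8, 0.2, 0.1],
    "Italian pasta recipes": [0.0, 0.1, 0.9],
}

def search(query_vector, store, top_k=2):
    """Brute-force search: score every stored vector and keep the top_k."""
    scored = [
        (title, cosine_similarity(query_vector, vector))
        for title, vector in store.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical query vector standing in for a question about machine learning.
query = [0.95, 0.05, 0.0]
results = search(query, library)

for title, score in results:
    print(f"{score:.3f}  {title}")
```

The two machine learning titles come back first because their vectors point in nearly the same direction as the query, while the recipe book is left out. No title or author had to match exactly.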

Published in Data Science Collective

Written by Gustavo R Santos

Data Scientist | I solve business challenges through the power of data. | Visit my site: https://gustavorsantos.me
