TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.


Working with Retrieval

Working with Embeddings: Closed versus Open Source

Using techniques to improve semantic search

17 min read · Sep 26, 2024


Demonstration of clustering before performing semantic search | Image by author


Embeddings are a cornerstone of natural language processing. You can do quite a lot with them, but one of their most popular uses is semantic search in retrieval applications.
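At its core, semantic search compares vectors. The query and each document are embedded, and the documents are ranked by how similar their vectors are to the query's. The following is a minimal sketch that assumes the embeddings have already been produced by some model and uses cosine similarity as the measure. The three-dimensional vectors are made up for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid dividing by zero for empty/zero vectors
    return dot / (norm_a * norm_b)


def semantic_search(query_vec, doc_vecs, top_k=3):
    """Return (doc_index, score) pairs for the top_k most similar documents."""
    scored = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:top_k]


# Toy "embeddings" standing in for real model output (hypothetical values).
docs = [
    [0.9, 0.1, 0.0],  # doc 0
    [0.0, 1.0, 0.0],  # doc 1
    [0.8, 0.2, 0.1],  # doc 2
]
query = [1.0, 0.0, 0.0]
results = semantic_search(query, docs, top_k=2)
```

With this toy data, `results` ranks doc 0 first and doc 2 second. Doc 1, which is orthogonal to the query, drops out. In practice, a vector database or an index library such as FAISS would typically replace this linear scan.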

Although much of the tech community is focused on understanding knowledge graph retrieval pipelines, standard vector retrieval hasn’t gone out of style.

Plenty of articles show you how to filter irrelevant results out of semantic search. We’ll cover that here as well, using techniques such as clustering and re-ranking.
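This section doesn't spell out the clustering setup, so what follows is only a sketch of the general idea in the header figure. You cluster the document embeddings first, then run the similarity search only inside the cluster closest to the query. That drops documents from unrelated topics before ranking. The sketch assumes plain k-means over made-up 2-D vectors, with centroids seeded from the first k vectors so the result is deterministic. A real pipeline would more likely use a library implementation such as scikit-learn's `KMeans` (k-means++ seeding) on full-size embeddings. `kmeans` and `clustered_search` are hypothetical helpers, and re-ranking of the returned candidates is not shown.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(vectors, k, iterations=20):
    """Plain k-means. Assumption: centroids are seeded with the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: euclidean(v, centroids[c])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


def clustered_search(query, vectors, k=2, top_k=2):
    """Search only within the cluster whose centroid is nearest the query."""
    centroids, labels = kmeans(vectors, k)
    target = min(range(k), key=lambda c: euclidean(query, centroids[c]))
    candidates = [i for i, lab in enumerate(labels) if lab == target]
    scored = sorted(
        ((i, cosine(query, vectors[i])) for i in candidates),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return scored[:top_k]


# Hypothetical document embeddings from two topics (A leans on dim 0, B on dim 1).
vectors = [
    [1.0, 0.0],    # 0: topic A
    [0.0, 1.0],    # 1: topic B
    [0.9, 0.1],    # 2: topic A
    [0.1, 0.9],    # 3: topic B
    [0.95, 0.05],  # 4: topic A
]
query = [0.2, 0.8]
results = clustered_search(query, vectors, k=2, top_k=2)
```

Here the query falls in the topic B cluster, so documents 1 and 3 are the only candidates, and document 3 ranks first. The topic A documents are never scored.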

The main focus of this article, though, is comparing open-source and closed-source embedding models of various sizes.

The models in focus (many more are available) | Image by author
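One way to compare models like these on your own data is to embed the same queries and documents with each model and run the search. You then score the rankings with recall@k: the share of known relevant documents that show up in the top k results. This is an assumption here, not necessarily the evaluation used in this article. The model names, document IDs, and rankings are hypothetical.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k retrieved list."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)


def mean_recall(results_by_query, relevant_by_query, k):
    """Average recall@k over every query that has relevance labels."""
    scores = [recall_at_k(results_by_query[q], relevant_by_query[q], k) for q in relevant_by_query]
    return sum(scores) / len(scores)


# Hypothetical relevance labels and rankings from two models for the same queries.
relevant = {"q1": {"d1", "d4"}, "q2": {"d2"}}
model_runs = {
    "model_a": {"q1": ["d1", "d3", "d4"], "q2": ["d5", "d2", "d1"]},
    "model_b": {"q1": ["d3", "d5", "d1"], "q2": ["d2", "d1", "d3"]},
}
scores = {name: mean_recall(run, relevant, k=2) for name, run in model_runs.items()}
```

With k=2, `model_a` scores 0.75 and `model_b` scores 0.5, because `model_b` misses both relevant documents for `q1`. Recall@k says nothing about cost or latency, which would need to be weighed separately when choosing between a hosted and a self-hosted model.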

We will compare up to nine embedding models that rank high on the MTEB (Massive Text Embedding Benchmark) leaderboard. This will give you an idea of how large and small models perform and what the costs would be as you…
