Matryoshka Embedding Model for Ordinary People

Nicola Procopio
Mad Chatter Tea Party
4 min readMay 9, 2024


Image generated using https://huggingface.co/spaces/jbilcke-hf/ai-comic-factory

A few months ago, after years of inaction, OpenAI released its new embedding models, text-embedding-3-*. Among the new features of these models is that they use Matryoshka Representation Learning, but first let's recap the previous episodes.

What are Embeddings?

An embedding is a numerical representation of a real-world entity (an image, a word, a text, a node or relation in a graph, …).
Entities are mapped to vectors in a high-dimensional space.

In this article, when we say embeddings we are referring to sentence embeddings.

Image from https://huggingface.co/blog/matryoshka

Embeddings are used in so many tasks that they are the building blocks on which much of modern deep learning is based.
With the explosion of LLMs, embedding models play a crucial role in RAG: by comparing vectors (queries vs. documents) we retrieve the context to insert into the prompts.

Image from https://huggingface.co/blog/matryoshka

Wanting to be ever more precise, and to capture ever more nuances of language, has resulted in huge vectors and ever longer inputs (e.g. jina-embeddings-v2-base-en is a monolingual English embedding model supporting an 8192-token sequence length) that can be difficult to handle in real-world applications.

If you want to go deeper into embeddings, check the article on the Cheshire Cat Blog.

Matryoshka Embedding

Matryoshka Representation Learning (MRL) is a machine learning approach that encodes data at multiple levels of granularity within a single vector representation.
Like a Matryoshka doll, the levels are nested in one embedding: the more levels, the more detail the embedding captures.
The interesting thing is that not all applications need all the levels, so you can start with an embedding of, say, 64 dimensions and gradually grow as you experiment if it does not perform well enough.
This adaptability makes our applications efficient: by reducing the size of the vector space, we need less memory for storage and less computing power when comparing vectors.
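In practice, shrinking a Matryoshka embedding just means keeping its first N dimensions and re-normalizing so cosine similarity still behaves. Here is a minimal sketch with NumPy (the function name and the toy 8-dimensional vector are illustrative, not from any specific library):

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dim: int) -> np.ndarray:
    """Keep only the first `dim` dimensions of a Matryoshka embedding
    and re-normalize, so cosine similarity still works on the result."""
    truncated = vec[:dim]
    norm = np.linalg.norm(truncated)
    return truncated / norm if norm > 0 else truncated

# Toy example: a "full" 8-dim embedding shrunk to its first 4 dims.
full = np.array([0.4, 0.3, 0.2, 0.1, 0.05, 0.05, 0.02, 0.01])
small = truncate_embedding(full, 4)
print(small.shape)  # (4,)
```

The key point is that, with an MRL-trained model, the first dimensions already carry the coarse meaning, so the truncated vector remains useful for search.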

If you are interested in how to optimize vector space, check out this article.

Practical explanation

Image generated using https://huggingface.co/spaces/jbilcke-hf/ai-comic-factory

Imagine you’re on Spotify:

  • at the first level of search you find the genres of music (classical, pop, rock, …)
  • you're an indie music fan, so you click on the genre and you get the sub-styles: this is more detail
  • after the sub-style you can choose yet another level of detail; you're an old bad boy who grew up on the Arctic Monkeys, so you go for 10s indie
  • now you can choose a playlist (less detail) or search for your favorite band (more detail), and then a playlist (less detail) or a specific album (more detail)

I hope this analogy was clear enough about how a matryoshka embedding works.

Matryoshka Embeddings on Cheshire Cat

These types of embeddings are not in the Cheshire Cat core but can be used via plugins:

For OpenAI and Azure OpenAI, after installing the plugin the procedure is the same as setting the embedder in the core; you just have to set the size based on the OpenAI documentation.
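Outside the Cat, the same idea is exposed directly by the OpenAI API through the `dimensions` parameter of the embeddings endpoint. A hedged sketch (requires the `openai` package and an `OPENAI_API_KEY` in the environment; the input sentence and size of 256 are just examples):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask the Matryoshka-trained model for a shortened embedding directly,
# instead of truncating the full vector client-side.
resp = client.embeddings.create(
    model="text-embedding-3-small",
    input="The Cheshire Cat grins.",
    dimensions=256,
)
vec = resp.data[0].embedding
print(len(vec))  # 256
```

The plugin settings essentially pass this same `dimensions` value for you.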

Image from https://openai.com/index/new-embedding-models-and-api-updates/

For SBERT, after installing the plugin, you must first check whether the embedder you want to use supports the Matryoshka representation; then go to the gear/settings button.

A settings tab opens, where you can set your favorite dimension.

Now go to Settings and set your SBERT Matryoshka embedder.

You can choose your Cache Folder (or leave it blank for the default), but you MUST leave Model Kwargs blank: Model Kwargs are populated by the Plugin Settings!
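For reference, what the plugin does under the hood can be approximated with the sentence-transformers library, which (from v2.7 onward) accepts a `truncate_dim` argument. A sketch, assuming a Matryoshka-trained checkpoint (the model name below is one example of such a model, not necessarily the one the plugin uses):

```python
from sentence_transformers import SentenceTransformer

# truncate_dim keeps only the first 64 dimensions of each embedding;
# this only makes sense for models trained with Matryoshka losses.
model = SentenceTransformer(
    "mixedbread-ai/mxbai-embed-large-v1",  # assumption: an MRL-capable model
    truncate_dim=64,
)
emb = model.encode("We're all mad here.")
print(emb.shape)  # (64,)
```

If you pick a model that was not trained with MRL, truncation will still run but quality will degrade badly, which is why the plugin asks you to check support first.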

I hope you enjoyed this article, have fun with Cheshire Cat!
