Two minutes NLP — Sentence Transformers cheat sheet

Sentence Embeddings, Text Similarity, Semantic Search, and Image Search

Fabio Chiusano
NLPlanet
4 min read · Jan 10, 2022


Use-cases of the SentenceTransformers library. Image by the author.

SentenceTransformers is a Python framework for state-of-the-art sentence, text, and image embeddings. Embeddings can be computed for 100+ languages and they can be easily used for common tasks like semantic text similarity, semantic search, and paraphrase mining.

The framework is based on PyTorch and Transformers and offers a large collection of pre-trained models tuned for various tasks. Further, it is easy to fine-tune your own models.

Read the paper Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks for a deep dive into how the models have been trained. In this article, we’ll see code examples for some of the possible use-cases of the library. Model training will be covered in a later article.

Library installation

Before delving into code, install the SentenceTransformers library with pip.
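
pip install -U sentence-transformers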

Get sentence embeddings

The first example shows how to obtain sentence embeddings. SentenceTransformers makes it as easy as pie: import the library, load a model, and call its encode method on the sentences you want to embed.
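
Here is a minimal sketch (all-MiniLM-L6-v2 is one of the library's general-purpose pre-trained models; any other pre-trained model name works the same way):

from sentence_transformers import SentenceTransformer

# Load a pre-trained model (all-MiniLM-L6-v2 is a small general-purpose model)
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The cat sits on the mat.",
    "A dog is playing in the garden.",
]

# encode returns one embedding per sentence (a NumPy array by default)
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 384) for this model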

Semantic Textual Similarity

Once we have the embeddings of our sentences, we can compute their cosine similarity using the cos_sim function from the util module.
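
For example (passing convert_to_tensor=True so that encode returns PyTorch tensors, which cos_sim consumes directly):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

emb1 = model.encode("The cat sits on the mat.", convert_to_tensor=True)
emb2 = model.encode("A feline is resting on a rug.", convert_to_tensor=True)

# cos_sim returns a matrix of pairwise cosine similarities
score = util.cos_sim(emb1, emb2)
print(score)  # values close to 1 indicate very similar meaning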

Semantic Search

Semantic search seeks to improve search accuracy by understanding the content of the search query instead of relying on lexical matching only. This is done by leveraging similarities between embeddings.

The idea behind semantic search is to embed all entries in your corpus into a vector space. At search time, the query is embedded into the same vector space and the closest embeddings from your corpus are found.

Example of Semantic Search in a vector space. Image from https://www.sbert.net.

Semantic Search can be performed using the semantic_search function of the util module, which works on the embeddings of the documents in a corpus and on the embeddings of the queries.
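
A minimal sketch with a toy corpus (the sentences are invented for illustration):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

corpus = [
    "A man is eating food.",
    "A monkey is playing drums.",
    "A cheetah is running behind its prey.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query_embedding = model.encode("A wild animal chases its dinner.",
                               convert_to_tensor=True)

# For each query, returns the top_k closest corpus entries by cosine similarity
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
for hit in hits[0]:
    print(corpus[hit["corpus_id"]], hit["score"])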

In order to get the best out of semantic search, you must distinguish between symmetric and asymmetric semantic search, as this heavily influences the choice of model. In symmetric semantic search, the query and the corpus entries have roughly the same length and form (e.g., finding questions similar to a given question); in asymmetric semantic search, a short query is matched against longer documents (e.g., a question against the paragraphs that answer it).

Paraphrase Mining

Paraphrase mining is the task of finding paraphrases, i.e. texts with very similar meaning, in a large corpus of sentences.

This can be achieved using the paraphrase_mining function of the util module.
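
A minimal sketch; note that paraphrase_mining takes the model itself rather than pre-computed embeddings, and returns scored sentence pairs:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The cat sits outside.",
    "A man is playing guitar.",
    "The feline is sitting outdoors.",
]

# Returns a list of [score, i, j] triplets, sorted by decreasing similarity
paraphrases = util.paraphrase_mining(model, sentences)
for score, i, j in paraphrases[:3]:
    print(f"{sentences[i]} <-> {sentences[j]}: {score:.2f}")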

Image Search

SentenceTransformers provides models that embed images and text into the same vector space: this makes it possible to find similar images as well as to implement Image Search, i.e. using text to search for images and vice versa.

Example of texts and images in the same vector space. Image from https://www.sbert.net.

To perform Image Search, you need to load a multimodal model like CLIP and use its encode method to encode both images and texts.
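
A minimal sketch (the image file names are placeholders for your own files):

from PIL import Image
from sentence_transformers import SentenceTransformer, util

# Load CLIP, which embeds images and texts into the same vector space
model = SentenceTransformer("clip-ViT-B-32")

# encode accepts PIL images as well as strings
# ("dog.jpg" and "cat.jpg" are placeholder file names)
img_embeddings = model.encode([Image.open("dog.jpg"), Image.open("cat.jpg")])
text_embedding = model.encode("a photo of a dog")

# Rank the images by their similarity to the text query
scores = util.cos_sim(text_embedding, img_embeddings)
print(scores)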

The embeddings produced by multimodal models enable tasks like image-to-image similarity as well: comparing two image embeddings with cos_sim works exactly like comparing two sentence embeddings.

Other tasks

  • For complex search tasks like question answering retrieval, semantic search can be significantly improved by using a Retrieve & Re-Rank pipeline (a sketch follows this list).
The architecture of a Retrieve & Re-Rank pipeline. Image from https://www.sbert.net.
  • SentenceTransformers can be used in different ways to cluster small or large sets of sentences (a second sketch follows this list).
Example of topic modeling from document embeddings. Image from https://www.sbert.net.
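
Here is a minimal sketch of a Retrieve & Re-Rank pipeline, assuming the pre-trained models all-MiniLM-L6-v2 (bi-encoder) and cross-encoder/ms-marco-MiniLM-L-6-v2 (cross-encoder) and a toy corpus:

from sentence_transformers import CrossEncoder, SentenceTransformer, util

corpus = [
    "Python is a programming language.",
    "The Eiffel Tower is in Paris.",
    "Pandas is a Python library for data analysis.",
]
query = "Which library can I use for data analysis in Python?"

# Retrieve: a fast bi-encoder narrows the corpus down to candidate passages
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
corpus_embeddings = bi_encoder.encode(corpus, convert_to_tensor=True)
query_embedding = bi_encoder.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]

# Re-rank: a slower but more accurate cross-encoder re-scores each candidate
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pairs = [(query, corpus[hit["corpus_id"]]) for hit in hits]
scores = cross_encoder.predict(pairs)
for (_, passage), score in zip(pairs, scores):
    print(f"{score:.2f}  {passage}")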
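
And a minimal clustering sketch using the library's community_detection utility (the threshold and minimum community size are illustrative values; which clusters emerge depends on them):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The cat sits outside.",
    "The feline rests in the garden.",
    "I love pasta.",
    "Italian food is delicious.",
]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Groups sentences whose pairwise cosine similarity exceeds the threshold
clusters = util.community_detection(embeddings, min_community_size=2, threshold=0.6)
for i, cluster in enumerate(clusters):
    print(f"Cluster {i}: {[sentences[idx] for idx in cluster]}")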
