NLP For Semantic Search
Training Sentence Transformers with Softmax Loss
How the original sentence transformer (SBERT) was built
Search is entering a golden age. Thanks to “sentence embeddings” and specially trained models called “sentence transformers”, we can now search for information using concepts rather than keyword matching, unlocking a human-like information discovery process.
This article will explore the training process of the first sentence transformer, Sentence-BERT, more commonly known as SBERT. We will look at how softmax loss is applied to Natural Language Inference (NLI) data to fine-tune models for producing sentence embeddings.
Be aware that softmax loss is no longer the preferred approach to training sentence transformers and has been superseded by other methods such as MSE margin and multiple negatives ranking loss. But we’re covering this training method as an important milestone in the development of ever-improving sentence embeddings.
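To make “softmax loss” concrete before we dive in, here is a minimal sketch of the classification objective SBERT uses during NLI training: the two pooled sentence embeddings u and v are concatenated with |u − v|, passed through a linear layer, and scored with cross-entropy over the NLI labels. The 768-dimension size, batch size, and random tensors below are illustrative placeholders standing in for real pooled BERT outputs.

```python
import torch
import torch.nn as nn

# Sketch of SBERT-style softmax loss (not the article's full training loop):
# concatenate (u, v, |u - v|), apply a linear classifier, compute cross-entropy.
embed_dim = 768    # assumed hidden size of a BERT-base encoder
num_labels = 3     # entailment / neutral / contradiction
batch_size = 16

classifier = nn.Linear(embed_dim * 3, num_labels)
loss_fn = nn.CrossEntropyLoss()

# Random tensors stand in for the pooled embeddings a real encoder would produce.
u = torch.randn(batch_size, embed_dim)                 # premise embeddings
v = torch.randn(batch_size, embed_dim)                 # hypothesis embeddings
labels = torch.randint(0, num_labels, (batch_size,))   # NLI labels

features = torch.cat([u, v, torch.abs(u - v)], dim=-1)
logits = classifier(features)
loss = loss_fn(logits, labels)                         # the "softmax loss"
loss.backward()
```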
This article also covers two approaches to fine-tuning. The first shows how NLI training with softmax loss works. The second uses the excellent training utilities provided by the sentence-transformers library; it is more abstracted, making it much easier to build good sentence transformer models.
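As a preview of that second, more abstracted approach, the sketch below shows roughly what NLI fine-tuning with the library's built-in SoftmaxLoss can look like. The model name, toy examples, and hyperparameters are illustrative placeholders, not the exact setup used later in the article.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, models, losses

# Build a sentence transformer from a plain BERT checkpoint plus mean pooling.
word_model = models.Transformer("bert-base-uncased", max_seq_length=128)
pooling = models.Pooling(word_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_model, pooling])

# A couple of toy NLI pairs; in practice you would load SNLI/MultiNLI.
# Labels: 0 = entailment, 1 = neutral, 2 = contradiction.
train_examples = [
    InputExample(texts=["A man is eating food.", "A man is eating a meal."], label=0),
    InputExample(texts=["A man is eating food.", "The man is sleeping."], label=2),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# SoftmaxLoss adds the (u, v, |u - v|) classification head on top of the embeddings.
train_loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=3,
)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
```

After training, the classification head is discarded and the model is used purely as an encoder that maps sentences to embeddings.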