Best Embedding Model 🌟 — OpenAI / Cohere / Google / E5 / BGE

An In-depth Comparison of Multilingual Embedding Models

Lars Wiik
13 min read · Apr 7, 2024

We are at a turning point in the AI landscape: companies are increasingly moving away from relying solely on in-house models and are turning instead to APIs from leading global vendors.

Companies such as OpenAI, Google, Cohere, and Anthropic currently dominate this emerging global LLM market, offering models that solve Natural Language Processing (NLP) tasks worldwide.

Alongside these LLMs, text embedding APIs have emerged as a key building block for applications such as semantic search and retrieval, and the giants are now competing to offer the best multilingual embedding service.

Microsoft has taken a different approach by open-sourcing a multilingual embedding model called E5, adding another strong contender to the field. More recently, the Beijing Academy of Artificial Intelligence (BAAI) released BGE-M3, another competitive open-source multilingual model.

As a Machine Learning Engineer specializing in multilingual product development, I find these advancements particularly interesting, so I decided to compare the state-of-the-art models in this field.


MSc in AI — LLM Engineer ⭐ — Curious Thinker and Constant Learner