
Understanding Embedding Models in the Context of Large Language Models


Large Language Models (LLMs) like GPT, BERT, and similar architectures have revolutionized the field of natural language processing (NLP). A critical concept that underpins these models is embeddings. In this tutorial, we’ll break down what embedding models are, why they’re essential in NLP, and provide a simple hands-on example using Python that can be run in Google Colab.

Figure: Tokenizers convert raw text into tokens and IDs, while embedding models map those IDs into dense vector representations. Together, they power the semantic understanding of LLMs. Image credit: https://tzamtzis.gr/2024/coding/tokenization-by-andrej-karpathy/

What Are Embedding Models?

In the context of LLMs, an embedding model is a neural network designed to represent text (e.g., words, phrases, sentences) as dense vectors in a continuous vector space. These vector representations capture semantic relationships between text items, making them the backbone of modern NLP systems.
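To make this concrete, here is a minimal sketch of computing sentence embeddings in Python. It assumes the sentence-transformers library and the publicly available all-MiniLM-L6-v2 model; both are illustrative choices, not the only options.

```python
# Minimal sketch: turning sentences into dense vectors.
# Assumes sentence-transformers is installed (pip install sentence-transformers).
from sentence_transformers import SentenceTransformer

# "all-MiniLM-L6-v2" is one small, publicly available embedding model;
# any compatible model name would work here.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = ["The cat sits on the mat.", "A feline rests on a rug."]
embeddings = model.encode(sentences)  # a (2, 384) array of dense vectors

print(embeddings.shape)
```

Because the two sentences mean roughly the same thing, their vectors end up close together in the embedding space, which is exactly the property the examples below rely on.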

For instance:

  • Words like “king” and “queen” might have embeddings close to each other in this vector space.
  • The vector offset between “king” and “man” might mirror the offset between “queen” and “woman,” the classic word-analogy relationship (see the sketch after this list).
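This analogy can be checked directly with pretrained word vectors. Below is a hedged sketch using gensim's downloadable GloVe vectors; the library and the model name glove-wiki-gigaword-50 are illustrative assumptions, not part of this article's own code.

```python
# Hedged sketch of the word-analogy arithmetic: king - man + woman ≈ queen.
# Assumes gensim is installed (pip install gensim); the name below is one of
# gensim's downloadable pretrained GloVe vector sets.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")  # small 50-dimensional GloVe vectors

# most_similar adds the "positive" vectors, subtracts the "negative" ones,
# and returns the nearest neighbours of the resulting vector.
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # typically [('queen', <similarity score>)]
```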

Picturing these semantic relationships in vector space, we might assume that words map directly into vectors that preserve meaning. This idea can cause confusion when we talk about tokens in the LLM processing pipeline. Let's clarify things a bit…
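As a preview of that distinction, the sketch below separates the two steps: a tokenizer first turns raw text into integer IDs, and only then does the model's embedding layer look up a dense vector for each ID. It assumes Hugging Face transformers, PyTorch, and the bert-base-uncased checkpoint, all illustrative choices.

```python
# Sketch of the two-step pipeline: tokenizer -> integer IDs -> embedding vectors.
# Assumes Hugging Face transformers and PyTorch are installed, and uses the
# publicly available "bert-base-uncased" checkpoint as an example.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Step 1: the tokenizer splits raw text into tokens and maps them to IDs.
tokens = tokenizer.tokenize("Embeddings power LLMs")
ids = tokenizer.convert_tokens_to_ids(tokens)
print(tokens)  # subword tokens, not necessarily whole words
print(ids)     # the integer IDs the model actually consumes

# Step 2: the model's input embedding layer maps each ID to a dense vector.
embedding_layer = model.get_input_embeddings()
vectors = embedding_layer(torch.tensor(ids))
print(vectors.shape)  # (number of tokens, 768) for BERT base
```

Note that the tokens are subword pieces, not words: the embedding model never sees raw text, only the IDs the tokenizer produces.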
