Apr 29, 2024
Confused about how transformers convert natural language into the numbers that encode syntax and semantics? Michael Phi's LLM Transformer Architecture article takes you step by step through the process of constructing the matrices that make this amazing technology work. After computing the dot products of the query and key vectors, a softmax function is applied to the scaled results, yielding values between 0 and 1 that sum to 1. These values act as attention weights: they determine how much each token's value vector contributes to the output, letting the model weigh the relevance of every token to every other token as it predicts the best response to a user's query. If you like his illustrations and animations, check out his Illustrated Guide to Neural Networks.
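To make the mechanics concrete, here is a minimal NumPy sketch of scaled dot-product attention; the function names, array shapes, and toy data are illustrative assumptions, not taken from Phi's article.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(queries, keys, values):
    d_k = queries.shape[-1]
    # Dot products between queries and keys measure pairwise similarity.
    scores = queries @ keys.T / np.sqrt(d_k)
    # Softmax turns each row of scores into weights in (0, 1) that sum to 1.
    weights = softmax(scores, axis=-1)
    # Each output token is a weighted average of the value vectors.
    return weights @ values

# Toy example: 3 tokens, embedding dimension 4.
rng = np.random.default_rng(0)
q = rng.normal(size=(3, 4))
k = rng.normal(size=(3, 4))
v = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(q, k, v).shape)  # (3, 4)
```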
Dec 23, 2022
Rotary position embedding (RoPE) is considered an improvement over traditional positional encodings for large language models (LLMs) and other transformer-based models. Instead of adding discrete positional vectors to the input embeddings at the beginning of the model, as the original Transformer does, RoPE rotates the query and key vectors inside each attention layer by an angle that depends on the token's position. This allows for a more nuanced representation of the relative positions of tokens: the rotation keeps relative distances and orientations consistent, which helps maintain the contextual relationships between words or tokens in a sequence. Still, the effectiveness of RoPE versus traditional positional encodings can depend on the specific context, task, and data. See Ngieng Kianyew's article (below) for the gory mathematical details.
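For a concrete feel of the rotation, here is a small NumPy sketch of RoPE applied to single query and key vectors, assuming the standard 10000^(-2i/d) frequency schedule; the function name and toy dimensions are illustrative, not from Kianyew's article.

```python
import numpy as np

def rope(x, position, base=10000.0):
    """Rotate consecutive pairs of dimensions of x by position-dependent angles."""
    d = x.shape[-1]
    assert d % 2 == 0, "RoPE pairs up dimensions, so d must be even."
    # One rotation frequency per pair of dimensions.
    freqs = base ** (-np.arange(0, d, 2) / d)
    angles = position * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    # Standard 2-D rotation applied to each (x1, x2) pair.
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out

# The key property: the query-key dot product depends only on relative position,
# so shifting both positions by the same amount leaves the score unchanged.
q = np.random.default_rng(1).normal(size=8)
k = np.random.default_rng(2).normal(size=8)
same_offset = [rope(q, p) @ rope(k, p + 3) for p in (0, 10, 50)]
print(np.allclose(same_offset[0], same_offset[1:]))  # True
```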