
Inside GPT-OSS: OpenAI’s Latest LLM Architecture

What OpenAI’s open-weight model reveals about the design of modern large language models

7 min read · Sep 1, 2025


Figure: Massive Multitask Language Understanding (MMLU) benchmark scores for GPT-OSS-120B, GPT-OSS-20B, OpenAI o3, and OpenAI o4-mini. Image by the author.

It has been several years since OpenAI shared information about how its LLMs work. Despite the word “open” in its name, OpenAI has not yet disclosed the inner workings of GPT-4 and GPT-5.

However, with the release of the open-weight GPT-OSS models, we finally have new information about OpenAI’s LLM design process.

To learn about the latest state-of-the-art LLM architecture, I recommend studying OpenAI’s GPT-OSS.

Overall LLM Model Architecture

To fully understand the LLM architecture, we must review OpenAI’s papers on GPT-1 [1], GPT-2 [2], GPT-3 [3], and GPT-OSS [4].

The overall LLM model architecture can be summarized as follows:

Figure: The basic GPT-OSS Transformer architecture, consisting of the tokenizer, embedding matrix, multiple Transformer blocks, unembedding matrix, and softmax. Image adapted from [5].

Based on this figure, we start at the bottom with an example input sentence, “The quick brown fox”. The goal of the LLM is to predict the next token, which could be “jumps”.
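To make this pipeline concrete, here is a minimal PyTorch sketch of the same flow: token IDs are embedded, passed through a stack of Transformer blocks, projected back to vocabulary logits by the unembedding matrix, and turned into next-token probabilities with a softmax. All dimensions, layer internals, and token IDs below are illustrative placeholders, not GPT-OSS's actual configuration.

```python
# Minimal sketch of the forward pass described above (illustrative only).
import torch
import torch.nn as nn

class TinyGPT(nn.Module):
    def __init__(self, vocab_size=50_000, d_model=512, n_layers=4, n_heads=8):
        super().__init__()
        # Embedding matrix: token IDs -> vectors
        self.embed = nn.Embedding(vocab_size, d_model)
        # Stack of Transformer blocks (standard layers here, not GPT-OSS's exact design)
        self.blocks = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        ])
        # Unembedding matrix: vectors -> logits over the vocabulary
        self.unembed = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids):
        x = self.embed(token_ids)                      # (batch, seq, d_model)
        # Causal mask so each position only attends to earlier tokens
        mask = nn.Transformer.generate_square_subsequent_mask(token_ids.size(1))
        for block in self.blocks:
            x = block(x, src_mask=mask)
        logits = self.unembed(x)                       # (batch, seq, vocab_size)
        # Softmax over the last position's logits gives next-token probabilities
        return torch.softmax(logits[:, -1, :], dim=-1)

# The token IDs for "The quick brown fox" would come from the tokenizer;
# the IDs below are placeholders.
model = TinyGPT()
probs = model(torch.tensor([[464, 2068, 7586, 21831]]))
print(probs.shape)  # torch.Size([1, 50000])
```

With a trained model and the real tokenizer, the highest-probability entry in `probs` would correspond to a token such as “jumps”.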

Tokenization

