Understanding Large Language Models: A ChatGPT Curated Reading List

Malachy Moran
4 min read · Apr 15, 2023


Photo by Jaredd Craig on Unsplash

The Premise

I don’t think I need to argue that Large Language Models (LLMs) like GPT-3/4, Bard, PaLM, and LLaMA are currently the hottest topic in Artificial Intelligence. I’m also fairly certain that anyone who clicked into this article has some sense of how revolutionary these models could be across fields and industries.

At its most basic level, an LLM is a model that can both receive input and produce output in the form of human-like language. You interact with it the way you would with a person, by talking to it, and it responds in kind. They are called “Large” because of both how much data they are trained on and the number of parameters involved. It sounds simple, but the applications are massive. Because they can read and generate human-like text, they can be used to summarize information, answer questions, or produce content such as marketing material.

This morning I used ChatGPT to write a workout plan for me:

Me: I need a 1-hour bodybuilding workout for chest and tris.

ChatGPT: Sure, here’s a workout you can try:

Warm-Up: 10 minutes of light cardio to get your heart rate up.

Bench Press: 4 sets of 8–12 reps, using a weight that is challenging but allows you to maintain good form.

Incline Dumbbell Press: 3 sets of…
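If you’d rather poke at this programmatically than through the chat window, here is a minimal sketch of the same request made against OpenAI’s API. It assumes the openai Python package as it existed in early 2023 (when gpt-3.5-turbo powered the free version of ChatGPT) and a placeholder API key; it’s an illustration I’ve added for context, not something from the reading list.

    import openai

    # Assumption: the pre-1.0 `openai` Python package (early 2023)
    # and a placeholder API key; this sketch is illustrative only.
    openai.api_key = "YOUR_API_KEY"

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # the model behind ChatGPT at the time of writing
        messages=[
            {"role": "user",
             "content": "I need a 1-hour bodybuilding workout for chest and tris."},
        ],
    )

    # The reply is the same kind of human-like text you see in the chat window.
    print(response["choices"][0]["message"]["content"])

Under the hood, the chat interface is doing essentially the same thing: passing your message to the model and returning the text it generates.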

To people outside the field, I’m sure it seems like these models appeared overnight. In early 2022, few people beyond research circles had heard of LLMs; now there are hundreds if not thousands of new startups, apps, news articles, podcasts, and conspiracy theories surrounding them. The truth, though, is that LLMs are a natural outgrowth of years of steady research, and if you look back at the literature, you can see how each of the foundational building blocks was placed.

“So why not pull that thread?” I figured. “Let’s read through the relevant literature together and see if we can piece together where LLMs came from, and how they work.”

Then another, slightly more novel thought occurred: “I wonder what papers ChatGPT thinks are most important for me to read if I want to understand it?”

And the idea of the reading list was born.

The Goal

So we’re going to do it. I asked ChatGPT the following question:

“I am a Data Scientist with a Masters in the topic from University of California Berkeley. What papers should I read to understand how GPT3 works?”

In response I received a list of five papers. The goal of this project will be to read each of the five papers, synthesize the key points, and explain them in a way that most people with a general knowledge of Data Science or an interest in Artificial Intelligence will understand.

For those of you who specialize in the topic or have an especially deep interest, I invite you to read each of the papers along with me, but I hope to explain them well enough that anyone who wants to can take part in the conversation. It should be fun!

The List

Without further ado, here is the list!

  1. Attention Is All You Need by Vaswani et al. (2017): The foundational paper that introduced the Transformer architecture. This is what all Large Language Models are based on.
  2. Improving Language Understanding by Generative Pre-Training by Radford et al. (2018): The paper that introduced the concept of generative pre-training. GPT in fact stands for “Generative Pre-Trained Transformer.”
  3. Language Models are Few-Shot Learners by Brown et al. (2020): The paper that provides a technical overview of GPT-3 and its architecture, and that first demonstrated the model’s remarkable few-shot capabilities.
  4. Scaling Laws for Neural Language Models by Kaplan et al. (2020): Investigates how performance and computational requirements change as we scale up language models.
  5. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? by Bender, Gebru, et al. (2021): A paper discussing the ethical and societal concerns that arise from LLMs, including the potential effects of bias.

The Next Step

We’ll be diving right into the first paper (“Attention Is All You Need”) this week. Be sure to subscribe so you can be notified when each article comes out. Happy reading!

Articles Currently Out:

Attention Is All You Need Part I
Attention Is All You Need Part II
Improving Language Understanding by Generative Pre-Training
Language Models are Few-Shot Learners

The Author

With a Bachelor’s in Statistics and a Master’s in Data Science from the University of California, Berkeley, Malachy is an expert on topics ranging from significance testing, to building custom Deep Learning models in PyTorch, to how you can actually use Machine Learning in your day-to-day life or business.


Malachy Moran

A Data Scientist based out of Oslo, Norway, Malachy loves three things: The Outdoors, his Pembroke Welsh Corgi "Pepperoni", and Machine Learning Models.