How OpenAI’s Andrej Karpathy Made One of the Best Tutorials in Deep Learning

Usama Ahmed
5 min read · Jun 5, 2023


YouTube Videos of Neural Networks: Zero to Hero

Andrej Karpathy is a leading expert in deep learning and computer vision. He is currently at OpenAI, where he was a founding member and research scientist. He previously served as the Senior Director of AI at Tesla, where he led the development of Tesla Autopilot and Full Self-Driving. He holds a PhD from Stanford University, where he taught the popular CS231n course on Convolutional Neural Networks for Visual Recognition.

Neural Networks: Zero to Hero

Andrej began creating YouTube videos in August 2022 to teach some of his educational projects, such as micrograd, makemore, and nanoGPT. He put them together in a playlist of seven videos: Neural Networks: Zero to Hero.

In this article, we will take a close look at this playlist and why I consider it one of the best tutorials ever made in deep learning.

Why is it the best?

1- Intuition (is not) all you need

Unlike many other tutorials, Andrej doesn’t just focus on the general idea and the intuition behind each topic he explains. He also shows the details, whether mathematical or implementation-related, that let you understand what really goes on under the hood.

Intuition is very important for every deep learning topic, but many tutorials only give a brief overview or scratch the surface with some fancy visualizations (visualizations are not bad in themselves; they’re usually good!). That approach, however, doesn’t help you overcome your fear of diving into the details.

2- Combining Theoretical and Practical Parts

This is by far the one advantage I hardly see in any other tutorial or course, even top-tier ones like Andrew Ng’s Machine Learning Specialization. There, the practical part consists of a series of notebooks containing interactive visualizations and complete-the-code assignments.

But in this playlist, Andrej delivers a full live-coding experience. You even see mistakes and errors, and how he fixes them. You could say that the main theme of most of the videos is coding. But coding alone is not enough.

A deep understanding of the theoretical background behind many topics is essential. You often see him open a paper and explain an idea from it.

The Bengio et al. (2003) MLP language model, which Andrej explains in the makemore videos

3- Following the API

Throughout the three projects built in the playlist, Andrej follows PyTorch’s naming conventions and logic, so you get a detailed view of how the library is really implemented.

In some parts, you will even end up implementing PyTorch classes yourself and using them in the projects (you will be asked to replace them with the real PyTorch classes later 😃), as shown in the sketch below.

PyTorch-like classes in the makemore videos
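
To make this concrete, here is a minimal sketch (my own simplification, not Andrej’s exact code) of the kind of PyTorch-like Linear module built in the makemore videos, mirroring torch.nn.Linear:

```python
import torch

class Linear:
    """A PyTorch-like linear layer, in the spirit of the makemore videos."""
    def __init__(self, fan_in, fan_out, bias=True):
        # Kaiming-style initialization, as discussed in the videos
        self.weight = torch.randn(fan_in, fan_out) / fan_in**0.5
        self.bias = torch.zeros(fan_out) if bias else None

    def __call__(self, x):
        # same forward pass as torch.nn.Linear: x @ W (+ b)
        self.out = x @ self.weight
        if self.bias is not None:
            self.out = self.out + self.bias
        return self.out

    def parameters(self):
        return [self.weight] + ([self.bias] if self.bias is not None else [])
```

Because the class exposes the same __call__ and parameters() interface as its torch.nn counterpart, swapping in the real nn.Linear later is a one-line change.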

4- Exercises

Each video has a set of exercises and challenges at the end and in the description. Some of them may seem very straightforward, such as “beat my loss” or “read the … paper and implement an idea from it”, but they are all genuinely useful and challenging. Some of them also hint at what’s coming in the next video, so I highly recommend trying them before moving on.

Content of the Playlist

1- Micrograd

“A tiny Autograd engine (with a bite! :)). Implements backpropagation (reverse-mode autodiff) over a dynamically built DAG and a small neural networks library on top of it with a PyTorch-like API.” (from the micrograd repo)

Micrograd is a really good start: before diving into building anything bigger, you get a solid answer to the question of how all of this actually works.
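
To give a taste of that answer, here is a heavily simplified sketch of micrograd’s core idea: a scalar Value that records the operations applied to it, then backpropagates through the resulting DAG. The real engine supports more operations; this is my own condensed version, not Andrej’s exact code.

```python
class Value:
    """A scalar that tracks its computation graph for reverse-mode autodiff."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None   # how to push gradient to children
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad       # d(a+b)/da = 1
            other.grad += out.grad      # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # topologically sort the DAG, then apply the chain rule node by node
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

a, b = Value(2.0), Value(-3.0)
c = a * b + a            # c = -4.0
c.backward()
print(a.grad, b.grad)    # dc/da = b + 1 = -2.0, dc/db = a = 2.0
```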

Micrograd training loop (left) vs. PyTorch training loop (right)
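
As the comparison suggests, a micrograd training loop reads almost exactly like its PyTorch counterpart. The sketch below follows the API from the micrograd repo (the toy data and learning rate are my own):

```python
from micrograd.nn import MLP

model = MLP(3, [4, 4, 1])   # 3 inputs, two hidden layers of 4, 1 output

xs = [[2.0, 3.0, -1.0], [3.0, -1.0, 0.5], [0.5, 1.0, 1.0], [1.0, 1.0, -1.0]]
ys = [1.0, -1.0, -1.0, 1.0]  # toy regression targets

for step in range(100):
    # forward pass: mean squared error over the tiny dataset
    ypred = [model(x) for x in xs]
    loss = sum((yp - yt) ** 2 for yp, yt in zip(ypred, ys))

    # backward pass: zero the grads, then backprop through the whole graph
    for p in model.parameters():
        p.grad = 0.0
    loss.backward()

    # update: plain SGD, mirroring optimizer.step() in PyTorch
    for p in model.parameters():
        p.data += -0.05 * p.grad
```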

2- Makemore

Makemore is the core of the series (five videos out of seven), but it’s much more than a small language model that generates new names.

Throughout the makemore videos, you go from the very basics of MLPs all the way to writing the entire backpropagation code yourself, an idea so important that Andrej wrote a whole article explaining why.

Motivation to be a backprop ninja (video 4)
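
A tiny illustration of the kind of work the “backprop ninja” video asks for: derive the gradients of a linear layer by hand, then check them against PyTorch’s autograd (the example itself is mine, not taken from the video):

```python
import torch

x = torch.randn(4, 3)                    # a batch of 4 inputs
W = torch.randn(3, 2, requires_grad=True)
b = torch.randn(2, requires_grad=True)

out = x @ W + b
loss = out.sum()
loss.backward()                          # autograd's answer

dout = torch.ones_like(out)              # dloss/dout for a plain sum
dW_manual = x.T @ dout                   # dloss/dW, derived by hand
db_manual = dout.sum(0)                  # dloss/db (sum over the batch)

print(torch.allclose(W.grad, dW_manual))  # True
print(torch.allclose(b.grad, db_manual))  # True
```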

3- nanoGPT

This one may be the real motivation to watch the whole playlist, because you build the Transformer architecture from scratch and apply it to real data.

The goal of this project is to get a real sense of how LLMs are built, and how training huge models on huge amounts of data really makes a difference!
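
As a taste, here is a sketch of a single head of causal self-attention, the building block assembled step by step in the video (the structure follows the video, but names like head_size and the exact details are my reconstruction):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Head(nn.Module):
    """One head of causal (masked) self-attention."""
    def __init__(self, n_embd, head_size, block_size):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        # causal mask: position t may only attend to positions <= t
        self.register_buffer("tril", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        k, q, v = self.key(x), self.query(x), self.value(x)
        # scaled dot-product attention scores
        wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5   # (B, T, T)
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float("-inf"))
        wei = F.softmax(wei, dim=-1)
        return wei @ v                                        # (B, T, head_size)

x = torch.randn(2, 8, 32)                  # (batch, time, channels)
head = Head(n_embd=32, head_size=16, block_size=8)
print(head(x).shape)                       # torch.Size([2, 8, 16])
```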

Tips & My version

In addition to attempting the exercises, I think it’s important to write the code yourself rather than passively watching the videos, and to experiment: make some changes and try different options.

In this repository, I have created my own version of the notebooks. The code is almost the same, but I have organized it in a way that makes it easy to use as a reference for specific points in each video. You may find the notebooks a bit long, but they are meant to be that way.

Hierarchical structure of my notebooks

Besides organizing the notebooks, I added my solutions to the exercises. You may find some mistakes, of course! Please correct me when you do.

Table of Contents of my GitHub Repo

Edit: An amazing response by Smarten pointed me to this amazing repository, which contains many useful visualizations and explanations.

Conclusion

Having the opportunity to learn from a world-class instructor and researcher like Andrej is something you shouldn’t pass up. Besides the new things you will learn, I guarantee you will have fun watching the videos, writing the code, and reading the papers.
