Tushar Madaan
Reproducing GPT-2 (124M): Key Insights and Techniques
In this blog, we’ll explore the key details of reproducing the GPT-2 (124M) model, focusing on its architecture, training process, and…
4d ago

Tushar Madaan
Navigating Model Drift: A Systems Approach
Imagine a small town of Univille, with only one university where exactly 100 students apply each year. Each student is tested on 2 exams…
May 30

Tushar Madaan
Bayesian thinking, reward systems and the computational inefficiency of skepticism.
Even if we start with small biases (unbalanced priors), our priors creep into our sampling of the world and thus have likelihood to drive…
Nov 4, 2015