
Automatic Differentiation (AutoDiff): A Brief Intro with Examples

An introduction to the mechanics of AutoDiff, exploring its mathematical principles, implementation strategies, and applications in today's most widely used frameworks

Ebrahim Pichka
Published in TDS Archive
10 min read · Oct 11, 2024


Photo by Bozhin Karaivanov on Unsplash

1. The Fundamental Role of Differentiation in Modern Machine Learning Optimization

At the heart of machine learning lies the optimization of loss/objective functions. This optimization process heavily relies on computing gradients of these functions with respect to model parameters. As Baydin et al. (2018) elucidate in their comprehensive survey [1], these gradients guide the iterative updates in optimization algorithms such as stochastic gradient descent (SGD):

θₜ₊₁ = θₜ - α ∇_θ L(θₜ)

Where:

  • θₜ represents the model parameters at step t
  • α is the learning rate
  • ∇_θ L(θₜ) denotes the gradient of the loss function L with respect to the parameters θ

This simple update rule belies the complexity of computing gradients in deep neural networks with millions or even billions of parameters.
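
To make the update rule concrete, here is a minimal sketch of one SGD step where the gradient is obtained by automatic differentiation. It assumes PyTorch's autograd as the framework; the quadratic toy loss and the specific tensor values are illustrative choices, not part of the original discussion.

```python
import torch

# Toy quadratic loss L(θ) = ||θ - c||², used purely for illustration
c = torch.tensor([1.0, -2.0, 3.0])
theta = torch.zeros(3, requires_grad=True)  # model parameters θ
alpha = 0.1                                  # learning rate α

for step in range(5):
    loss = torch.sum((theta - c) ** 2)   # forward pass: L(θₜ)
    loss.backward()                      # autodiff computes ∇_θ L(θₜ)
    with torch.no_grad():
        theta -= alpha * theta.grad      # θₜ₊₁ = θₜ - α ∇_θ L(θₜ)
    theta.grad.zero_()                   # clear gradient before the next step
    print(step, loss.item())
```

The key point is that `loss.backward()` produces the exact gradient of the scalar loss with respect to every parameter that requires it, so the same update rule scales from this three-parameter toy example to networks with billions of parameters.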

2. The Differentiation Triad
