Source: Pixabay

You Don’t Understand Neural Networks Until You Understand the Universal Approximation Theorem

The Proof Behind the Neural Network’s Power

--

If you’ve come here from Twitter via Yann LeCun and/or Steven Pinker, I have tried to update this piece accordingly and may write another article addressing and discussing the disagreement. For now, remember that this is intended as a general-audience explanation of the concept, not a rigorous deep learning paper. If you haven’t seen the conversation already, I’d encourage you to explore it; there are lots of interesting thoughts being shared. We’re all sharing ideas here on the forum of the Internet in an effort to get closer to some aspect of the truth.

TL;DR: read this article knowing that there is disagreement about it.

The Universal Approximation Theorem is, very literally, the theoretical foundation of why neural networks work. Put simply, it states that a neural network with a single hidden layer containing a sufficient but finite number of neurons can approximate any continuous function (on a compact domain) to any desired accuracy, under certain conditions on the activation function (in the original version, that it be sigmoid-like).
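To make that statement concrete, here is a minimal sketch in Python/NumPy (my own illustration, not something from the original papers) of the object the theorem talks about: a single hidden layer of sigmoid units whose weighted sum, f_hat(x) = sum of v_i * sigmoid(w_i * x + b_i), is built by hand to track sin(x). The grid, the steepness, and the number of hidden units below are all assumptions chosen for the demo.

```python
import numpy as np

# A hand-built single-hidden-layer network of the form the theorem describes:
#   f_hat(x) = sum_i v_i * sigmoid(w_i * x + b_i)
# The weights are chosen by construction (not trained): each steep sigmoid acts
# as an approximate step, and the steps are stacked into a staircase that
# tracks the target function sin(x) on [0, 2*pi].

def sigmoid(z):
    # Clip to avoid overflow in exp for very negative arguments.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500.0, 500.0)))

def one_hidden_layer(x, w, b, v):
    # Hidden layer: sigmoid(w_i * x + b_i); output: weighted sum with weights v_i.
    return sigmoid(np.outer(x, w) + b) @ v

n_hidden = 200                                      # number of hidden neurons
grid = np.linspace(0.0, 2 * np.pi, n_hidden)        # where each "step" turns on
steepness = 200.0                                   # steeper sigmoids give sharper steps
w = np.full(n_hidden, steepness)                    # input-to-hidden weights
b = -steepness * grid                               # biases position each step
v = np.diff(np.concatenate(([0.0], np.sin(grid))))  # output weights: step increments

x = np.linspace(0.0, 2 * np.pi, 1000)
max_error = np.max(np.abs(one_hidden_layer(x, w, b, v) - np.sin(x)))
print(f"max |f_hat(x) - sin(x)| with {n_hidden} hidden units: {max_error:.4f}")
# Increasing n_hidden (and the steepness) drives this error toward zero,
# which is the kind of guarantee the theorem provides.
```

The point of the sketch is existence, not efficiency: with enough hidden units, the gap between the network and the target can be made as small as you like, which is exactly what the theorem asserts.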

Formulated in 1989 by George Cybenko for sigmoid activations, and extended by Kurt Hornik in 1991 to a much broader class of activation functions (essentially any nonconstant, bounded activation)…

--