Demystifying deep learning
It’s not just a black box—Joan Bruna & his team explore how deep learning works by focusing on the mathematics
One reason recognition systems in fields like natural language processing and computer vision have improved so dramatically is the increasing application of deep learning networks. The approach has been such a success that deep learning has become a buzzword not only in the tech community but also in mainstream news.
But as we race towards building bigger, better, and faster machines, it’s important to think about why deep learning is so successful. What are deep learning’s properties? What makes one deep architecture better than another? And, based on the observations that we’ve collected during deep learning’s young lifespan (after all, the powerful tool only emerged in the late 1980s thanks to the likes of Geoff Hinton and Yann LeCun), can we begin to form a rigorous theory about how and why the approach works?
To get to the heart of the matter, we have to return to the powerful language that underpins all aspects of science and technology: mathematics. Enter Joan Bruna’s most recent working paper, co-authored with Rene Vidal (Johns Hopkins), Raja Giryes (Tel-Aviv), and Stefano Soatto (UCLA), which aims to provide mathematical explanations for some of deep learning’s properties.
Though scientists have made empirical observations about, say, the relationship between the size of a neural network and its accuracy during training, the researchers explain that “there is currently no rigorous theory that provides a precise mathematical explanation for these experimentally observed phenomena.”
To begin addressing this gap, the researchers collect, consolidate, and explore the mathematics that underpins deep learning’s success — from global optimality to geometric stability. Learn more about the work here.
by Cherrie Kwok