Deep Learning's inability to perform compositional learning is one of the main reasons behind its most critical limitations, including the need to feed models enormous amounts of data.

A recent analysis by OpenAI showed that the amount of compute used in the largest AI training runs has been doubling every 3.5 months since 2012, a net increase of 300,000x (source)
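As a quick back-of-the-envelope check of those figures (my own sketch, not part of the OpenAI analysis), the two numbers are mutually consistent:

```python
import math

doubling_period_months = 3.5   # doubling time reported in the analysis
net_increase = 300_000         # net growth in compute since 2012

# How many doublings does a 300,000x increase require, and how long is that?
doublings = math.log2(net_increase)  # ~18.2 doublings
print(f"{doublings:.1f} doublings over ~{doublings * doubling_period_months:.0f} months")
# ~64 months, i.e. roughly the 2012-2018 window covered by the analysis
```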

§1: Introduction

Compositionality is the algebraic capacity to understand and produce novel combinations from known components (Loula 2018). While the human brain learns compositionally with ease, Neural Networks (NNs) struggle to discover and store skills that are shared across problems, and to re-combine them hierarchically to solve new challenges (Liška 2018).
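To make this concrete, here is a toy sketch (my own illustration, loosely in the spirit of the SCAN-style tasks studied by Loula 2018): once the primitive actions and the modifiers are known, a compositional learner gets novel combinations for free.

```python
# Toy illustration of compositionality: known primitives and known modifiers
# combine into novel commands never observed together during training.
primitives = {"jump": ["JUMP"], "walk": ["WALK"]}
modifiers = {"twice": lambda acts: acts * 2, "thrice": lambda acts: acts * 3}

def interpret(command: str) -> list:
    """Compose a primitive action with zero or more modifiers."""
    word, *mods = command.split()
    actions = primitives[word]
    for mod in mods:
        actions = modifiers[mod](actions)
    return actions

# "jump thrice" was never seen as a whole, yet its meaning follows
# algebraically from its known parts.
print(interpret("jump thrice"))  # ['JUMP', 'JUMP', 'JUMP']
```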

Explainable AI (xAI) is the new cool kid on the block, and the xAI approach (build a black box, then explain it) has become the most cherished modus operandi of Machine Learning practitioners. Is this really the best route? Why don't we build an interpretable model right away?

Rashomon (羅生門 Rashōmon) is a 1950 Jidaigeki film directed by Akira Kurosawa. The film is known for a plot device that involves various characters providing subjective, alternative, self-serving, and contradictory versions of the same incident. (Wikipedia)

Explainable vs Interpretable AI

Explainability and interpretability are two different concepts, although the two terms are often (and erroneously) used interchangeably across sources. In this blog post, I will base my reasoning on the following definitions [7], which, at least from my viewpoint, are the most widely adopted:

  • Explainable ML: using a black box…
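To give a flavor of the contrast (a minimal scikit-learn sketch of my own, not from [7]): an explainable pipeline fits a black box and explains it post hoc, while an interpretable model is its own explanation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Explainable ML: fit a black box, then explain it after the fact.
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
post_hoc = permutation_importance(black_box, X, y, n_repeats=5, random_state=0)
print("most 'important' feature:", X.columns[post_hoc.importances_mean.argmax()])

# Interpretable ML: the model itself is the explanation.
glass_box = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(glass_box, feature_names=list(X.columns)))
```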

How will your Deep Learning system perform on new data, i.e., how well will it generalize? How bad can its performance get? Estimating an algorithm's ability to generalize is necessary to build trust in AI systems and to be able to rely on them.

Can you trust your AI? Will your AI go binge drinking or try to destroy the world once it goes live? (image source: https://unsplash.com/photos/0E_vhMVqL9g)

TL;DR — Traditional approaches (VC dimension, Rademacher complexity) fail to provide reliable, useful (tight enough) generalization bounds. What if network compression went hand in hand with the estimation of generalization bounds? That's a winning lottery ticket!
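For intuition on why such bounds end up loose, empirical Rademacher complexity can be estimated by measuring how well the hypothesis class correlates with pure noise. A rough sketch of my own (with a linear model standing in for a network):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.normal(size=(n, d))

# Empirical Rademacher complexity ~ E_sigma[ sup_h (1/n) sum_i sigma_i h(x_i) ],
# approximated by fitting a classifier to random +/-1 labels (ERM as the sup).
correlations = []
for _ in range(20):
    sigma = rng.choice([-1, 1], size=n)
    h = LogisticRegression(max_iter=1000).fit(X, sigma)
    correlations.append(np.mean(sigma * h.predict(X)))

print(f"estimated Rademacher complexity: {np.mean(correlations):.3f}")
# The richer the hypothesis class, the better it fits noise -> looser bounds.
```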

Why does Statistical Learning Theory matter?

Ensuring that an algorithm will perform as expected once it goes live is necessary: the AI system needs to be safe and reliable. …


Are we born with some form of innate knowledge? Innatism is gaining neuroscientific support and may shape the next R&D steps in AI and Deep Learning.


Manifold learning

Under the manifold assumption, real-world high-dimensional data concentrates close to a non-linear, low-dimensional manifold [2]. In other words, data lies approximately on a manifold of much lower dimension than the input space, a manifold that can be retrieved/learned [8].
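A standard way to see the assumption at work (a sketch of mine using scikit-learn's classic Swiss roll, not tied to [2] or [8]): the data lives in 3-D, but a manifold learner recovers its intrinsic 2-D parameterization.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# 3-D points that actually lie on a rolled-up 2-D sheet
X, _ = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)

# Isomap "unrolls" the manifold back to its intrinsic two dimensions
embedding = Isomap(n_neighbors=12, n_components=2).fit_transform(X)
print(X.shape, "->", embedding.shape)  # (1500, 3) -> (1500, 2)
```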

The manifold assumption is crucial for dealing with the curse of dimensionality: many machine learning problems seem hopeless if we expect the algorithm to learn functions with interesting variations across a high-dimensional space [6]
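A quick numerical illustration of why high dimensions are hopeless without such structure (my own sketch): pairwise distances concentrate as the dimension grows, so the very notion of "nearby" examples degrades.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 100, 10_000):
    X = rng.uniform(size=(500, d))
    # distances between 250 random pairs of points
    dists = np.linalg.norm(X[:250] - X[250:], axis=1)
    print(f"d={d:>6}: relative spread of distances = {dists.std() / dists.mean():.3f}")
# The relative spread shrinks with d: points become almost equidistant,
# and distance-based generalization needs exponentially more data.
```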

Fortunately, it has been empirically shown that ANNs capture the geometric regularities of commonplace data thanks to their hierarchical, layered structure…


Vincent van Gogh, The Starry Night, 1889, MoMA, The Museum of Modern Art

Despite wide adoption in the industry, our understanding of deep learning is still lagging.

[20], nicely summarized by [21], identifies four research branches:

  • Non-Convex Optimization: we deal with a non-convex function, yet SGD works. Why does SGD even converge?
  • Over-parameterization and Generalization: how can Deep Neural Networks avoid the curse of dimensionality? (See the sketch after this list.)

Theorists have long assumed networks with hundreds of thousands of neurons and orders of magnitude more individually weighted connections between them should suffer from a fundamental problem: over-parameterization [19]

  • Role of Depth: How does depth help a neural network to converge? What is the link between depth and…
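A compact way to reproduce the over-parameterization puzzle flagged above (a sketch of mine, in the spirit of the well-known random-label experiments): a network with far more weights than training samples can memorize pure noise, which classical capacity arguments say should ruin generalization, and yet the same architectures generalize on real data.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n, d = 100, 20
X = rng.normal(size=(n, d))
y_random = rng.integers(0, 2, size=n)  # pure-noise labels

# ~20*256 + 256 weights: far more parameters than the 100 samples
net = MLPClassifier(hidden_layer_sizes=(256,), max_iter=5000, random_state=0)
net.fit(X, y_random)
print("train accuracy on random labels:", net.score(X, y_random))
# Typically close to 1.0: the class is rich enough to memorize noise,
# so uniform capacity-based bounds become vacuous.
```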

An engraving of the Turk from Karl Gottlieb von Windisch’s 1784 book Inanimate Reason (wiki)

In my previous post, while discussing the importance of DSLs in ML and AI, I mentioned the idea of Software 2.0, introduced by Andrej Karpathy:

Software 2.0 is written in neural network weights. No human is involved in writing this code because there are a lot of weights (typical networks might have millions), and coding directly in weights is kind of hard (I tried). Instead, we specify some constraints on the behavior of a desirable program (e.g., a dataset of input output pairs of examples) and use the computational resources at our disposal to search the program space for a…
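A minimal sketch of that search (a toy example of mine, not Karpathy's code): the "program" is just a weight vector, the "specification" is a dataset of input-output pairs, and gradient descent does the searching.

```python
import numpy as np

rng = np.random.default_rng(0)

# The specification: input-output examples of the desired program (y = 3x - 1)
X = rng.normal(size=(100,))
y = 3 * X - 1

# The program: two weights, found by searching program space with gradient descent
w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    pred = w * X + b
    w -= lr * 2 * np.mean((pred - y) * X)
    b -= lr * 2 * np.mean(pred - y)

print(f"found program: y = {w:.2f}*x {b:+.2f}")  # ~ y = 3.00*x -1.00
```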


Domain-Specific Languages make our lives easier when developing AI/ML applications, in many different ways. Choosing the right DSL for the job might matter more than the choice of the host language.

1) Business Logic

DSLs are a powerful tool to express business logic concisely.

Snippet source; read more in Ghosh 2010 and Frankau 2009.
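For a flavor of what such a DSL can look like, here is a tiny fluent-interface sketch in Python (a hypothetical example of mine, not the snippet referenced above): the business rule reads almost like the sentence a domain expert would say.

```python
from dataclasses import dataclass, field

@dataclass
class Trade:
    """A minimal internal DSL for describing trades in near-business language."""
    instrument: str = ""
    quantity: int = 0
    conditions: list = field(default_factory=list)

    def buy(self, quantity: int, instrument: str) -> "Trade":
        self.quantity, self.instrument = quantity, instrument
        return self

    def when(self, condition: str) -> "Trade":
        self.conditions.append(condition)
        return self

# Reads close to the business rule it encodes:
order = Trade().buy(100, "ACME").when("price < 42").when("volume > 1e6")
print(order)
```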

At the same time, ML and AI systems are not set in stone.

  • The underlying models reflect business and working hypotheses that might change over time
  • Sensitivity analysis should be performed not only against model (hyper)parameters but also against business and working assumptions

DSLs come in handy to fluently express complex business…

Mattia Ferrini

Principal Heisenberg Compensator — https://www.linkedin.com/in/mattia-ferrini/
