The Strange Loop in Deep Learning

Credit: Escher

In his book “I Am a Strange Loop,” Douglas Hofstadter coined this idea:

In the end, we are self-perceiving, self-inventing, locked-in mirages that are little miracles of self-reference.
— Douglas Hofstadter

He describes this self-referential mechanism as the defining property of minds. A strange loop is a cyclic system that traverses several levels of a hierarchy; by moving through the cycle, one eventually finds oneself back where one started.

Coincidentally enough, this ‘strange loop’ is in fact the fundamental mechanism behind what Yann LeCun describes as “the coolest idea in machine learning in the last twenty years.”

Loops are not typical in Deep Learning systems, which have conventionally been composed of acyclic graphs of computation layers. However, as we are all now beginning to discover, the employment of ‘feedback loops’ is creating some of the most mind-boggling new capabilities for automation. This is not hyperbole: it is happening today, with researchers training ‘narrow’ intelligence systems into very capable specialist automation that surpasses human capabilities.

My first recollection of an effective Deep Learning system that used feedback loops was the “Ladder Network”. Ladder Networks were introduced a very long time ago, way back in July 2015! Here is a depiction of the architecture, from “Deconstructing the Ladder Network Architecture”:

The Ladder Network is a single loop up and down the layers, followed by a final forward pass. The system gathers information from the intermediate stages of the loop. At the time it was introduced, it exhibited remarkable convergence numbers. The original researchers extended the idea further in a mid-2016 paper, “Tagger: Deep Unsupervised Perceptual Grouping”, in which multiple Ladder Networks are strung together to form a network that is able to group objects in images.
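The essence of the Ladder Network’s loop can be sketched as a layer-wise denoising objective: a clean encoder pass and a noise-corrupted pass are compared layer by layer, and the unsupervised cost is a weighted sum of the per-layer reconstruction errors. The toy weights, layer sizes, and cost weights below are all assumptions chosen purely for illustration; a real Ladder Network also has a learned decoder and a supervised cost term.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer encoder weights (hypothetical, for illustration only).
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 3))

def encode(x, noise_std=0.0):
    """Forward pass, optionally corrupting each layer with Gaussian noise."""
    z1 = x @ W1 + noise_std * rng.normal(size=8)
    h1 = np.tanh(z1)
    z2 = h1 @ W2 + noise_std * rng.normal(size=3)
    return [z1, z2]

def ladder_cost(clean_layers, denoised_layers, lam=(1.0, 0.1)):
    """Unsupervised ladder cost: weighted sum of per-layer reconstruction errors."""
    return sum(l * np.mean((c - d) ** 2)
               for l, c, d in zip(lam, clean_layers, denoised_layers))

x = rng.normal(size=4)
clean = encode(x, noise_std=0.0)      # clean encoder path
noisy = encode(x, noise_std=0.5)      # corrupted path (stand-in for decoder output)
print(ladder_cost(clean, noisy) > 0)  # -> True: cost is nonzero while denoising is imperfect
```

Minimizing this cost drives the corrupted path toward the clean one at every layer, which is what lets the loop learn from unlabeled data.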

The Generative Adversarial Network (GAN) also has a loop of its own, though not an explicit one in its architecture: the loop appears in its training. GAN training pits two networks, a generator and a discriminator, against each other in a cooperative duel. The discriminator attempts to classify whether data was created by the generator, while the generator attempts to produce data that fools the discriminator. The end result is a more robust discriminator and a more capable generator. GANs perform a kind of Turing test and are currently the best generative model for images.

There is, in essence, a feedback mechanism in the form of a neural network (the discriminator) that the generator exploits to create more sophisticated results (i.e. more realistic images). There are many examples of GANs generating realistic images. Newer architectures now combine GANs with Ladder Networks: Stacked Generative Adversarial Networks.
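The dueling objectives behind this feedback loop can be sketched in a few lines. Here scalar probabilities stand in for the networks’ outputs: `d_real` is the discriminator’s score on real data and `d_fake` its score on generated data. This is a minimal sketch of the standard GAN losses, not the full training procedure.

```python
import numpy as np

def d_loss(d_real, d_fake):
    """Discriminator loss: -[log D(x) + log(1 - D(G(z)))]."""
    return -(np.log(d_real) + np.log(1.0 - d_fake))

def g_loss(d_fake):
    """Non-saturating generator loss: -log D(G(z))."""
    return -np.log(d_fake)

# A confident discriminator (real=0.9, fake=0.1) incurs less loss than a
# confused one (0.5, 0.5) ...
print(d_loss(0.9, 0.1) < d_loss(0.5, 0.5))   # -> True
# ... and the generator's loss falls as it fools the discriminator more often.
print(g_loss(0.5) < g_loss(0.1))             # -> True
```

Alternating gradient steps on these two losses is the loop: each network’s improvement becomes the other’s training signal.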

These systems that leverage cycles also relate to newer research on ‘incremental learning’. One of the drawbacks of Deep Learning systems is that ‘fine-tuning’ a network by training it against new data can destroy previously learned capabilities; the network ‘forgets’ its past learning. In an architecture developed at Stanford called “Feedback Networks”, the researchers explored a different kind of network that feeds back into itself and develops its internal representation incrementally:

In even more recently published research (March 2017), a team from UC Berkeley created astonishingly capable image-to-image translations using GANs and a novel kind of regularization. They call this system CycleGAN, and it has some very impressive results:


CycleGAN is able to perform remarkable image translations. As shown above, it takes paintings as input and generates realistic photographs. It can also perform what appears to be semantic translation, such as converting horses into zebras, or taking an image shot in one season and making it appear to have been shot in another.

The crux of the approach is the use of a ‘cycle-consistency loss’. This loss ensures that the network can perform the forward translation followed by the reverse translation with minimal error. That is, the network must learn not only how to translate the original image, but also the inverse (or reverse) translation.

The major difficulty in training Deep Learning systems has been the lack of labeled data. Labeled data is the fuel that drives the accuracy of a Deep Learning model. These newer kinds of systems that exploit loops, however, are chipping away at this supervision problem. It is like having a perpetual motion machine: the automation dreams up new variations of labeled data, paradoxically fueling itself with more data. These systems play simulation games with themselves and, with enough game play, become experts at it.

It is analogous to how AlphaGo was able to develop new Go strategies through self-play. When automation is embedded with a feedback loop and is able to simulate (some would call this ‘imagination’) many different scenarios and self-test those scenarios for correctness, then we are at the cusp of some extremely potent technology that can rapidly cascade into capabilities that few in our civilization will be prepared for. So the next time you see some mind-boggling Deep Learning results, seek to find the strange loops embedded in the method.

Best of luck for those who are unprepared.

Become prepared, read the book: The Deep Learning Playbook for Enterprises

♡ Please heart if you like this!