# Deep adversarial learning is finally ready and will radically change the game

Adversarial learning is one of the most hyped areas in deep learning. If you browse arxiv-sanity, you'll notice much of the most popular recent research explores this area.

This post will:

- Explain why we should care about adversarial learning
- Briefly introduce generative adversarial networks (GANs) and the major challenges associated with them
- Summarize recent research (Wasserstein GAN, Improved Training of Wasserstein GANs) that solves these challenges and stabilizes GAN training (implementation included)

Here is a presentation I gave on the topic at the Re-Work 2017 Deep Learning in Healthcare summit:

**Improving Image Classification with Generative Adversarial Networks**

*Generative adversarial networks (GANs) are one of the most promising areas in deep learning research. In this…* videos.re-work.co

#### Classical machine learning -> Deep learning

In the opening lecture of a course I took at UIUC on analog signals and systems, the professor confidently asserted something along the lines of:

> This is the most important course you will take, and abstraction is the most important concept in engineering.

> The solution to complexity is abstraction, also known as information hiding. Abstraction is simply the removal of unnecessary detail. The idea is that to design a part of a complex system, you must identify what about that part others must know in order to design their parts, and what details you can hide. The part others must know is the abstraction.
>
> - cs.cornell.edu

Deep neural networks learn hierarchical representations of data. The layers in a network and the representations they learn build on each other, with layers representing data at a progressively higher level of **abstraction**. Given raw data, a question to ask the network, and an objective function to evaluate the network's answer, a network learns to optimally represent (abstract) this data.

A key consequence of this concept is that feature engineering is learned and performed by the network. This contrasts with the classical machine learning approach, in which features expected to contain information relevant to the task at hand are manually identified and extracted from the data, reducing the dimensionality of the input to the "learning" algorithm.

When the underlying structure, patterns, and mechanisms of data are learned instead of hand-crafted, previously infeasible applications of AI are enabled and super-human performance is made possible.

#### Deep learning -> Deep adversarial learning

Years ago I had a boxing coach who wouldn't let new boxers ask questions. New boxers asked the wrong questions, got answers they didn't need, and then focused on the wrong things.

> Asking the right questions takes as much skill as giving the right answers.
>
> - Robert Half

The beauty of adversarial learning is that our networks learn **entirely** from data: the questions to ask, the corresponding answers, and the evaluation of those answers are all learned. This contrasts with the classical deep learning approach, in which questions expected to be relevant to the task at hand are manually identified, and hand-crafted objective functions guide the optimization of our networks toward learning the corresponding answers.

DeepMind recently demonstrated the amazing potential of deep (adversarial) learning with AlphaGo, showing that AlphaGo invents new knowledge and teaches new theories in the game of Go. This ushered in a new era of Go and moved players past a local maximum they'd been stuck in for thousands of years. AlphaGo achieved this by learning an evaluation function that describes "the score" at any given moment, rather than attempting to hand-craft and pre-program one. AlphaGo was then trained against itself through millions of simulated games. Sound like adversarial learning?

AlphaGo didn't just brute-force its way to becoming the best Go player in the world. It truly mastered the game and all its subtleties and intricacies. This was possible because it wasn't constrained by human input or by our understanding of the problem domain (which we now realize is limited) in asking, answering, and evaluating questions. The next step will be to apply these approaches to the real world. It's hard to imagine how AI will reinvent agriculture, healthcare, etc., but it will happen.

#### Generative adversarial networks

> What I cannot build, I do not understand.
>
> - Richard Feynman

The above quote motivated me to start working with GANs. GANs pose the training process as a game between two networks and allow adversarial learning on generic data.

With the goal of modeling the true data distribution, the generator learns to generate realistic samples of data while the discriminator learns to determine whether these samples are real or not. With the belief that the ultimate expression of understanding something is being able to recreate it, our goal seems like a worthy one. If we are able to successfully train our GAN to equilibrium (generated samples indistinguishable from real samples by a perfect discriminator), we should be able to apply this gained understanding of our data to almost any task with top performance.
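The two-network game can be sketched as a single alternating training step. This is a minimal illustrative toy in PyTorch, not from either paper: the tiny MLP architectures, batch size, learning rates, and the random stand-in for "real" data are all assumptions made for the sketch.

```python
import torch
import torch.nn as nn

# Toy generator and discriminator on 2-D data (sizes are illustrative).
latent_dim, data_dim = 8, 2
G = nn.Sequential(nn.Linear(latent_dim, 16), nn.ReLU(), nn.Linear(16, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 16), nn.ReLU(), nn.Linear(16, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(32, data_dim)  # stand-in for a batch of real samples
z = torch.randn(32, latent_dim)

# Discriminator step: push real samples toward label 1, fakes toward 0.
fake = G(z).detach()  # detach so this step does not update the generator
d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# Generator step: try to fool the discriminator (fakes toward label 1).
g_loss = bce(D(G(z)), torch.ones(32, 1))
opt_g.zero_grad()
g_loss.backward()
opt_g.step()
```

In a real training loop these two steps alternate over many batches, and the delicate balance between them is exactly the stability problem discussed next.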

#### The problem

GANs are difficult to optimize and their training is unstable. The network architectures must be carefully designed, and the balance between the generator and discriminator must be carefully maintained for training to converge. On top of that, mode dropping is typical in GANs (the generator learns only a very small subset of the true distribution), and they are difficult to debug due to near-meaningless learning curves.

Still, state-of-the-art results have been achieved with GANs, but these problems have limited their practical usefulness.

#### The solution

GANs are trained to minimize the distance between the generated and true data distributions. Initially, the Jensen-Shannon divergence was used as this distance metric. However, Wasserstein GAN (wGAN) provided extensive theoretical work and showed empirically that minimizing a reasonable and efficient approximation of the Earth Mover's (EM) distance is a theoretically sound optimization problem that cures the main problems of GANs (described in the section above). For this approximation of the EM distance to be valid, wGAN imposed weight-clipping constraints on the critic (referred to as the discriminator pre-Wasserstein), which caused some training failures.
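A sketch of the wGAN critic update described above, in PyTorch. The toy critic `C`, generator `G`, and data shapes are illustrative assumptions; the clip value of 0.01 and RMSprop optimizer follow the wGAN paper's defaults. The critic maximizes the difference between its mean score on real and generated samples, and weight clipping (crudely) enforces the Lipschitz constraint that makes the EM-distance approximation valid.

```python
import torch
import torch.nn as nn

latent_dim, data_dim, clip = 8, 2, 0.01
C = nn.Sequential(nn.Linear(data_dim, 16), nn.ReLU(), nn.Linear(16, 1))
G = nn.Sequential(nn.Linear(latent_dim, 16), nn.ReLU(), nn.Linear(16, data_dim))
opt_c = torch.optim.RMSprop(C.parameters(), lr=5e-5)

real = torch.randn(64, data_dim)                     # stand-in real batch
fake = G(torch.randn(64, latent_dim)).detach()       # generator samples

# Critic maximizes E[C(real)] - E[C(fake)]; we minimize the negative.
c_loss = -(C(real).mean() - C(fake).mean())
opt_c.zero_grad()
c_loss.backward()
opt_c.step()

# Weight clipping: the constraint on the critic that caused training failures.
for p in C.parameters():
    p.data.clamp_(-clip, clip)
```

Note that the critic outputs an unbounded score rather than a probability, so there is no sigmoid or cross-entropy here.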

Improved Training of Wasserstein GANs enables very stable GAN training by penalizing the norm of the gradient of the critic with respect to its input instead of clipping weights. This "gradient penalty" is simply added to the Wasserstein distance described above to form the total loss.
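The gradient penalty can be sketched as follows. The penalty is computed at points interpolated between real and generated samples, pushing the norm of the critic's input gradient toward 1; the coefficient of 10 is the paper's default. The toy critic `C` and the random stand-in batches are illustrative assumptions.

```python
import torch
import torch.nn as nn

data_dim, lam = 2, 10.0  # lambda = 10, the paper's default penalty weight
C = nn.Sequential(nn.Linear(data_dim, 16), nn.ReLU(), nn.Linear(16, 1))

real = torch.randn(64, data_dim)
fake = torch.randn(64, data_dim)  # stand-in for generator output

# Sample points on lines between real and fake samples.
eps = torch.rand(64, 1)
x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)

# Gradient of the critic's output with respect to its input.
grads = torch.autograd.grad(C(x_hat).sum(), x_hat, create_graph=True)[0]

# Penalize deviation of the gradient norm from 1 (two-sided penalty).
grad_penalty = lam * ((grads.norm(2, dim=1) - 1) ** 2).mean()

# Total critic loss: Wasserstein estimate plus the gradient penalty.
c_loss = C(fake).mean() - C(real).mean() + grad_penalty
```

Because the constraint is enforced softly through the loss rather than by clipping, the critic's capacity is no longer crippled, which is what makes training so much more stable.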

Finally, for the first time, we can train a wide variety of GAN architectures with almost no hyper-parameter tuning, including 101-layer ResNets and language models over discrete data!

One of the key benefits of using the Wasserstein distance is that as the critic improves, the generator receives improved gradients from it. When using the Jensen-Shannon divergence, gradients vanish as the discriminator improves, leaving the generator with nothing to learn from (a major source of training instability).

I recommend reading both papers if you're interested in gaining a solid theoretical understanding of these concepts:

The way I visualize GANs has changed with the introduction of this new objective function as illustrated below:

#### Implementation

#### Conclusion

Adversarial learning allows us to free our models of any constraints or limitations in our understanding of the problem domain: there is no preconception of what to learn, and the model is free to explore the data.

In the next post we will see how we can utilize the representations learned by our generator for image classification.

*I'll be talking about GANs at the **Deep Learning in Healthcare Summit** in Boston on Friday, May 26th. Feel free to stop by and say hi!*