The Deep Learning Many Body Problem

Credit: https://www.nasa.gov

The “Many Body Problem” (a.k.a. the N-Body Problem) appears simple enough, but in fact highlights the limits of present-day mathematics. A many-body problem is one in which you have multiple interacting entities. In physics, the three-body problem does not have a ‘closed-form’ or analytic solution (see: https://en.wikipedia.org/wiki/Three-body_problem). Something this simple reflects the limits of our analytic tools. That does not mean it is unsolvable; it only means that we have to resort to approximation and numerical techniques to perform the calculation. The three-body problem of the sun, the moon and the earth can be calculated numerically with sufficient precision to allow a man to land on the moon.

In Deep Learning, there is an emerging N-body problem. Many of the more advanced systems are now tackling the multi-agent setting. Each agent will likely have its own goals (i.e. objective function) that may be cooperative or competitive with the global goals. In multi-agent deep learning systems, or even in modular deep learning systems, researchers need to devise scalable methods for coordinated work.

Recent papers from Johannes Kepler University, DeepMind, OpenAI and Facebook have explored diverse aspects of this problem.

A team at Johannes Kepler University that includes Sepp Hochreiter (co-inventor of the LSTM) has proposed using an analog of the Coulomb force (i.e. the electrostatic force, proportional to the inverse square of distance) as an alternative objective function for training Generative Adversarial Networks (GANs).

Source: https://arxiv.org/pdf/1708.08819.pdf

Achieving an equilibrium state between two adversarial networks is a hot research problem; it is hard enough to solve the two-body problem in DL. The paper argues that this approach prevents the undesirable condition of “mode collapse”. Furthermore, the setup ensures convergence to an optimal solution, and there is only one local minimum, which happens to also be the global one. This may be a better objective than the Wasserstein distance (a.k.a. Earth Mover’s Distance), which was all the rage just a few months ago. The team has labeled their creation “Coulomb GAN”.
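To make the electrostatic analogy concrete, here is a minimal sketch of a Coulomb-style potential field: real samples act as attracting charges and generated samples as repelling ones, so generated points are pushed toward under-covered regions of the real distribution. The smoothed kernel form and the `dim`/`eps` values below are illustrative assumptions, not the paper’s exact settings.

```python
import numpy as np

def plummer_kernel(a, b, dim=3, eps=1.0):
    """A smoothed 1/r Coulomb-style kernel (Plummer form).
    'dim' and 'eps' are illustrative choices."""
    r2 = np.sum((a - b) ** 2)
    return 1.0 / (r2 + eps ** 2) ** (dim / 2.0)

def potential(x, real_samples, fake_samples, dim=3, eps=1.0):
    """Net 'charge' field at point x: real samples attract (+),
    generated samples repel (-)."""
    attract = sum(plummer_kernel(x, r, dim, eps) for r in real_samples)
    repel = sum(plummer_kernel(x, f, dim, eps) for f in fake_samples)
    return attract / len(real_samples) - repel / len(fake_samples)
```

A generator trained against such a field has no incentive to pile all its samples onto one mode, since its own samples repel each other, which is the intuition behind avoiding mode collapse.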

Microsoft’s Maluuba has published a paper describing a multi-agent system that is able to play Ms. Pac-Man better than humans. Ms. Pac-Man is like the original Pac-Man game: the objective is to accumulate as many pellets and fruits as possible while avoiding ghosts. The paper is titled “Hybrid Reward Architecture for Reinforcement Learning”, and it describes a Reinforcement Learning (RL) implementation (i.e. HRA) that differs from the typical RL architecture:

Source: https://arxiv.org/pdf/1706.04208.pdf

What is surprising about this paper is the number of objective functions used. The paper describes the use of 1,800 value functions as part of its solution: an agent for each pellet, each fruit and each ghost. Microsoft Research has shown the validity of using thousands of tiny agents to break a problem down into sub-problems, and to actually solve it! The coupling between the agents is clearly implicit in this model.
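The core aggregation idea can be sketched in a few lines: keep one small value function per object and act greedily on their sum. The table shapes, object names and values below are invented for illustration; the actual paper learns these heads with neural networks.

```python
import numpy as np

n_actions = 4  # e.g. up, down, left, right

def aggregate_q(per_object_q):
    """Sum the per-object action values into a single Q-vector."""
    return np.sum(per_object_q, axis=0)

def select_action(per_object_q):
    """Act greedily with respect to the aggregated value."""
    return int(np.argmax(aggregate_q(per_object_q)))

# Example: two pellet 'agents' and one ghost 'agent'.
pellet_1 = np.array([0.1, 0.9, 0.0, 0.0])   # pellet reachable via action 1
pellet_2 = np.array([0.0, 0.3, 0.0, 0.2])
ghost    = np.array([0.0, -2.0, 0.0, 0.0])  # ghost punishes action 1
q = np.stack([pellet_1, pellet_2, ghost])
```

Note how the ghost head vetoes action 1 even though both pellet heads prefer it; the coupling between the tiny agents happens only through this final sum.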

DeepMind tackles the problem of multi-agent systems having shared memory. In a paper titled “Distral: Robust Multitask Reinforcement Learning”, the researchers acknowledge the problems with “mind-meld” approaches to agent coordination, in which agents share parameters wholesale to solve a common problem. Recognizing this, they pursued an approach that encapsulates each agent while still allowing some information to trickle through the agent’s encapsulation boundary, in the hope that a narrow channel will be more scalable and robust.

We propose a new approach for joint training of multiple tasks, which we refer to as Distral (Distill & transfer learning). Instead of sharing parameters between the different workers, we propose to share a “distilled” policy that captures common behaviour across tasks. Each worker is trained to solve its own task while constrained to stay close to the shared policy, while the shared policy is trained by distillation to be the centroid of all task policies.
Source: https://arxiv.org/pdf/1707.04175.pdf

The results lead to faster and more robust learning, validating the approach of narrow channels. The open question in these multi-agent (N-body) problems is the nature of this coupling. The DeepMind paper shows the effectiveness of much looser coupling between agents versus the more naive approach of tight coupling (i.e. weight sharing).
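The “narrow channel” quoted above can be sketched as a KL penalty that keeps each task policy near a shared distilled policy. This is a toy rendering with discrete action distributions; the `beta` coefficient and the simple averaging used for distillation are assumptions (the paper distills the shared policy via supervised learning).

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence between two discrete distributions."""
    p, q = np.asarray(p), np.asarray(q)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def distral_regularizer(task_policy, shared_policy, beta=0.5):
    """Penalty keeping a task policy close to the shared distilled
    policy; 'beta' is an illustrative trade-off coefficient."""
    return beta * kl(task_policy, shared_policy)

def distill(task_policies):
    """Crude centroid of the task policies, standing in for the
    distillation step."""
    centroid = np.mean(task_policies, axis=0)
    return centroid / centroid.sum()
```

Each worker minimizes its own task loss plus this regularizer, so the only coupling between tasks is the shared policy itself, a much narrower channel than sharing all network weights.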

OpenAI recently published an intriguing paper about a multi-agent system in which agents are trained to model the other agents in the system. The paper is titled “Learning with Opponent-Learning Awareness”. It shows that the ‘tit-for-tat’ strategy emerges as a consequence of endowing multiple agents with social-awareness capabilities. Although the results have scalability issues, it is a fascinating approach, since it tackles one of the key dimensions of intelligence (see: Multiple Dimensions of Intelligence).

In summary, many of the leading Deep Learning research organizations are actively exploring modular deep learning. These groups are exploring multi-agent systems composed of distinct objective functions, all collaborating towards a single global objective. Many issues still need to be resolved, but this approach is clearly a promising path toward greater progress. Last year I observed the movement towards Game Theory as a guiding principle for future progress. This year, we are seeing much richer explorations of the loose coupling of multi-agent systems. I discuss these ideas of loose coupling in this article and in the book:

More coverage here: https://gumroad.com/products/WRbUs