Solving the Unity OTC: The Deep Learning Problem

Micheal Lanham
8 min read · Mar 17, 2019


This is another entry in the series of blog posts I have written recently about the Unity Obstacle Tower Challenge. You can read more about the challenge, and my previous ramblings on it, in the links below:

In those previous blog posts you will learn about the Unity Obstacle Tower Challenge (OTC) and why I believe the challenges it poses will bring us closer to discovering General AI. You will also learn, from my last post, about the visual learning problem and the challenges that need to be overcome. If you are not familiar with the OTC or my previous posts, save yourself some time and refer to those posts first before continuing.

Deep Learning is the Problem?

In my previous blog posts I outlined some of the challenges the OTC now pushes AI developers and researchers to solve. I finished my last blog, on the visual learning problem, by suggesting we could use GANs to generalize the vision state we input into a deep reinforcement learning model. For this post I continue that trend and go further, suggesting that the tool we are using to solve the challenge, Deep Reinforcement Learning, is not up to the task. Now, keep in mind that this is coming from a guy who just wrote a 400-page book on Deep Learning and Deep Reinforcement Learning. So, by no means am I suggesting Deep Learning, Deep Reinforcement Learning or Reinforcement Learning are broken; far from it. Those technologies will likely dominate the Machine Learning landscape for many more years. What I am suggesting, however, is that Deep Learning has serious limitations, and those limitations are seriously challenged by the OTC.

At this point, if you follow Deep Learning closely, you may have already heard the comments made by Yann LeCun and others suggesting Deep Learning is dead and needs a rebranding. LeCun, the Director of Facebook AI Research and the inventor of convolutional neural networks, has gone on to suggest that the new AI age should be focused on Differentiable Programming. If this is the first you are hearing of such things, you can catch up with an excellent blog post:

Deep Learning is Differentiation

LeCun, as well as others, rightly points out that Deep Learning is really nothing more than solving non-linear systems of equations using differentiation. In fact, one of the first things you used to learn about neural networks, aka Deep Learning, is that they are just fancy equation solvers, really nothing more. This fact is often lost on newcomers and perhaps becomes less obvious when working with abstractions like Keras or other DL tools. Another important thing often lost is that deep learning solvers rely on calculus and are bound by its rules and limitations. That means that, in order to be easily solvable, the equations should be continuous. This also works in reverse, as you may imagine: when we calculate gradients for backpropagation through a network, it is best done with continuous functions. That is not to say we cannot solve for discontinuous functions; we can, but we can't do it easily or automatically. I have shown various forms of non-differentiable, discontinuous functions in the image below:
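
To see what I mean in code, here is a tiny sketch of my own (using PyTorch, not taken from anything Unity or LeCun published) comparing the gradient of a smooth function with that of a step-like function:

import torch

# A smooth, continuous function: the gradient is informative everywhere.
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x
y.backward()
print(x.grad)   # tensor(7.) -> dy/dx = 2x + 3 evaluated at x = 2

# A step-like, discontinuous function: torch.floor is flat between its jumps,
# so the gradient it reports is zero almost everywhere.
x = torch.tensor(2.7, requires_grad=True)
y = torch.floor(x)
y.backward()
print(x.grad)   # tensor(0.) -> no signal for gradient descent to follow

The smooth function gives a useful gradient at every point; the floor function is flat between its jumps, so gradient descent simply gets no direction to move in.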

Why is this a Problem in the OTC?

So why is this a problem with the OTC in particular? Well, it all relates to the expected action output. What you may not realize is that many of the Deep Reinforcement Learning demos that impressed you before likely used continuous actions. Demos that did use discrete actions likely used value-driven methods like DQN. Continuous-action agents output actions in the form of raw values in a range, a common range being 0–1. Agents of this form are often relegated to control-based tasks and algorithms like proximal policy optimization, or PPO. PPO is a reinforcement learning algorithm developed by OpenAI and has been shown to work very well with continuous-action agents. Value-based methods like Deep Q Networks isolate state mappings by action, thus reducing the need for extensive differentiation. Another trick you can try is to map a discrete-action agent to a continuous-action agent; this can overcome some issues, but it may not always work.
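
If the distinction between the two kinds of agents feels fuzzy, here is a minimal sketch of the two policy heads side by side. It uses PyTorch, the layer sizes and names are made up for illustration, and it is not taken from PPO or DQN source code:

import torch
import torch.nn as nn

state = torch.randn(1, 8)  # a toy 8-dimensional state vector

# Continuous-action head: outputs raw values squashed into a range (here 0-1),
# the style of agent most PPO control demos use.
continuous_head = nn.Sequential(
    nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2), nn.Sigmoid())
print(continuous_head(state))  # e.g. tensor([[0.47, 0.61]]) -> smooth, differentiable outputs

# Discrete-action head: outputs a probability over a fixed set of choices;
# the chosen action itself (argmax or sample) is a hard, non-differentiable step.
discrete_head = nn.Sequential(
    nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 4))
probs = torch.softmax(discrete_head(state), dim=-1)
action = torch.argmax(probs, dim=-1)  # picks one of 4 actions, e.g. tensor([2])
print(probs, action)

The continuous head is differentiable end to end; the discrete head ends in a hard selection step that gradients cannot flow through directly.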

However, the OTC forces the agent to be explicitly a discrete agent, meaning the agent must choose between a specific set of actions (forward, back, left, right, rotate left/right and jump) or a possible combination of those actions. This forces the agent to learn discontinuous actions based on state and, you guessed it, also introduces discontinuities in the action output space. This in turn causes the deep part of Deep Reinforcement Learning to break down, and you will often see this in the form of agents spinning in place or just sitting there. In fact, policy-based methods like PPO, ones that rely heavily on differentiation to map states based on a policy rather than a model, break down even further when trying to train discrete-action learners.
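
For a concrete picture of that kind of action space, here is a sketch using OpenAI Gym's spaces module. The branch sizes are my illustration based on the actions listed above, not values copied from the Obstacle Tower environment's source:

from gym import spaces

# A multi-branch discrete action space in the spirit of the OTC agent.
# Every branch is a hard choice; there is nothing continuous to differentiate.
action_space = spaces.MultiDiscrete([
    3,  # movement: no-op, forward, back
    3,  # camera: no-op, rotate left, rotate right
    2,  # jump: no-op, jump
    3,  # strafe: no-op, left, right
])

action = action_space.sample()  # e.g. array([1, 0, 1, 2]) -> forward + jump + strafe right
print(action)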

Avoid Policy Based RL?

Unity is currently pushing challengers to use Rainbow, a Deep Reinforcement Learning agent from Google DeepMind. Rainbow is based on the more stable DQN reinforcement learning algorithm mixed with several valuable optimizations. A blog from Google introducing Rainbow is provided for some context:

Now, to be honest, I have not spent a lot of time with Rainbow and can't really discuss its merits either way, except to say that the Unity team is currently looking at other methods, and so should you if you want to truly challenge the OTC. The one thing that I still worry about with Rainbow is its dependence on Deep Learning, which I strongly feel is not well suited to this particular problem. So what is?

Evolutionary Computation

I was first introduced to evolutionary computation and genetic algorithms around the same time I first learned about neural networks, some 20 years ago. Around that time, both technologies looked poised to be the next big thing, with Deep Learning (neural networks) winning out only about 10 years later. In fact, most of my commercial successes in Machine Learning applications have come from using gene expression programming and evolutionary algorithms to build static and dynamic solvers. The evolutionary solvers I was able to build could solve non-linear and discontinuous functions, something not easily done with other methods, and could minimize complex loss functions using only simple statistical methods and evolutionary techniques. They do this quickly and without suffering from the limitations of Deep Learning or other methods that rely heavily on differentiation.
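
As a toy example of what I mean, here is a bare-bones (1+1)-style evolutionary solver, written for this post rather than pulled from any of my commercial work. The objective is discontinuous, so a gradient gives no useful signal, but selection only ever needs to compare two fitness values:

import random

def objective(x):
    # Minimize a step-shaped, discontinuous function; the best plateau is 4 <= x < 5.
    return abs(int(x) - 4) + (3 if x < 0 else 0)

best = random.uniform(-10, 10)
for _ in range(2000):
    candidate = best + random.gauss(0, 1.0)        # mutation
    if objective(candidate) <= objective(best):    # selection: keep the fitter candidate
        best = candidate

print(best, objective(best))  # the survivor lands on the 4 <= x < 5 plateau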

Without getting into too much depth, evolutionary computation combines the theory of genetic evolution with mathematics and computer programming. An excellent blog describing the background of this fun branch of Machine Learning is shown below:

If you have a background in data science and want a close look at some cool possibilities with gene expression programming, a side area of evolutionary computation, check out the excellent software tool GepSoft. GepSoft, link below, combines gene expression programming with data analysis and data science as a way of performing classification or prediction.

Gene expression programming (GEP) is a method by which you define mathematical or programmatic functions as genes. A gene in GEP is synonymous with a gene in DNA, and a collection of genes can describe a function, an equation or even a program (yep, with if statements and for loops). You then use similar concepts of evolution and natural selection to breed these DNA programs together in order to find better DNA, and so on. We still assign a loss or value to a string of DNA, but that loss is propagated back through natural selection rather than mathematics. In other words, algorithms are chosen by their performance and bred with other algorithms based on principles of natural selection and genetic breeding, with the hope of making stronger babies.
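
To make the breeding idea concrete, here is a very stripped-down, gene-expression-flavoured sketch of my own (nothing to do with GepSoft's actual internals). Each gene is a short chain of primitive operations, fitness measures how well the expressed program matches a target function, and improvement comes purely from selection, crossover and mutation:

import random

PRIMITIVES = ['+', '-', '*']

def express(gene, x):
    # Interpret a gene, a list of (operator, constant) pairs, as a chain applied to x.
    value = x
    for op, c in gene:
        if op == '+':
            value = value + c
        elif op == '-':
            value = value - c
        else:
            value = value * c
    return value

def random_gene():
    return [(random.choice(PRIMITIVES), random.uniform(-3, 3)) for _ in range(3)]

def fitness(gene):
    # How closely the expressed program matches the target f(x) = 2x + 1.
    return -sum((express(gene, x) - (2 * x + 1)) ** 2 for x in range(-5, 6))

def breed(a, b):
    # One-point crossover of two parent genes plus a small random mutation.
    cut = random.randrange(1, 3)
    child = a[:cut] + b[cut:]
    i = random.randrange(3)
    op, c = child[i]
    child[i] = (random.choice(PRIMITIVES), c + random.uniform(-0.3, 0.3))
    return child

population = [random_gene() for _ in range(60)]
for _ in range(100):
    # Natural selection: the fittest genes become the parents of the next generation.
    population.sort(key=fitness, reverse=True)
    parents = population[:20]
    population = parents + [breed(random.choice(parents), random.choice(parents))
                            for _ in range(40)]

print(population[0], fitness(population[0]))  # fitness climbs toward 0 as the gene approximates 2x + 1

Notice that nothing in the loop ever takes a derivative; the loss only has to be comparable between candidates, so discontinuities cost us nothing.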

Evolutionary computation is a huge area, and the above barely scratches the surface. There are many applications of this form of technology, some of which have already been applied to the areas of deep learning and the perceptron. Of course, getting back to the OTC, our primary interest is finding a better solver and thus replacing the deep in Deep Reinforcement Learning. After all, Deep Learning is just a solver, and one that we could potentially replace with anything, so why not evolutionary algorithms?

Evolutionary Reinforcement Learning

Evolutionary computation (EC) is not the black box that deep learning often gets accused of being. While the process of producing genetic output may not be intuitive, the output in most cases is. Evolutionary algorithms can and often will output actual equations, functions or blocks of code that solve real-world problems. In fact, it is possible that EC could rewrite the way we think about reinforcement learning and how we solve it. Of course, EC is not without its problems and flaws, as most complex things are. It can be more time consuming and certainly just as frustrating to train, primarily because of its use of various arbitrary natural selection techniques. Another big problem is stalling during training, which can often happen when attempting to breed more complex algorithms. Even with these problems I can't help but feel we have only scratched the surface of this area of ML. It appears to me that we need to spend far more time exploring the many possibilities evolutionary computation warrants.
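
As a final sketch of what replacing the deep part might look like, here is a toy neuroevolution loop of my own; the corridor environment and the tiny linear policy are invented purely for illustration. The policy makes hard, discrete choices, and no gradients are ever computed; the weights improve only by mutating them and keeping whichever mutant scores best over a whole episode:

import numpy as np

def run_episode(weights, length=10):
    # The agent starts at position 0 and earns reward for every step spent at the goal.
    position, total_reward = 0, 0.0
    for _ in range(20):
        obs = np.array([position / length, 1.0])   # tiny observation: progress + bias term
        logits = obs @ weights                     # a 2x3 linear "policy network"
        action = int(np.argmax(logits))            # hard discrete choice: 0=back, 1=stay, 2=forward
        position += 1 if action == 2 else (-1 if action == 0 else 0)
        position = max(0, min(length, position))
        total_reward += 1.0 if position == length else 0.0
    return total_reward

rng = np.random.default_rng(0)
weights = rng.normal(size=(2, 3)) * 0.1
for generation in range(30):
    # Evolution instead of backprop: mutate the weights, keep the best mutant (elitism).
    population = [weights + rng.normal(size=(2, 3)) * 0.5 for _ in range(30)]
    scores = [run_episode(w) for w in population]
    best = int(np.argmax(scores))
    if scores[best] >= run_episode(weights):
        weights = population[best]

print(run_episode(weights))  # climbs toward the maximum score as the policy learns to walk forward

Because fitness is just the episode score, the discrete, discontinuous action space that trips up gradient-based learners is a non-issue here.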

It remains to be seen how the winners of the OTC complete the challenge. Will Rainbow or another DRL framework prove up to the challenge, or will we need to rethink our entire concept of learning and the basics of solving learning problems? One thing we must remember is that learning is about solving systems of equations. We just need to find the best system for our needs, and it may just unveil the secrets to General AI.


Micheal Lanham

Micheal Lanham is a proven software and tech innovator with 20 years of experience developing games, graphics and machine learning AI apps.