Deep Reinforcement Learning DQN for Multi-Agent Environment

Published in Yellowme, May 16, 2020

AI Bots

Even if you are not in the tech world, you have probably heard that Artificial Intelligence is doing more and more cool things, and that many important people are talking about it. Even Elon Musk, who a few years ago was against AI, is now deep into it.

You have probably heard about chatbots, deepfakes, image recognition, self-driving cars, and many other areas where Artificial Intelligence is being used. But how do these bots learn?

Well, deep learning and reinforcement learning can make it happen, but what happens when we have more than one bot making decisions? Who is going to make the final call?

Deep learning

Do you know why it is called deep? Deep refers to the use of multiple layers of neurons connected to each other in a series of cascades that allow a neural network to produce an output. So deep learning means that we use many layers between the input and the output.

Each layer transforms the data it receives and passes the result on to the next layer. How the data is transformed depends on the shape of the layer and on its connections to the following layers.

Example of deep learning: diagram of the layers

We normally have one input layer of neurons, many hidden layers, and finally one output layer. We therefore usually describe a specific neural network by how many hidden layers it has.

There are many possible configurations for the hidden layers, depending on the connections between them and the number of neurons in each layer.
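To make the idea of stacked layers concrete, here is a minimal sketch in plain Python. The layer sizes, the random weights, and the choice of tanh for the hidden layers are assumptions made for illustration, not something taken from the simulation in this article.

```python
import math
import random


def make_layer(n_inputs, n_neurons, rng):
    """Create one fully connected layer as (weights, biases)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    biases = [0.0] * n_neurons
    return weights, biases


def forward(layers, inputs):
    """Pass the inputs through every layer in order."""
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        is_output_layer = index == len(layers) - 1
        outputs = []
        for neuron_weights, bias in zip(weights, biases):
            total = sum(w * a for w, a in zip(neuron_weights, activations)) + bias
            # Hidden layers use tanh (an assumed choice); the output layer stays linear.
            outputs.append(total if is_output_layer else math.tanh(total))
        activations = outputs
    return activations


rng = random.Random(0)
layer_sizes = [4, 8, 8, 2]  # 4 inputs, two hidden layers of 8 neurons, 2 outputs
layers = [make_layer(n_in, n_out, rng)
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

result = forward(layers, [0.5, -0.2, 0.1, 0.9])
print(len(layers) - 1, "hidden layers ->", len(result), "outputs")
```

Changing `layer_sizes` is all it takes to try a different configuration, for example more hidden layers or more neurons per layer.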

Reinforcement Learning

Broadly, reinforcement learning is based on assigning rewards and punishments to an agent according to the actions it chooses. A common example is training a dog. If the dog does something good, it is rewarded with a bone. If it does something bad, it is punished with a scolding. After some time of education, or training, the dog learns to stop doing bad actions and to do only good ones.

Machine learning takes the same approach. An agent acts in an environment; the environment has a state and presents a collection of actions for the agent to choose from. In the picture, we can see the reinforcement learning loop. The agent selects an action, the environment sends back a reward or punishment for that action in that state, and the agent learns whether the action was good or bad. Once the action has been performed, the environment is in a new state that presents a new set of actions, and the process repeats until the agent learns which actions to choose. In every case, reinforcement learning is concerned with maximizing the cumulative reward, or equivalently minimizing the punishment.
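The loop described above can be sketched in a few lines. The example below uses tabular Q-learning, the simpler ancestor of DQN, on a made-up corridor of five cells where the agent starts on the left and the goal is on the right. The environment, the reward values, and the hyper-parameters are all assumptions chosen for illustration.

```python
import random

N_CELLS = 5          # corridor cells 0..4; the goal is the last cell
ACTIONS = (-1, +1)   # index 0 moves left, index 1 moves right


def step(state, action_index):
    """Environment: apply the action, return (new_state, reward, done)."""
    new_state = min(max(state + ACTIONS[action_index], 0), N_CELLS - 1)
    if new_state == N_CELLS - 1:
        return new_state, 10.0, True   # reward for reaching the goal
    return new_state, -1.0, False      # small punishment for every other move


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_CELLS)]  # q[state][action_index]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.randrange(2)                        # explore
            else:
                action = max((0, 1), key=lambda a: q[state][a])  # exploit
            new_state, reward, done = step(state, action)
            # Q-learning update: move the estimate toward
            # reward + discounted best value of the new state.
            best_next = 0.0 if done else max(q[new_state])
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = new_state
            if done:
                break
    return q


q_table = train()
policy = ["left" if q[0] > q[1] else "right" for q in q_table[:-1]]
print(policy)
```

After training, the learned policy should prefer moving right in every cell, because that is the shortest way to the reward.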

Deep reinforcement learning

Deep reinforcement learning is, as the name suggests, the combination of the two. By combining these methods we obtain an algorithm that has been tested in many areas and has performed very well, in many cases even better than humans, especially on the Atari games where DeepMind tested it.
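In a Deep Q-Network (DQN), the neural network estimates a value for each possible action in the current state, and it is trained toward a target built from the reward and the best estimated value of the next state. The sketch below shows only that target and the usual epsilon-greedy action choice. The lists of numbers stand in for the output of a network, and a full DQN also uses a replay memory and a separate target network, which are left out here.

```python
import random


def td_target(reward, next_q_values, done, gamma=0.99):
    """Target that the network's estimate Q(state, action) is trained toward."""
    if done:
        return reward  # no future reward after a terminal state
    return reward + gamma * max(next_q_values)


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Placeholder numbers: pretend these came from a neural network
# evaluated on the current state and on the next state.
q_values_now = [0.2, 1.5, -0.3]
q_values_next = [1.0, 0.5, 2.0]

action = epsilon_greedy(q_values_now, epsilon=0.0, rng=random.Random(0))
target = td_target(reward=1.0, next_q_values=q_values_next, done=False, gamma=0.9)
loss = (target - q_values_now[action]) ** 2  # squared error for this one transition
print(action, target, loss)
```

Training consists of nudging the network weights so that this loss shrinks over many transitions.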

Multi-agent environment

The environment can be any game setting or part of real life that we define, and the state is a set of characteristics of the environment that we pass to our agent. For example, we can define the environment as the streets of a city, and the state as the traffic at any given point in time. The agent, a car in this case, navigates the environment and makes decisions (turns) based on the state (the traffic) to find the fastest route to its destination.

Representation where the green cars have to arrive at the yellow destination, making decisions using deep reinforcement learning.

In this case we have many agents (cars), and each one makes decisions that also depend on the decisions of the other agents. This is where it can get complicated. The goal for a single car is clear: to arrive at the destination as fast as possible.

But, should we keep this goal when we have many agents making decisions in the same environment?

If we keep the same goal, each car will still try to find the best route on its own, but it may take a route that blocks other cars and increases their travel time.

Instead, we should make the goal to reduce the average travel time of all the cars, so that the optimization works for every car in the environment. Each car will then make decisions based not only on its own trip but also on the other cars, in order to avoid a much bigger traffic jam.

The multi-agent case also raises the question of how much time a car should sacrifice to avoid increasing the other cars' travel time.
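The difference between the two goals can be shown with a toy example. Everything in it is hypothetical: two cars, a main road that gets slower as more cars use it, a side road with a fixed travel time, and a `cooperation` weight that blends a car's own travel time with the average. It is not the reward function of the simulation, only a sketch of the trade-off.

```python
from statistics import mean

SIDE_ROAD_MINUTES = 15.0
MAIN_ROAD_MINUTES_PER_CAR = 10.0  # the main road slows down as it fills up


def travel_times(routes):
    """Travel time of each car given every car's route choice."""
    cars_on_main = routes.count("main")
    return [MAIN_ROAD_MINUTES_PER_CAR * cars_on_main if route == "main"
            else SIDE_ROAD_MINUTES
            for route in routes]


def rewards(routes, cooperation=1.0):
    """Reward per car: a blend of its own time and the average time.

    cooperation=0.0 -> each car only cares about its own travel time.
    cooperation=1.0 -> every car is rewarded for the average travel time.
    """
    times = travel_times(routes)
    average = mean(times)
    return [-((1.0 - cooperation) * t + cooperation * average) for t in times]


selfish = ["main", "main"]      # both cars take the road that is fastest when empty
cooperative = ["main", "side"]  # one car accepts a slower road

print(travel_times(selfish), mean(travel_times(selfish)))
print(travel_times(cooperative), mean(travel_times(cooperative)))
print(rewards(cooperative, cooperation=0.0), rewards(cooperative, cooperation=1.0))
```

When both cars pick the main road they each need 20 minutes, while splitting up gives 10 and 15 minutes, an average of 12.5. The car on the side road gives up 5 minutes compared with an empty main road, and values of `cooperation` between 0 and 1 are one possible way to express how much of that sacrifice we ask of it.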

Final thoughts

Deep reinforcement learning is a very powerful tool, and in the near future it is going to be used in more things than you can imagine.

Multi-agent environments are going to be very common for these bots, so we have to take them into consideration when designing the bots.

The policy by which we evaluate the performance of the bot (in reinforcement learning terms, its reward function) is crucial. Many people should take part in defining it and be aware of the policy that is used.

Making policies for the greater good is a must to ensure we keep equality among users.

Try it out

The best way to learn is to try, so for those who have some coding skills and want to run some tests, here is the code of the simulation.

Try changing the reward function, the number of cars, and the destinations. You can experiment with many different cases, as well as with the hyper-parameters of the DQN.

If you found this article helpful, share it with your friends. All constructive criticism is welcome, so feel free to play around with new ideas. You can also find me on LinkedIn and GitHub to make contributions.