Reinforcement learning (RL) is a kind of machine learning, often combined with deep learning, that is widely used in games. When the training data is hard to label but we can still tell whether an action works well in the environment, we can apply reinforcement learning to the problem.
action = f(observation)
We hope to find a function f, called the policy, that maximizes the expected sum of rewards.
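To make "expected sum of rewards" concrete, here is a minimal sketch. The environment (env_step), both policies, and all parameter values are hypothetical and made up for illustration; they are not from any library. It runs a policy for several episodes and averages the total reward.

```python
import random

def env_step(state, action, rng):
    # Hypothetical toy environment: reward 1 when the action equals the parity of the state.
    reward = 1.0 if action == state % 2 else 0.0
    next_state = rng.randint(0, 9)  # random transition that the actor cannot control
    return next_state, reward

def good_policy(observation, rng):
    # action = f(observation): always picks the rewarded action in this toy environment.
    return observation % 2

def random_policy(observation, rng):
    # Ignores the observation and acts at random.
    return rng.randint(0, 1)

def total_reward(policy, rng, num_steps=20):
    # Sum of rewards collected in one episode.
    state = rng.randint(0, 9)
    total = 0.0
    for _ in range(num_steps):
        action = policy(state, rng)
        state, reward = env_step(state, action, rng)
        total += reward
    return total

def expected_total_reward(policy, num_episodes=200, seed=0):
    # Monte Carlo estimate of the expected sum of rewards.
    rng = random.Random(seed)
    return sum(total_reward(policy, rng) for _ in range(num_episodes)) / num_episodes

print(expected_total_reward(good_policy))
print(expected_total_reward(random_policy))
```

The policy with the higher estimate is the better one under this objective; learning means searching for such a policy automatically instead of writing it by hand.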
Steps of Reinforcement Learning
We could use a CNN, an RNN, or a transformer as the network.
Loss: -(total reward)
Maximizing the total reward is the same as minimizing -(total reward).
Find the θ that minimizes the loss.
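One possible way to search for θ is a policy gradient update (REINFORCE). The notes do not name a specific algorithm, so treat this as a sketch under that assumption. Because the reward cannot be differentiated directly, the gradient is estimated from sampled actions. The two-action environment (pull), its win probabilities, and the hyperparameters are all hypothetical.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def pull(action, rng):
    # Hypothetical environment: action 1 is rewarded more often than action 0.
    win_prob = [0.2, 0.8][action]
    return 1.0 if rng.random() < win_prob else 0.0

def train(num_updates=300, batch_size=20, lr=0.5, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]  # policy parameters: one logit per action
    for _ in range(num_updates):
        probs = softmax(theta)
        grad = [0.0, 0.0]
        for _ in range(batch_size):
            action = 0 if rng.random() < probs[0] else 1
            reward = pull(action, rng)
            # d log pi(action) / d theta_k = 1[k == action] - probs[k]
            for k in range(2):
                grad[k] += reward * ((1.0 if k == action else 0.0) - probs[k])
        # Gradient ascent on the reward, i.e. gradient descent on the loss -(total reward).
        theta = [t + lr * g / batch_size for t, g in zip(theta, grad)]
    return theta

theta = train()
print(softmax(theta))  # probability of each action under the learned policy
```

In a real task the two logits would be replaced by the outputs of the network, and the update would be done by an optimizer rather than by hand.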
This RL task is similar to a GAN. In a GAN the discriminator is a neural network, whereas in RL the environment and the reward are black boxes, possibly with randomness. What they share is that both search for the best θ to maximize an objective.
We want our actor to take action a1 when facing situation s1, and not to take action a2 when facing situation s2.
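One common way to express this preference is a weighted cross-entropy loss, with a positive weight for actions to take and a negative weight for actions to avoid. The notes do not give the loss, so the form below, the +1/-1 weights, and the two example actors are assumptions for illustration only.

```python
import math

def cross_entropy(action_probs, action):
    # Cross-entropy between the actor's output distribution and a single target action.
    return -math.log(action_probs[action])

def actor_loss(actor, examples):
    # examples: (state, action, weight); weight +1 means "take it", -1 means "avoid it".
    return sum(w * cross_entropy(actor[s], a) for s, a, w in examples)

examples = [("s1", "a1", +1.0), ("s2", "a2", -1.0)]

# Hypothetical actors, given as action probabilities per state.
good_actor = {"s1": {"a1": 0.9, "a2": 0.1}, "s2": {"a1": 0.9, "a2": 0.1}}
bad_actor = {"s1": {"a1": 0.1, "a2": 0.9}, "s2": {"a1": 0.1, "a2": 0.9}}

print(actor_loss(good_actor, examples))  # lower loss: matches the desired behavior
print(actor_loss(bad_actor, examples))   # higher loss: does the opposite
```

Minimizing this loss raises the probability of a1 in s1 and lowers the probability of a2 in s2. Replacing the +1/-1 weights with real-valued scores lets the loss express how strongly each action is wanted.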