Deep Reinforcement Learning: Imitation Learning

Partha Sen
Aug 22, 2017


End to End Learning for Self-Driving Cars, Bojarski et al. 2016

Is Behavior Cloning/Imitation Learning as supervised learning possible?

The answer was long thought to be no for cloning the behavior of animals or humans, but it has worked well for autonomous vehicles. ALVINN ( https://papers.nips.cc/paper/95-alvinn-an-autonomous-land-vehicle-in-a-neural-network ) steered a vehicle with a neural network as early as 1989. The DARPA Autonomous Vehicle (DAVE) built on the ALVINN approach, and later NVIDIA's CNN model (Bojarski et al., 2016) showed that cloning can learn the entire task of lane and road following without manual decomposition into road or lane-marking detection, semantic abstraction, path planning, and control. The model was able to learn meaningful road features from a very sparse training signal (steering alone).

Behavior cloning, or imitation learning, succeeds when the trajectory distribution (the state-action distribution induced by the policy) of the agent or learner matches that of the expert or trainer (compare GANs — Generative Adversarial Networks, Goodfellow et al. 2014). The challenge in cloning is that actions along a trajectory are interdependent!

In behavior cloning we directly supervise the learning of a mapping from states to actions by demonstrating trajectories, and we must explicitly handle the fact that naive cloning neglects this action interdependence. Learning latent rewards or goals instead is the indirect route (Inverse Reinforcement Learning!).

Who are experts here?

The experts here are humans, or optimal or near-optimal planners/controllers, under assumptions such as: expert trajectories are i.i.d., and the training distribution matches the test data distribution.

Observation (Ot) -> Model (policy(Ut | Ot)) -> Action (Ut)

Manual Driving -> Training Data -> Supervised Learning -> policy(Ut | Ot)
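The pipeline above is ordinary supervised learning: collect (observation, action) pairs from the expert and fit a policy by regression. A minimal sketch, where the "expert" is a hypothetical linear controller invented purely for illustration:

```python
import numpy as np

# Toy behavior cloning: fit a policy Ut = f(Ot) by supervised regression
# on expert demonstrations. The expert below is a made-up linear controller.
rng = np.random.default_rng(0)

def expert_policy(obs):
    # Hypothetical expert: steering is a fixed linear function of the observation.
    return obs @ np.array([0.5, -0.3, 0.1])

# Collect demonstration data (observations and the expert's actions).
observations = rng.normal(size=(500, 3))
actions = expert_policy(observations)

# Supervised learning step: least-squares fit of a linear cloned policy.
weights, *_ = np.linalg.lstsq(observations, actions, rcond=None)

# The cloned policy imitates the expert on held-out observations.
test_obs = rng.normal(size=(10, 3))
error = np.max(np.abs(test_obs @ weights - expert_policy(test_obs)))
print(error < 1e-6)  # prints True
```

In practice the linear fit is replaced by a CNN trained on camera frames, but the supervised structure — states in, expert actions out — is the same.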

Data Set Aggregation (DAgger)

https://www.cs.cmu.edu/~sross1/publications/Ross-AIStats11-NoRegret.pdf
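DAgger (Ross et al., AISTATS 2011, linked above) addresses the interdependence problem: roll out the learned policy, ask the expert to label the states the learner actually visits, aggregate those labels into the dataset, and refit. A sketch under toy assumptions — the linear dynamics and the expert weights below are invented for illustration:

```python
import numpy as np

# DAgger sketch: iteratively roll out the learner, query the expert on the
# *visited* states, aggregate, and refit. Dynamics and expert are toy assumptions.
rng = np.random.default_rng(1)
expert_w = np.array([0.8, -0.2])  # hypothetical expert policy weights

def expert(obs):
    return obs @ expert_w

def rollout(policy_w, horizon=20):
    """Roll out a linear policy in a toy linear system; return visited states."""
    obs, states = rng.normal(size=2), []
    for _ in range(horizon):
        states.append(obs.copy())
        action = obs @ policy_w
        obs = 0.9 * obs + 0.1 * np.array([action, -action]) + rng.normal(scale=0.01, size=2)
    return np.array(states)

# Start from a poor initial policy and a small demonstration set.
policy_w = np.zeros(2)
data_obs = rng.normal(size=(10, 2))
data_act = expert(data_obs)

for _ in range(5):                  # DAgger iterations
    visited = rollout(policy_w)     # states the learner actually reaches
    data_obs = np.vstack([data_obs, visited])
    data_act = np.concatenate([data_act, expert(visited)])  # expert relabels
    policy_w, *_ = np.linalg.lstsq(data_obs, data_act, rcond=None)

print(np.allclose(policy_w, expert_w, atol=1e-2))  # prints True
```

The key difference from plain cloning is that training states come from the learner's own distribution, so small errors no longer compound along unvisited parts of the trajectory.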

  1. Augmentation:

Augmentation manipulates the incoming training data to generate additional training instances, helping us extract as much information from the data as possible. This technique has been used to develop powerful classifiers with very little data.

https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html . However, augmentation is very specific to the objective of the neural network.

References

https://papers.nips.cc/paper/2847-off-road-obstacle-avoidance-through-end-to-end-learning

http://repository.cmu.edu/cgi/viewcontent.cgi?article=2874&context=compsci

https://katefvision.github.io/katefSlides/immitation_learning_I_katef.pdf
