Deep Reinforcement Learning: Imitation Learning

Partha Sen
Aug 22, 2017 · 2 min read
End to End Learning for Self-Driving Cars, Bojarski et al. 2016

Is behavior cloning/imitation learning possible as supervised learning?

The answer is no for cloning the behavior of an animal or a human in general, but it has worked well for autonomous vehicles (ALVINN: https://papers.nips.cc/paper/95-alvinn-an-autonomous-land-vehicle-in-a-neural-network). DARPA's Autonomous Vehicle (DAVE) built on the ALVINN model, and later NVIDIA's CNN model (Bojarski et al. 2016) showed that cloning can learn the entire task of lane and road following without manual decomposition into road or lane-marking detection, semantic abstraction, path planning, and control. The model was able to learn meaningful road features from a very sparse training signal (steering alone).

Behavior cloning, or imitation learning, succeeds when the trajectory distribution (the state-action distribution induced by the policy) of the agent or learner matches that of the expert or trainer (compare GANs, Goodfellow et al. 2014, which also match distributions). The challenge in cloning is that the actions along a trajectory are interdependent: one early mistake drifts the learner into states the expert never demonstrated.

In direct imitation we supervise the learner to map states to actions from demonstrated trajectories, and we must explicitly handle the interdependence of actions that naive supervision neglects. Learning the latent rewards or goals behind the demonstrations is the indirect route (inverse reinforcement learning).

Who are the experts here?

The experts are humans, or optimal/near-optimal planners and controllers, under assumptions such as: expert trajectories are i.i.d., and the training distribution matches the test distribution.

Observation (Ot) -> Model (policy(Ut|Ot)) -> Action (Ut)

Manual driving -> training data -> supervised learning -> policy(Ut|Ot)
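As a minimal sketch of that supervised-learning step, the snippet below fits a linear policy to logged (observation, expert action) pairs by least squares. The data, the linear policy, and all names here are illustrative assumptions, not the NVIDIA pipeline (which trains a CNN on camera frames), but the structure is the same: regress the expert's action on the observation.

```python
import numpy as np

# Behavior cloning as plain supervised learning (toy sketch, not the
# NVIDIA setup): the expert's steering is a fixed linear function of the
# observation, and we fit a linear policy by ordinary least squares.

rng = np.random.default_rng(0)

# Logged demonstrations: observations Ot (e.g. flattened image features)
# and the expert's actions Ut (steering angles).
obs = rng.normal(size=(500, 8))    # 500 states, 8 features each
true_w = rng.normal(size=8)        # unknown expert policy weights
actions = obs @ true_w             # expert steering commands

# "Training": minimize squared error between policy output and expert action.
w_hat, *_ = np.linalg.lstsq(obs, actions, rcond=None)

# The cloned policy maps a new observation to an action, as in
# Observation (Ot) -> policy(Ut|Ot) -> Action (Ut).
new_obs = rng.normal(size=8)
predicted = new_obs @ w_hat
print(abs(predicted - new_obs @ true_w) < 1e-6)  # clone matches the expert here
```

Here the clone is exact because the data is noiseless and covers the state space; the compounding-error problem above only appears once the learner's own rollouts visit states absent from the demonstrations.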

Data Set Aggregation (DAgger)

DAgger addresses the distribution-shift problem above: run the current learner, have the expert label the states the learner actually visits, aggregate those labels into the training set, and retrain (Ross et al. 2011): https://www.cs.cmu.edu/~sross1/publications/Ross-AIStats11-NoRegret.pdf
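The DAgger loop from the Ross et al. paper can be sketched on a toy 1-D task. The environment, the proportional-controller expert, and the one-parameter linear policy below are all illustrative assumptions, not the paper's experiments; what is faithful is the loop structure: roll out the learner, query the expert on the visited states, aggregate, refit.

```python
import numpy as np

# Minimal DAgger sketch (Ross et al., AISTATS 2011) on a toy 1-D task.
# The expert drives the state toward 0; the learner is a 1-parameter
# linear policy u = w * s fit by least squares on the aggregated data.

def expert(s):
    return -0.5 * s  # expert action: proportional controller toward 0

def rollout(w, s0=5.0, steps=20):
    """Run the current learner policy and record the states it visits."""
    states, s = [], s0
    for _ in range(steps):
        states.append(s)
        s = s + w * s  # dynamics: next state = state + action
    return np.array(states)

# DAgger loop: label the states the *learner* visits with the expert's
# actions, aggregate into one growing dataset, refit, repeat.
data_s, data_u = [], []
w = 0.0  # initial (untrained) policy
for _ in range(5):
    visited = rollout(w)
    data_s.extend(visited)
    data_u.extend(expert(si) for si in visited)
    S, U = np.array(data_s), np.array(data_u)
    w = float(S @ U / (S @ S))  # least-squares fit of u = w * s

print(w)  # -> -0.5, the expert's gain
```

The key design point is that the expert labels states drawn from the learner's own trajectory distribution, not just the expert's, which is exactly what plain behavior cloning lacks.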

  1. Augmentation:

Augmentation helps us extract as much information from the data as possible: we manipulate the incoming training data to generate additional training instances. This technique has been used to build powerful classifiers from very little data (https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html). However, useful augmentations are specific to the objective of the network.
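One concrete example of objective-specific augmentation for steering prediction: mirroring a road image left-right only produces a valid training pair if the steering label is negated as well. The sketch below uses a random array as a stand-in for a camera frame; the function name is an assumption, but the flip-and-negate trick itself is standard for this task.

```python
import numpy as np

# Steering-aware augmentation: a horizontal flip of the camera frame must
# be paired with a sign flip of the steering angle, otherwise the new
# sample teaches the network the wrong turn direction.

def augment_flip(image, steering):
    """Horizontally mirror an HxWxC frame and negate the steering angle."""
    return image[:, ::-1, :].copy(), -steering

frame = np.arange(2 * 4 * 3, dtype=float).reshape(2, 4, 3)  # HxWxC stand-in
angle = 0.3

flipped, new_angle = augment_flip(frame, angle)

print(new_angle)         # -0.3
print(flipped[0, 0, 0])  # 9.0: the pixel that was at the right edge
```

A flip like this is harmless for a cat-vs-dog classifier but label-changing here, which is exactly why augmentation must be chosen per objective.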

References

https://papers.nips.cc/paper/2847-off-road-obstacle-avoidance-through-end-to-end-learning

http://repository.cmu.edu/cgi/viewcontent.cgi?article=2874&context=compsci

https://katefvision.github.io/katefSlides/immitation_learning_I_katef.pdf


Written by Partha Sen

Building prediction engine @ Algoix Technologies. Data scientist @ TokenData.ai
