Google Brain’s DRL Helps Robots ‘Think While Moving’

Synced · Published in SyncedReview · May 12, 2020

When chasing a bouncing ball, a human will head to where they anticipate the ball is going. If things change (say a cat swats the ball and it bounces off in a new direction), the human corrects course in real time.

Robots can have a hard time making such changes, as they tend to simply observe states, then calculate and execute actions, rather than thinking while moving.

Google Brain, UC Berkeley, and X Lab have proposed a concurrent Deep Reinforcement Learning (DRL) algorithm that enables robots to take a broader, longer-term view of tasks and behaviours, and to decide on their next action before the current one has finished executing. The paper was accepted to ICLR 2020.

DRL has achieved tremendous success in scenarios such as zero-sum games and robotic grasping. These achievements, however, were seen largely in "blocking" environments, where the model assumes the state does not change between the time it is observed and the time the computed action(s) finish executing.

In real-world "concurrent" environments, however, the state can evolve substantially while the agent is observing and computing, and actions executed in a sequential, blocking fashion can fail because the environment has changed since the agent computed them.

(a): In "blocking" environments, state capture and policy inference are assumed to be instantaneous. (b): In "concurrent" environments, state capture and policy inference proceed concurrently with action execution.
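To make the distinction concrete, the sketch below (Python, with a hypothetical env/policy interface rather than the paper's actual code) contrasts a blocking control loop with a concurrent one.

```python
# Sketch only: `env` and `policy` are hypothetical interfaces used for illustration.

def blocking_control_loop(env, policy):
    """Blocking execution: the world is assumed frozen while the agent
    observes the state and computes the next action."""
    state = env.observe()
    while not env.done():
        action = policy(state)   # environment assumed unchanged during inference
        env.execute(action)      # action runs to completion before the next observation
        state = env.observe()


def concurrent_control_loop(env, policy):
    """Concurrent execution: the previous action keeps running while the
    agent captures the state and runs policy inference."""
    prev_action = env.no_op()
    env.start(prev_action)       # something is always executing
    while not env.done():
        state = env.observe()                # captured while prev_action is still running
        action = policy(state, prev_action)  # inference itself takes real time
        env.switch_to(action)                # pre-empt the running action with the new one
        prev_action = action
```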

The main idea of the proposed model is to enable a robot to act with concurrent control, “where sampling an action from the policy must be done concurrently with the time evolution.”

The researchers first formulated standard RL in both discrete-time and continuous-time settings, then extended the formulation to Markov Decision Processes (MDPs) with concurrent actions, in which the environment's current state is captured while a previous action is still being executed. The team concluded that modifications to the standard MDP formulation are sufficient to represent concurrent actions.

(a): In “blocking” MDPs, the environment state does not change while the agent records the current state and selects an action. (b): In “concurrent” MDPs, state and action dynamics are continuous-time stochastic processes s(t) and a_i(t).
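One way to make this concrete (a sketch in simplified notation, not the paper's exact formulation): a state captured while the previous action a_{i-1} is still executing is not Markovian on its own, but augmenting it with the previous action and a timing feature such as the action-selection latency again supports a standard value-based update.

```latex
% Sketch in simplified notation (not the paper's exact formulation):
% augment the captured state with the previous action and a timing feature.
\tilde{s}_i = \bigl(s_i,\; a_{i-1},\; t_i^{\mathrm{AS}}\bigr),
\qquad
Q(\tilde{s}_i, a_i) \;\leftarrow\; r_i + \gamma \max_{a'} Q\bigl(\tilde{s}_{i+1}, a'\bigr)
```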

The research team introduced value-based DRL algorithms that can cope with concurrent environments and evaluated them on both a large-scale simulated robotic grasping task and a real-world robotic grasping task.
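As a rough illustration of how a value-based agent can consume this extra information (again a sketch under assumptions; the function and feature names below are hypothetical, not the paper's), the critic input is simply the captured observation concatenated with the previous action and timing features.

```python
import numpy as np

def q_network_input(observation, prev_action, t_action_selection, t_remaining):
    """Build a critic input for a concurrent-control agent (hypothetical sketch).

    observation:         current sensor features, captured mid-execution
    prev_action:         the action that was executing when the state was captured
    t_action_selection:  how long state capture plus inference took
    t_remaining:         how much of the previous action is still left to execute
    """
    timing = np.array([t_action_selection, t_remaining], dtype=np.float32)
    return np.concatenate([observation, prev_action, timing])

# Example: a 10-d observation and a 4-d action yield a 16-d critic input.
x = q_network_input(np.zeros(10, np.float32), np.zeros(4, np.float32), 0.05, 0.12)
assert x.shape == (16,)
```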

Overview of the robotic grasping task
Large-scale simulated robotic grasping results
Real-world robotic grasping results

In the large-scale simulated grasping task, the proposed concurrent model acted 31.3 percent faster than the blocking-execution baseline. In the real-world grasping task, the concurrent model learned smoother trajectories that were 49 percent faster.

The paper Thinking While Moving: Deep Reinforcement Learning with Concurrent Control is on arXiv.

Author: Hecate He | Editor: Michael Sarazen

Thinking of contributing to Synced Review? Synced’s new column Share My Research welcomes scholars to share their own research breakthroughs with global AI enthusiasts.

We know you don’t want to miss any story. Subscribe to our popular Synced Global AI Weekly to get weekly AI updates.

Need a comprehensive review of the past, present and future of modern AI research development? Trends of AI Technology Development Report is out!

2018 Fortune Global 500 Public Company AI Adaptivity Report is out!
Purchase a Kindle-formatted report on Amazon.
Apply for Insight Partner Program to get a complimentary full PDF report.
