Elastic Decision Transformer

Masashi Hamaya
OMRON SINIC X
Dec 4, 2023

We are thrilled to announce that our paper on offline reinforcement learning has been accepted to the Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023) as a poster presentation (acceptance rate 26.1%)!

Yueh-Hua Wu¹², Xiaolong Wang², Masashi Hamaya¹, “Elastic decision transformer” [paper] [project page] [code]. ¹OMRON SINIC X, ²UC San Diego

The first author, Yueh-Hua Wu, is a Ph.D. student at UC San Diego and was a research intern at OMRON SINIC X.

This blog briefly introduces our proposed method.

Background

Reinforcement Learning (RL) has demonstrated impressive results across diverse applications such as game playing, robotics, and recommendation systems. A notable area of RL is offline RL, which trains agents on pre-collected data and is particularly useful when real-time interaction with the environment is costly or limited.

One of the most popular approaches is the Decision Transformer (DT), which uses a Transformer architecture to model and reproduce sequences from demonstrations, turning offline RL into a supervised learning task through a return-conditioned policy. However, the DT falls short at trajectory stitching, a desirable property in offline RL that refers to creating an optimal trajectory by combining parts of sub-optimal trajectories.
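
To make this setup concrete, below is a minimal sketch of the return-to-go conditioning used by DT-style models: each trajectory is turned into interleaved (return-to-go, state, action) tokens, and the model learns to predict the action at every step. The array shapes and names are illustrative only, not taken from the paper.

```python
import numpy as np

def returns_to_go(rewards):
    """Return-to-go at step t is the sum of rewards from t to the end of the trajectory."""
    rtg = np.zeros(len(rewards), dtype=np.float32)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        rtg[t] = running
    return rtg

# Illustrative 4-step trajectory (shapes are arbitrary examples).
rewards = np.array([0.0, 1.0, 0.0, 2.0])
states = np.random.randn(4, 11)   # e.g. vectorized locomotion observations
actions = np.random.randn(4, 3)

rtg = returns_to_go(rewards)      # -> [3., 3., 2., 2.]
# A DT-style model is trained to predict actions[t] from the interleaved
# context (rtg[0], states[0], actions[0], ..., rtg[t], states[t]).
```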

We introduce the Elastic Decision Transformer (EDT), which takes a variable length of the traversed trajectory as input. Our intuition is that, to ‘refresh’ the prediction model, it should disregard ‘negative’ or ‘unsuccessful’ past experiences: rather than always conditioning on as much information as possible, it dismisses past failures by taking a shorter history as input. While a prediction model with a shorter history tends to produce outputs with higher variance, this facilitates exploring and identifying improved trajectories. Conversely, when the current trajectory is already optimal, the model should take the longest possible history as input to enhance stability and consistency.

Elastic Decision Transformer

First, we explain the motivation behind EDT. The figure below shows a toy example of trajectory stitching. Consider a dataset composed of only two trajectories (a and b). A sequence model trained on this dataset is likely to predict next states in a manner consistent with their original trajectories. A non-stitching model starting from state b at timestep t-1 may therefore end up in the sub-optimal state b at t+1. However, if we discard the past history, the model can stitch to state a at t+1 and generate a more optimal trajectory.

A toy example of trajectory stitching

In the EDT, we follow the same training procedure as the DT. The key distinction lies in the training objective: the EDT is additionally trained to estimate the maximum achievable return for a given history length.
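
This maximum is estimated with an expectile-style regression, which penalizes under-predictions far more than over-predictions so that the predictor approaches an upper bound on the return rather than its mean. Below is a minimal PyTorch sketch of such a loss; the tensor names and the value of `tau` are illustrative assumptions, not the paper's exact settings.

```python
import torch

def expectile_loss(pred_return, target_return, tau=0.99):
    """Expectile regression: with tau close to 1, under-predictions are penalized
    much more heavily than over-predictions, so the predictor approximates the
    maximum achievable return rather than the mean return."""
    diff = target_return - pred_return
    weight = torch.abs(tau - (diff < 0).float())  # tau if diff >= 0, else 1 - tau
    return (weight * diff.pow(2)).mean()

# Illustrative usage on a batch of predicted vs. observed returns.
pred = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([1.5, 1.0, 4.0])
loss = expectile_loss(pred, target)
```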

During the action-inference phase at test time, we estimate the maximum achievable return for each history length, and then predict the action using the correspondingly truncated traversed trajectory as input.
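
A minimal sketch of this inference loop is shown below. The interfaces `model.predict_max_return` and `model.predict_action`, as well as the candidate history lengths, are hypothetical placeholders for illustration, not the authors' actual API.

```python
def select_action(trajectory, model, candidate_lengths=(1, 2, 4, 8, 16)):
    """Search over how much of the traversed trajectory to keep: estimate the
    maximum achievable return for each truncation, then condition the action
    prediction on the best-scoring one."""
    best_len, best_return = candidate_lengths[0], float("-inf")
    for k in candidate_lengths:
        history = trajectory[-k:]                 # keep only the last k steps
        est = model.predict_max_return(history)   # estimated max achievable return
        if est > best_return:
            best_len, best_return = k, est
    return model.predict_action(trajectory[-best_len:], best_return)
```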

An overview of EDT architecture

Experiments

We evaluate the multi-task learning ability of our model across diverse tasks, focusing on locomotion tasks from D4RL and Atari tasks. The locomotion tasks use vectorized observations, while the Atari tasks use image observations. To emphasize the role of trajectory stitching, we restrict our datasets to the medium-replay datasets for the four locomotion tasks and datasets derived from DQN Replay for the 20 Atari tasks. The medium-replay datasets, collected from the replay buffer of a policy trained up to medium performance, pose a greater challenge for sequence-modeling approaches.
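
For reference, the D4RL medium-replay datasets can be loaded with the standard d4rl package as sketched below; the exact environment version string is an assumption and may differ from the one used in the paper.

```python
import gym
import d4rl  # registers the offline D4RL environments with gym

# One of the medium-replay locomotion datasets used in the experiments
# (the "-v2" suffix is an assumption, not necessarily the paper's version).
env = gym.make("hopper-medium-replay-v2")
data = env.get_dataset()  # dict with 'observations', 'actions', 'rewards', 'terminals', ...

print(data["observations"].shape, data["actions"].shape)
```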

We compare our approach with two baselines: the Decision Transformer and Implicit Q-Learning (IQL). Our approach achieves higher scores than the baselines in most locomotion tasks and all Atari tasks.

D4RL locomotion tasks and Atari tasks

YouTube video

More detailed information is available in the video.

Call for interns

We are actively hiring interns who are interested in machine learning and robotics. If you want to apply for our internship, please visit our website!
