Offline Reinforcement Learning: Enabling Algorithms to Better Understand the World

Gayan Samuditha
Published in Expo-MAS
Dec 11, 2021

UC Berkeley Research Suggests That Combining Self-Supervised and Offline RL Could Enable Algorithms That Understand the World Through Actions

+++++++++++++++++++++++++++++++++++++++

The idiom “actions speak louder than words” first appeared in print almost 300 years ago. A new study echoes this view, arguing that combining self-supervised and offline reinforcement learning (RL) could lead to a new class of algorithms that understand the world through actions and enable scalable representation learning.

Machine learning (ML) systems have achieved outstanding performance in domains ranging from computer vision to speech recognition and natural language processing, yet they still struggle to match the flexibility and generality of human reasoning. This has led ML researchers to search for the “missing ingredient” that might boost these systems’ ability to understand, reason, and generalize.

========================================

RESEARCH:

- Understanding the World Through Action -

In the paper Understanding the World Through Action, Sergey Levine, an assistant professor in UC Berkeley’s department of electrical engineering and computer sciences, suggests that a general, principled, and powerful framework for utilizing unlabelled data can be derived from RL, enabling ML systems that leverage large datasets to better understand the real world.

Several hypotheses have been advanced to address this “missing ingredient” question in ML systems, such as causal reasoning, inductive bias, and better algorithms for self-supervised or unsupervised learning. Levine says that while the problem is challenging and involves a great deal of guesswork, recent progress in AI can provide some guiding principles: 1) the “unreasonable” effectiveness of large, generic models supplied with large amounts of training data; and 2) the observation that manual labeling and supervision do not scale nearly as well as unsupervised or self-supervised learning.

Abstract:

The recent history of machine learning research has taught us that machine learning methods can be most effective when they are provided with very large, high-capacity models, and trained on very large and diverse datasets. This has spurred the community to search for ways to remove any bottlenecks to scale. Often the foremost among such bottlenecks is the need for human effort, including the effort of curating and labeling datasets. As a result, considerable attention in recent years has been devoted to utilizing unlabeled data, which can be collected in vast quantities. However, some of the most widely used methods for training on such unlabeled data themselves require human-designed objective functions that must correlate in some meaningful way to downstream tasks. I will argue that a general, principled, and powerful framework for utilizing unlabeled data can be derived from reinforcement learning, using general purpose unsupervised or self-supervised reinforcement learning objectives in concert with offline reinforcement learning methods that can leverage large datasets. I will discuss how such a procedure is more closely aligned with potential downstream tasks, and how […]

Download here: arXiv.

*******************************************************************

Levine believes the next bottleneck facing ML researchers involves deciding how to train large models without manual labeling or manual design of self-supervised objectives so as to acquire models that distill a deep and meaningful understanding of the world and are able to perform downstream tasks with robust generalization and even a degree of common sense.

To achieve this goal, autonomous agents will require an understanding of their environments that is causal and generalizable. Such agents would advance beyond the current RL paradigm, where 1) RL algorithms require a task goal (i.e., a reward function) to be specified by experts; and 2) RL algorithms are not inherently data-driven, but rather learn from online experience, an approach that limits both generalization ability and the ability to learn about how the real world works.
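To make that contrast concrete, here is a minimal sketch (not code from the paper) of the conventional online RL loop it describes, assuming the Gymnasium package and its CartPole-v1 environment: the reward must be written by hand for the specific task, and every learning signal comes from live interaction rather than from a reusable dataset.

```python
# Minimal sketch of the standard online RL setup (illustrative only).
import gymnasium as gym  # assumes Gymnasium is installed

def expert_reward(observation) -> float:
    # Limitation 1: a task-specific objective an expert must design by hand.
    # Here: reward keeping the CartPole pole near upright.
    pole_angle = observation[2]
    return 1.0 if abs(pole_angle) < 0.2 else 0.0

env = gym.make("CartPole-v1")
obs, _ = env.reset()

for step in range(1_000):
    action = env.action_space.sample()            # placeholder policy
    obs, _, terminated, truncated, _ = env.step(action)
    reward = expert_reward(obs)                   # expert-specified reward
    # A learning update would go here, using only this freshly collected
    # online experience (limitation 2: previously collected datasets are
    # never reused, which limits scale and generalization).
    if terminated or truncated:
        obs, _ = env.reset()
```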


Levine envisions algorithms that, rather than aiming at a single user-specified task, seek to accomplish whatever outcomes they infer are possible in the real world. He proposes developing offline RL algorithms that can effectively utilize previously collected datasets to enable a system that can use its training time to learn and perform user-specified tasks while also using its collected experience as offline training data to learn to achieve a wider scope of outcomes.

Levine believes offline RL has the potential to significantly increase the applicability of self-supervised RL methods, and that it can be used in combination with goal-conditioned policies to learn entirely from previously collected data.
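As an illustration of that idea, the sketch below trains a goal-conditioned policy purely from a static, previously collected dataset, using hindsight goal relabelling in place of an expert-written reward. This is a simplified, supervised (GCSL-style) stand-in for full offline RL machinery; names such as GoalConditionedPolicy and the dummy dataset layout are assumptions of this sketch, not code from the paper.

```python
# Illustrative sketch only: goal-conditioned learning from a fixed dataset,
# with goals relabelled in hindsight instead of a hand-designed reward.
import random
import numpy as np
import torch
import torch.nn as nn

class GoalConditionedPolicy(nn.Module):
    """pi(a | s, g): scores actions given a state concatenated with a goal."""
    def __init__(self, obs_dim: int, goal_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs, goal):
        return self.net(torch.cat([obs, goal], dim=-1))

# A static, previously collected dataset: trajectories of (observation, action)
# pairs. No environment interaction and no reward function appears below.
def make_dummy_dataset(n_traj=100, horizon=20, obs_dim=4, n_actions=2):
    return [
        [(np.random.randn(obs_dim).astype(np.float32), random.randrange(n_actions))
         for _ in range(horizon)]
        for _ in range(n_traj)
    ]

dataset = make_dummy_dataset()
policy = GoalConditionedPolicy(obs_dim=4, goal_dim=4, n_actions=2)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(500):
    traj = random.choice(dataset)
    t = random.randrange(len(traj) - 1)
    obs, action = traj[t]
    # Hindsight relabelling: a state actually reached later in the same
    # trajectory becomes the goal, so supervision comes from the data itself.
    goal, _ = traj[random.randrange(t + 1, len(traj))]
    obs_t = torch.from_numpy(obs).unsqueeze(0)    # shape (1, obs_dim)
    goal_t = torch.from_numpy(goal).unsqueeze(0)  # shape (1, goal_dim)
    logits = policy(obs_t, goal_t)
    loss = loss_fn(logits, torch.tensor([action]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The point of the sketch is only that no online interaction or expert reward is needed; the same fixed dataset could instead be consumed by a proper offline RL objective rather than plain imitation.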

[Image: Basics of Reinforcement Learning (RL methods), via LaptrinhX]

Overall, the paper explores how self-supervised RL combined with offline RL could realize scalable representation learning. Self-supervised training can enable models to understand how the world works, and fulfilling self-supervised RL objectives can allow models to gain a causal understanding of the environment. Such techniques must be applicable at scale to real-world datasets, a challenge met by offline RL, which enables the use of large, diverse, previously collected datasets.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Gayan Samuditha
Expo-MAS

Software Engineer, Biologist, Techie. Trying to save lives by combining medical informatics and AI.