An Ecological Perspective on Reinforcement Learning

Sergey Levine
12 min read · Dec 15, 2021


It has nothing to do with rainforests, at least not necessarily. J. J. Gibson’s seminal work “The Ecological Approach to Visual Perception” argues that perception — and, indeed, cognition — cannot be understood independently of the environment in which it takes place. Gibson argued that the relationship between perception and action is strongly situated, and that its effective functioning depends on the interaction between the embodied agent (that is, the animal) and its environment. These ideas were proposed in 1979 in part as a counterpoint to a cognitivist model of the mind that instead emphasized internal information processing, which could in some sense be viewed as general and context-free.

It is fashionable when designing AI algorithms to think of methods that are general and in some sense “universal”: a good reinforcement learning algorithm, for example, is one that can plausibly be applied to any environment (i.e., any Markov decision process), and avoids domain-specific assumptions. One argument for this view is that a truly general algorithm could be developed and evaluated on relatively simple and manageable benchmark problems, and then applied broadly at scale to large and realistic domains, where it can lead to remarkable superhuman proficiency, and perhaps even emergent behavior that surprises its own designers.

But could it be that, much like Gibson’s argument about perception in humans and animals, the capacity for reinforcement learning algorithms to lead to intelligent behavior cannot be understood independently of the environment in which they are situated? Indeed, one can argue that however general or capable our own learning faculties might be, in order for human beings to acquire a useful and deep understanding of the world they require a learning process that is appropriately scaffolded by (at least initially) a hospitable environment that provides a degree of security and support. Less abstractly, in the world of machine learning, the most impressive examples of generalization have been enabled just as much by the right datasets as by the right models and algorithms. Certainly the algorithm used to train, for example, GPT-3 is not by itself especially interesting. It is only when this model is “situated” in the real world, by providing it with a very large dataset that covers a breadth of human-written text, that it begins to exhibit interesting generalization. One might say that the environment in which it is situated is thus far more important than the model itself in enabling effective inference in new situations.

A particularly vivid illustration of this principle played out in the early days of deep learning: one of the remarkable things about neural networks as applied to computer vision was that they learned early visual features that bore a striking similarity to the Gabor filters that are experimentally observed in the mammalian visual cortex. However, researchers soon discovered that many different algorithms would learn such filters if trained on realistic image data, including relatively naive techniques such as k-means clustering — the features were more a consequence of the data than of the choice of model. They were simply embedded in the structure that underlies real images, and any model with enough capacity to get at them would learn to use them.
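To make the k-means observation concrete, here is a minimal sketch (my own illustration, not code from the original studies) that clusters normalized patches of a natural image with scikit-learn; many of the resulting cluster centers typically come out looking like oriented, edge-like patterns reminiscent of Gabor filters. The particular image, patch size, and number of clusters are arbitrary choices.

```python
# A minimal sketch of the "Gabor-like features from k-means" observation:
# cluster small, normalized patches of a natural image and inspect the
# cluster centers. Assumes scikit-learn and scikit-image are installed.
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.image import extract_patches_2d
from skimage import data, color

image = color.rgb2gray(data.astronaut())  # any natural image will do
patches = extract_patches_2d(image, (8, 8), max_patches=50000, random_state=0)
X = patches.reshape(len(patches), -1).astype(np.float64)
X -= X.mean(axis=1, keepdims=True)        # remove per-patch brightness
X /= X.std(axis=1, keepdims=True) + 1e-8  # crude contrast normalization

kmeans = MiniBatchKMeans(n_clusters=64, random_state=0).fit(X)
filters = kmeans.cluster_centers_.reshape(-1, 8, 8)
# Visualizing `filters` (e.g., with matplotlib) typically reveals oriented,
# edge-like patterns, even though no neural network was involved.
```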

I would posit that the importance of the environment is likely even greater for reinforcement learning: while standard supervised learning methods only need to be provided with a static dataset, reinforcement learning agents need to be provided with a true environment, which they can interact with, and which in turn responds to their actions. It would then stand to reason that we are unlikely to see reinforcement learning methods exhibit interesting generalization or emergent behavior unless we situate them in environments that support — and demand — this sort of generalization. This means that the choice of environment is a crucial piece of the puzzle, perhaps just as important as the design of the algorithm, and one that we should pay close attention to if we are to develop learning algorithms that lead to flexible and adaptive machines. Conversely, failing to situate learning agents in the right kind of environment might mean that even a very good algorithm doesn’t perform the way we might want, and we might miss the mark, thinking that more algorithm development is needed when in fact we just need to situate an existing method in the right way.

The case for real-world reinforcement learning

If it takes a village to raise a child, then training that child in a simulator requires first creating the village. Needless to say, this is a highly recursive endeavor. The real world provides us with an environment that we know demands flexible and generalizable intelligence, and it frees us from the need to expend gratuitous human effort to manually design the content that will push reinforcement learning agents to acquire generalizable behaviors. While it might seem like we can make considerable headway simply by situating reinforcement learning agents in simulated worlds and video games, if it is in fact the case that intelligent behavior and generalization emerges at the intersection of the algorithm and the environment (as indeed appears to be the case with supervised learning!), then at some point the effort necessary to design suitable environments might exceed the effort necessary to build the learning algorithm in the first place.

It has long been understood in robotics that even simple real-world tasks can be deceptively complex as compared to games or other “cognitively demanding” activities. This notion is perhaps most clearly captured by Moravec’s paradox:

We are all prodigious olympians in perceptual and motor areas, so good that we make the difficult look easy. Abstract thought, though, is a new trick, perhaps less than 100 thousand years old. We have not yet mastered it. It is not all that intrinsically difficult; it just seems so when we do it.

Steven Pinker summarized this notion more vividly: “recognizing a face, lifting a pencil, walking across a room, answering a question — in fact solve some of the hardest engineering problems ever conceived.” Perhaps it is precisely the difficulty of these seemingly mundane everyday tasks that creates the scaffolding necessary to facilitate broad generalization and flexibility. If simply picking up a spoon to feed yourself already requires solving “the hardest engineering problems ever conceived,” a learning agent that is confronted with such an environment would need to acquire a repertoire of problem-solving tools that can be applied very broadly, and then flying an airplane or designing a rocket might seem like child’s play by comparison. But at the same time, the real world can be surprisingly merciful in this regard: while the rocket designer might not have much room for failure, the child trying to pick up their spoon is likely to get considerable assistance in the matter. Solving a “hard exploration problem” through trial and error is rarely the bottleneck for survival for human children.

If we entertain the premise that the real world offers us a qualitatively different set of challenges and opportunities compared to simpler simulated tasks and games, we might still ask whether this actually matters to the algorithm designer. After all, might it not be the case that we can simply design effective algorithms by testing them in simulated worlds and then, once they are ready, “unleash” them on the real world and watch them figure out first how to pick up the spoon, and then how to design the rocket?

How is the real world different from a video game?

In a talk that I prepared for the 2021 NeurIPS Workshop on Deep Reinforcement Learning, I lay out an example of a hypothetical agent that, like Robinson Crusoe, is stranded on an island and left to fend for itself. Can it find shelter and food (or fresh batteries)? Can it survive long enough to be rescued? Adaptive and flexible intelligence is arguably at its best when confronted with unfamiliar and risky environments. It does not excel at integral and differential calculus, multiplying 20-digit numbers, or calculating odds in complex games of chance. It does not excel at playing Go or chess either — these games are entertaining to us precisely because they stress our faculties in ways that we are not accustomed to. Thus, if we want to focus on problems that are easy for humans but hard for machines, we should aim to develop learning methods that excel in flexibility and adaptability, rather than in their ability to master individual narrow tasks and attain superhuman performance. We could then ask: what kind of environment would prepare an intelligent agent to expect the unexpected?

While I don’t have a single answer to this, we can speculate about a few different facets that characterize real-world environments that might not be present in simulated tasks, video games, or reinforcement learning benchmarks:

  • The real world is not episodic. Most current benchmark tasks, and virtually all games, are structured into episodes, where an agent attempts the task multiple times and learns from multiple repeated trials with similar or identical initial conditions. This is intended to provide an analogy for “practicing” a task repeatedly. But for people, the mere notion that tasks should be practiced through repeated attempts must itself be learned, and the process of scaffolding a task so that it may be practiced takes effort. Some tasks, like running away from a hungry predator, are not amenable to such practice, and humans and animals instead utilize experience from lower-stakes tasks to generalize to such stressful situations. This is one interpretation of the evolutionary origins of play. While this distinction might at first seem like it only makes learning harder, it may well be that agents that must figure out practicing and playing will be better positioned to understand how to transfer experience across different situations. Certainly this disconnect around episodic learning presents a rich space of algorithmic challenges today, which have been explored in the domain of robotic learning (see, e.g., [1][2][3][4]). A minimal sketch contrasting the episodic and reset-free settings appears after this list.
  • The real world has a large “dynamic range” in terms of levels of abstraction. Games present a metaphor for the real world that abstracts away higher-level and lower-level concerns. For example, chess is a metaphor for a battle between two armies, but a chess player doesn’t have to worry about logistics, setting national tax policies to finance their army, or persuading their electorate that the battle is in their national interest (higher-level concerns), nor about how the bishop should move their limbs to walk from one tile to another (lower-level concerns). An embodied agent in the real world must deal with many levels of abstraction at once, and indeed humans often use “lower-level” mechanistic metaphors as a tool to reason about “higher-level” concepts (imagine a CEO talking about how their company has a great deal of “momentum” and is sure to “knock down the competition”). This high dynamic range presents a challenge and perhaps requires hierarchical reasoning, but it also provides opportunities, as concepts learned at the “lower level” and on shorter time scales can provide useful tools for reasoning about higher level decisions, where trial-and-error learning may be impractical.
  • Success at real-world problems is usually about being “good enough” rather than “optimal.” While some of the most impressive human accomplishments involve feats of intelligence, strength, or agility that push the boundaries of what a person can do, these feats are impressive precisely because they are unusual. The more important and evolutionarily significant problems simply require good enough solutions that allow the agent to move on to another more pressing priority. Often solving a problem quickly and efficiently is far more important than solving it to the highest level of performance. This again is both a challenge and an opportunity — if the agent is situated in an environment where there is so much to do that its main priority is to do each thing quickly and move on, it will have a greater breadth of experience to draw on for generalization than if its entire existence were centered on solving one narrow task extremely well. On the other hand, attaining ever-higher scores on tests of “superhuman ability”, as well as currently used reinforcement learning benchmark problems, seems unlikely to advance the capacity of RL methods to solve broad and diverse problems with flexibility and “good enough” performance.
  • The real world is inhabited by other agents. These agents can provide assistance, but they also create a never-ending source of novelty and complexity. It has often been argued that the evolutionary pressure caused by social interaction is one of the driving forces behind the growth in human intelligence, and the implications of this for AI are profound (see, for example, this recent discussion by Pierre-Yves Oudeyer). I am personally a bit more skeptical about placing this factor on a pedestal above all others, as flexible, intelligent, and adaptable decision making is also prevalent among animals that are not particularly social, but it is certainly an important element that distinguishes real-world environments from most simulations.
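To make the episodic vs. reset-free distinction from the first bullet concrete, here is a minimal sketch of the two training loops. It assumes a generic Gym-style environment interface and a placeholder agent object with act and update methods; none of these names refer to a specific library.

```python
# A hypothetical sketch contrasting episodic and reset-free training.
# `env` is assumed to follow a Gym-style interface (reset/step), and
# `agent` is assumed to expose act() and update(); both are placeholders.

def episodic_training(env, agent, num_episodes=100, max_steps=200):
    """Conventional benchmark setup: every episode restarts the environment
    from similar (or identical) initial conditions."""
    for _ in range(num_episodes):
        obs = env.reset()
        for _ in range(max_steps):
            action = agent.act(obs)
            next_obs, reward, done, info = env.step(action)
            agent.update(obs, action, reward, next_obs, done)
            obs = next_obs
            if done:
                break

def reset_free_training(env, agent, total_steps=100_000):
    """Reset-free ("ecological") setting: one long lifetime with no externally
    provided resets, so the agent must recover from whatever states it reaches."""
    obs = env.reset()  # a single reset at the start of the agent's lifetime
    for _ in range(total_steps):
        action = agent.act(obs)
        next_obs, reward, done, info = env.step(action)
        agent.update(obs, action, reward, next_obs, done)
        obs = next_obs  # note: no reset, even if `done` is signaled
```

The only structural difference is the missing outer reset loop, but as discussed above, that difference changes what the agent must learn: it has to recover from its own mistakes rather than relying on the environment to put it back into a familiar state.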

From the above discussion, it should be apparent that the differences between the real world and the kinds of environments that we often use to study reinforcement learning present major challenges, but also major opportunities. It may well be that, while situating our agents in more realistic settings makes learning harder in some ways, it makes generalization, extrapolation, and flexibility substantially easier in others. Additionally, these differences suggest a significant shift in the focus for algorithm design: merely improving performance on standard reinforcement learning benchmarks seems unlikely to enable our algorithms to handle the factors described above. Making that shift is important: whatever the effect of these differences is on the learning problem, it seems unlikely that we can divorce the problem of enabling flexible and adaptable agents from the problem of deciding what kind of environment to situate them in.

A study of the conditions for successful RL

In 2020, my students and I wrote a concept paper called “Ecological Reinforcement Learning” that aimed to study the effect of some environment properties on the performance of RL agents. Although this study was by no means exhaustive (or even particularly complete), our goal was to understand whether several properties that seemed more “realistic” actually made reinforcement learning harder or easier. We selected three basic ingredients that we thought differentiate realistic settings from games and simulated worlds: (1) the absence of “resets” (i.e., whether the agent has to learn over the course of one long lifetime, or over the course of many disconnected episodes); (2) “dynamism” — whether the environment evolves on its own even if the agent does not take particularly meaningful or coordinated actions; (3) “environment shaping” — whether the environment is arranged in such a way that it naturally creates a curriculum for the agent, with easier problems being presented first.

Of course, instantiating (2) and (3) in a naturalistic way is very hard. In the real world, “dynamism” exists because other people, animals, and physical phenomena outside of our control create varied and unpredictable situations even if the agent sits around doing nothing (this is a topic we explored in much more detail in a subsequent paper on surprise minimization). But in the simple experiments we constructed in our simulated study, “dynamism” involved randomized dynamics — a highly imperfect proxy. Similarly, in the real world “environment shaping” might occur because cooperative humans — the parents of a child, or a trainer training a dog — would intentionally avoid putting the agent into situations that it couldn’t possibly handle. We of course could not simulate this in our simple experiments, and instead heuristically modified the environment over the course of training. As an example, in a task that required the agent to “hunt” deer, dynamism involved the degree to which the deer moved about randomly, and environment shaping determined whether they started out closer or farther away.

Our findings in this paper reflect much of the discussion in the previous section: aspects of the environment that reflect properties of the real world to some degree offer both challenges and opportunities. While reset-free learning was significantly harder than conventional episodic learning in standard settings, both environment shaping and (perhaps surprisingly) dynamism made the learning problem significantly easier even without resets. While of course a more realistic and deeper empirical investigation is necessary to fully understand these properties, this work might serve as at least a preliminary indication that, when we study environments with properties that more intentionally reflect the challenges in the real world, we will encounter both difficulties and opportunities.
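To make these three properties concrete, here is a hypothetical toy sketch (not the actual environments from the paper; all names and numbers are illustrative) of how they could be exposed as toggles on a simple hunting-style environment: a reset_free flag, a dynamism probability that makes the prey wander on its own, and a shaping flag that spawns prey closer to the agent early in training.

```python
import random

class ToyHuntingEnv:
    """A hypothetical 1-D "hunting" environment illustrating the three
    properties discussed above: (1) reset-free learning, (2) dynamism,
    and (3) environment shaping. Not the paper's actual code."""

    def __init__(self, reset_free=True, dynamism=0.0, shaping=False):
        self.reset_free = reset_free  # (1) one long lifetime vs. episodes
        self.dynamism = dynamism      # (2) probability the prey moves on its own
        self.shaping = shaping        # (3) curriculum: prey starts nearby early on
        self.total_steps = 0
        self.reset()

    def reset(self):
        self.agent_pos = 0
        self._spawn_prey()
        return (self.agent_pos, self.prey_pos)

    def _spawn_prey(self):
        # With shaping, prey starts close early in training and may spawn
        # farther away as the agent accumulates experience.
        max_dist = min(1 + self.total_steps // 1000, 10) if self.shaping else 10
        self.prey_pos = self.agent_pos + random.randint(1, max_dist)

    def step(self, action):  # action in {-1, 0, +1}
        self.total_steps += 1
        self.agent_pos += action
        if random.random() < self.dynamism:
            self.prey_pos += random.choice([-1, 1])  # the world evolves on its own
        caught = self.agent_pos == self.prey_pos
        if caught:
            self._spawn_prey()  # new prey appears; the agent itself is never reset
        # In the reset-free setting, the environment never signals `done`.
        done = caught and not self.reset_free
        return (self.agent_pos, self.prey_pos), float(caught), done, {}
```

The training loops sketched earlier could then be run on variants of this environment with different flag settings to compare how each factor affects learning.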

More recently, we organized a workshop at NeurIPS 2021 around these questions about the ecological perspective on reinforcement learning. This workshop drew considerable interest, with an exciting set of talks about the interplay between environments and learning algorithms.

Conclusions and open questions

If the study of reinforcement learning is to lead to flexible, adaptable, and generalizable intelligent agents, we will need to consider not only the design of the particular reinforcement learning agent, but the environment in which this agent is situated. We know that flexible and adaptable intelligence can emerge in agents that are situated in the real world, and we know that a number of significant differences exist between the real world and most commonly used simulations. We don’t know which of these differences are actually important, but it does seem like many of them provide both challenges and opportunities: they can make learning harder along some axes, but provide better generalization and adaptability along other axes. Our algorithms might need to change to handle them, so we should think carefully about situating our RL agents in domains that reflect more of the properties of the real world. As a roboticist, my own bias is of course to situate agents in the real world directly, so that all of the complexity and messiness of reality is in play. But this is not the only approach, and I would encourage any researchers interested in this direction to carefully consider that, as with Gibson’s thesis about visual perception, perhaps a central question we should ponder is not just what algorithm is inside your agent, but what environment your agent is in.

Sergey Levine

Sergey Levine is a professor at UC Berkeley. His research is concerned with machine learning, decision making, and control, with applications to robotics.