SAR: Leveraging Motor Neuroscience for SOTA High-Dimensional Control
In this post, we introduce the core ideas presented in our recent paper, SAR: Generalization of Agility and Dexterity via Synergistic Action Representation. This project was led by Cameron Berg during his yearlong AI Residency at Meta under the continuous mentorship and guidance of Vittorio Caggiano and Vikash Kumar.
This post is intended to serve as a high-level overview of SAR. Those interested should navigate to the project site for technical details, code implementation, and more.
Background: what makes motor control hard?
Fundamentally, SAR can be thought of as a neuroscience-inspired method for addressing the inherent challenges of high-dimensional continuous control.
Training any system to produce a specific target behavior within the space of all possible behaviors becomes exponentially harder with each additional degree of freedom, a problem known as the curse of dimensionality (or combinatorial explosion). Learning an effective control policy for a system with many degrees of freedom therefore requires a way of contending with the vast complexity of the associated search space.
To date, the dominant approaches to high-dimensional search in reinforcement learning have been learning from expert demonstrations and ‘brute force’ learning from a huge number of samples. Both have important drawbacks: learning from experts requires significant domain-specific human knowledge, and ‘brute force’ learning is computationally expensive. SAR is an alternative that requires neither domain-specific expert knowledge nor a huge number of training samples.
Though high-dimensional control is conventionally studied through the lens of machine learning, we propose that insights from motor neuroscience are highly relevant to this domain. Why? Animal nervous systems have evolved over hundreds of millions of years to control and regulate motor activity in the form of high-dimensional, continuous muscle activations driven by spiking motor neurons. Despite the challenges of high-dimensional control discussed above, animal nervous systems are clearly adept at this problem, as evidenced by the rich diversity of impressive behaviors displayed throughout the animal kingdom.
Even more remarkable is that musculoskeletal control introduces additional challenges beyond the curse of dimensionality described above. These include (1) indirect joint control via pull-only muscle forces, (2) many-to-one and one-to-many relationships between muscles and joints (multiarticularity), and (3) more muscles than degrees of freedom (overactuation). SAR sets out by asking a deceptively simple motor neuroscience question: how did the brain evolve to overcome these challenges and yield robust motor control?
Muscle synergies simplify the control problem
SAR is by no means a comprehensive answer to this question, but one essential and increasingly well-understood neurophysiological mechanism underlying robust musculoskeletal control is muscle synergies: coordinated patterns of muscle co-contraction. The core hypothesis is that instead of continuously computing activations for the many hundreds of voluntary muscles in the body, the nervous system computes a much smaller set of activation patterns. Muscle synergies have been observed extensively, and with high cross-species continuity, in humans and animals, and they have been shown to be implemented directly in efferent pathways of the spinal cord. The fundamental insight is that by expressing muscle activity as a function of a small set of basic synergistic patterns, rather than as a far more complex set of individual muscle activations, the nervous system learns robust motor control without strong external supervisory signals. We use this key idea to power the SAR method.
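The synergy hypothesis can be stated schematically in linear-algebra terms (our notation, not the paper's): rather than choosing each of the n muscle activations independently at every instant, the nervous system chooses k weights over fixed synergy patterns, with k much smaller than n:

```latex
% Schematic statement of the synergy hypothesis (notation ours):
% a(t) in R^n is the vector of muscle activations, s_i are k fixed
% synergy patterns, and w_i(t) are time-varying weights, with k << n.
\mathbf{a}(t) \;\approx\; \sum_{i=1}^{k} w_i(t)\,\mathbf{s}_i, \qquad k \ll n
```

Control then happens in the k-dimensional weight space rather than the n-dimensional muscle space, which is what makes the search problem tractable.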
More specifically, SAR operates in a task-agnostic manner using the same basic recipe:
- Select some target behavior (e.g., locomotion)
- Train a policy to learn a simplified version of the target behavior (e.g., a single step)
- Build a synergistic action representation (SAR) using muscle activation data from this trained policy
- Use this constructed SAR to train a policy on the target task using learned synergies
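Step 3 of the recipe above can be sketched with a simple dimensionality-reduction pass over recorded activations. The following is a minimal illustration using PCA via numpy's SVD; the function names and the choice of plain PCA are our own simplifying assumptions, not the paper's exact pipeline (which may combine several decomposition and normalization steps):

```python
import numpy as np

def build_sar(activations: np.ndarray, n_synergies: int):
    """Sketch of synergy extraction via PCA (an illustrative choice,
    not necessarily the paper's exact method).

    activations: (n_timesteps, n_muscles) array of muscle activations
    recorded while rolling out the simplified-task policy.
    Returns (components, mean): a (n_synergies, n_muscles) basis of
    synergy patterns and the per-muscle mean activation.
    """
    mean = activations.mean(axis=0)
    centered = activations - mean
    # Rows of vt are the principal directions of the activation data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_synergies]
    return components, mean

def synergies_to_muscles(weights: np.ndarray, components, mean):
    """Map a low-dimensional synergy action back to full muscle
    space, clipped to the valid activation range [0, 1]."""
    return np.clip(weights @ components + mean, 0.0, 1.0)
```

A policy trained on top of this representation then acts in the low-dimensional weight space, and `synergies_to_muscles` recovers the full muscle command at each step.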
MyoSuite Experiments and Results
To evaluate this method, we trained MyoSuite’s physiologically accurate musculoskeletal leg and hand models to acquire agility and dexterity. These models are a state-of-the-art testbed both for (a) extracting and computing muscle synergies and (b) efficient, contact-rich reinforcement learning.
We find that the SAR method uniquely enables sample-efficient locomotion and manipulation, while baseline approaches generally fail to yield meaningful behavior.
For locomotion, we first train an agent to take a single step, then use the resulting muscle activation time series to build a synergistic action representation. Training with this representation yields robust locomotion across many different terrain conditions in only 4M total samples (including those used for the base policy from which SAR is extracted). By contrast, training without SAR (i.e., standard RL) on the locomotion task for 4M steps yields no meaningful behavior.
For manipulation, we first train an agent to manipulate a small set of parametric geometries, then extract a synergistic action representation and train with these synergies on a much larger set of geometries. As with locomotion, a policy trained without SAR directly on the larger set largely fails to acquire meaningful dexterity.
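In both settings, the second training phase amounts to swapping the environment's action space: the policy emits a low-dimensional synergy action, and a fixed linear map converts it to full muscle activations. A minimal sketch of such a wrapper is below; the class name and attributes are our own illustration (any gym-style environment whose actions are muscle activations in [0, 1] is assumed), not the released codebase's API:

```python
import numpy as np

class SARActionWrapper:
    """Illustrative wrapper exposing a low-dimensional synergy action
    space around a gym-style muscle-control environment (hypothetical
    interface, not the paper's exact implementation).

    components: (n_synergies, n_muscles) synergy basis
    mean:       (n_muscles,) per-muscle mean activation
    """

    def __init__(self, env, components, mean):
        self.env = env
        self.components = np.asarray(components)
        self.mean = np.asarray(mean)

    def step(self, synergy_action):
        # Map the low-dimensional action back to full muscle
        # activations, clipped to the valid range [0, 1].
        muscles = np.clip(
            synergy_action @ self.components + self.mean, 0.0, 1.0
        )
        return self.env.step(muscles)

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)
```

Because the mapping is fixed, the RL algorithm itself is unchanged; only the dimensionality of the action it must search over shrinks.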
Although we use MyoSuite and musculoskeletal control as our testbed for assessing SAR, the same method also extrapolates successfully to high-dimensional continuous control problems that do not involve muscular control.
Extending SAR beyond musculoskeletal control
The first extension replicates the MyoLegs locomotion result in a full-body humanoid agent. The second replicates the MyoHand parametric-geometry manipulation result on the robotic ShadowHand. These results demonstrate that while SAR has its theoretical foundation in neuromotor control, it is by no means limited to that setting.
Conclusion
We have introduced SAR, a method for handling the high-dimensional continuous control problem. Directly inspired by evolved strategies for human and animal motor control, SAR autonomously discovers and exploits a useful submanifold of the high-dimensional action space, enabling significantly more efficient control. For more information on SAR, please visit the project site.