SAR: Leveraging Motor Neuroscience for SOTA High-Dimensional Control

MyoSuite Team
Aug 21, 2023


In this post, we introduce the core ideas presented in our recent paper, SAR: Generalization of Agility and Dexterity via Synergistic Action Representation. This project was led by Cameron Berg during his yearlong AI Residency at Meta under the continuous mentorship and guidance of Vittorio Caggiano and Vikash Kumar.

This post is intended to serve as a high-level overview of SAR. Those interested should navigate to the project site for technical details, code implementation, and more.

Background: what makes motor control hard?

Fundamentally, SAR can be thought of as a neuroscience-inspired method for addressing the inherent challenges of high-dimensional continuous control.

At the outset, it should be recognized that training any system to yield some specific target behavior within the space of all possible behaviors becomes exponentially more complex with each additional degree of freedom. This problem is sometimes referred to as the curse of dimensionality or combinatorial explosion. Therefore, learning an effective control policy for a system with a very large number of degrees of freedom requires a method for contending with the vast complexity of the associated search space.

Curse of dimensionality in motor control. Robust locomotion represents an extremely specific submanifold in the space of all possible ways to activate 80 muscles across the two legs. It is very challenging for traditional RL algorithms to efficiently search this immense space (see above; SAC for 3M steps).

To date, the dominant approaches for handling high-dimensional search in reinforcement learning include learning from expert demonstrations, as well as ‘brute force’ learning using a huge number of samples. These strategies have important drawbacks: learning from experts requires significant human domain-specific knowledge, and ‘brute force’ learning is of course computationally expensive. SAR represents an alternate method for dealing with the challenges of high-dimensional control that requires neither domain-specific expert knowledge nor a huge number of training samples.

Though the problem of high-dimensional control is more conventionally studied through the primary lens of machine learning, we propose that insights from motor neuroscience also prove highly relevant for this domain. Why? Consider that animal nervous systems have evolved over billions of years to control and regulate motor activity in the form of (high-dimensional continuous) spiking muscle activations. In spite of the challenges of high-dimensional control previously discussed, it is self-evident that animal nervous systems are adept at handling this problem, yielding the rich diversity of impressive behaviors displayed throughout the animal kingdom.

Even more remarkable is that musculoskeletal control introduces additional challenges over and above the curse of dimensionality problem described previously. These include (1) indirect joint control via pull-only muscle forces, (2) many-to-one and one-to-many relationships between muscles and joints (multiarticularity), and (3) there being more muscles than degrees of freedom (overactuation). SAR sets out by asking the deceptively simple motor neuroscience question — how did the brain evolve to overcome these challenges to yield robust motor control?

Multiarticularity (one-to-many control dynamic) in physiological control. Left: the flexor digitorum profundus muscle in the arm contributes to controlling multiple fingers in the hand. Right: the hamstring and rectus femoris muscles in the leg contribute to controlling both the hip and knee joints. [Image credits: https://en.wikipedia.org/wiki/Flexor_digitorum_profundus_muscle#/media/File:Flexor-digitorum-profundis.png and https://www.frontiersin.org/files/Articles/450201/fnbot-13-00017-HTML/image_m/fnbot-13-00017-g002.jpg]

Muscle synergies simplify the control problem

Though SAR is by no means intended to represent a comprehensive answer to this question, it is clear that one essential and increasingly well-understood neurophysiological mechanism that facilitates robust musculoskeletal control is muscle synergies, or coordinated patterns of muscle co-contractions. Functionally speaking, the core hypothesis is as follows: instead of continuously computing activations for the many hundreds of voluntary muscles in the body, the nervous system directly computes a much smaller set of patterns of muscle activations. The presence of these muscle synergies has been observed extensively — and with high cross-species continuity — in humans and animals, and they have been demonstrated to be implemented directly in efferent pathways of the spinal cord. The fundamental insight here is that by construing muscle activity as a function of a reduced set of basic synergistic patterns as opposed to a far more complex set of individual muscle activations, the nervous system is able to learn robust motor control without any strong external supervisory signals. We use this key idea to power the SAR method.
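The dimensionality reduction this buys is easy to see in a toy example. The following sketch (with made-up numbers: 80 muscles, 8 synergies, and a random synergy basis, none of which come from the paper) shows how a full muscle-activation vector can be produced as a linear combination of a handful of fixed co-contraction patterns:

```python
import numpy as np

rng = np.random.default_rng(0)

n_muscles = 80   # e.g., the two MyoSuite legs
n_synergies = 8  # illustrative; far fewer than n_muscles

# Hypothetical synergy basis: each row is one fixed co-contraction
# pattern spanning all 80 muscles (non-negative, like excitations).
synergies = rng.random((n_synergies, n_muscles))

# Instead of choosing 80 activations directly, the controller picks
# 8 synergy weights; the full activation is their linear combination.
weights = rng.random(n_synergies)
activation = np.clip(weights @ synergies, 0.0, 1.0)  # shape: (80,)
```

The controller's effective search space shrinks from 80 continuous dimensions to 8, at the cost of only being able to express activations that lie on the synergy-spanned submanifold.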

More specifically, SAR operates in a task-agnostic manner using the same basic recipe:

  1. Select some target behavior (e.g., locomotion)
  2. Train a policy to learn a simplified version of the target behavior (e.g., a single step)
  3. Build a synergistic action representation (SAR) using muscle activation data from this trained policy
  4. Use this constructed SAR to train a policy on the target task using learned synergies
SAR pipeline
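Step 3 of the recipe above can be sketched with a classic synergy-extraction technique, non-negative matrix factorization (NMF), which factors the rollout data as activations ≈ W @ H so that the rows of H are synergy patterns. This is only an illustrative stand-in — the paper's exact construction is described on the project site — and the rollout data here is randomly generated rather than taken from a trained policy:

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical rollout data from the pretrained "single step" policy:
# each row is one timestep's muscle activation vector (values in [0, 1]).
rng = np.random.default_rng(0)
activations = rng.random((5000, 80))

# Factor the data: activations ≈ W @ H, where each of H's 8 rows is
# one synergy pattern over the 80 muscles.
model = NMF(n_components=8, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(activations)  # per-timestep synergy weights
H = model.components_                 # synergy basis, shape (8, 80)
```

The resulting basis `H` is what step 4 would then use as the action representation for training on the full target task.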

MyoSuite Experiments and Results

To determine the utility of this method, we trained MyoSuite's physiologically accurate musculoskeletal leg and hand models to acquire agility and dexterity. These models are a state-of-the-art testbed both for (a) extracting and computing muscle synergies and (b) efficient, contact-rich reinforcement learning.

We find that the SAR method uniquely enables sample-efficient locomotion and manipulation, while baseline approaches generally fail to yield meaningful behavior.

For acquiring locomotion, we begin by training an agent to take a single step, and we use the resultant muscle activation time series data to power a synergistic action representation that subsequently enables robust locomotion across many different terrain conditions in only 3M total samples (including the base policy from which SAR is extracted and computed). By contrast, simply training without SAR (i.e., standard RL) on the locomotion task for 3M steps yields no meaningful behavior.
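Mechanically, training "with SAR" amounts to letting the policy act in the low-dimensional synergy space while the environment still receives full muscle activations. A minimal, framework-agnostic sketch of such a wrapper is below; the environment interface and the synergy matrix are both assumed inputs, not MyoSuite APIs:

```python
import numpy as np

class SynergyActionWrapper:
    """Wrap an environment so the policy acts in synergy space.

    `env` is assumed to be any object whose step() expects a full
    muscle-activation vector; `synergy_matrix` (n_synergies, n_muscles)
    would come from the SAR extraction step.
    """

    def __init__(self, env, synergy_matrix):
        self.env = env
        self.synergy_matrix = np.asarray(synergy_matrix, dtype=np.float64)

    def step(self, synergy_action):
        # Recombine the fixed synergy patterns into full activations,
        # clipping to the valid excitation range before stepping.
        full = np.asarray(synergy_action) @ self.synergy_matrix
        return self.env.step(np.clip(full, 0.0, 1.0))
```

Any standard RL algorithm (e.g., SAC) can then be run unchanged on the wrapped environment, with an action space of size `n_synergies` rather than `n_muscles`.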

Comparing end-to-end RL (RL-E2E) and SAR-RL for learning locomotion. RL-E2E: train on locomotion task for 3M samples. SAR-RL: train on locomotion task for 1.5M steps → extract SAR → retrain on locomotion task for 1.5M steps with SAR. SAR-RL learns to walk approximately 15x farther than RL-E2E in the same number of training steps.
Synergies extracted from an uncoordinated walking policy on flat ground can be directly leveraged to yield locomotion across diverse terrains with SOTA sample efficiency (3M total samples).

For acquiring manipulation, we begin by training an agent to manipulate a small set of parametric geometries before extracting a synergistic action representation and training with these synergies on a much larger set of parametric geometries. As in the case of locomotion, training a policy without SAR directly on the larger set largely fails to yield meaningful dexterity.

Comparing performance on 100-object reorientation task. RL-E2E trains on the 100-object task for 3M samples. RL+Curr (curriculum learning) and SAR-RL begin by training on an 8-object reorientation task for 1M samples. RL+Curr finetunes this base policy on the Reorient100 environment, whereas SAR-RL uses synergies extracted from this base policy to train on the Reorient100 environment. SAR-RL performs approximately 7x better than either baseline given the same total training budget (3M steps).

Although we use MyoSuite and musculoskeletal control as a testbed for assessing SAR, we were also able to successfully apply the exact same method to other high-dimensional continuous control problems that do not involve muscular control.

Extending SAR beyond musculoskeletal control

The first extension replicates the MyoLegs locomotion result in the full-body humanoid agent. The second extension replicates the MyoHand parametric geometry manipulation result in the robotic ShadowHand. These results demonstrate that while SAR may have a theoretical foundation in neuromotor control, it is by no means limited to this application.

Conclusion

We have introduced SAR as a method for handling the high-dimensional continuous control problem. Directly inspired by evolved strategies for human and animal motor control, the SAR method autonomously discovers and leverages a useful submanifold in the high-dimensional action space that efficiently facilitates significantly improved control. For more information on SAR, please visit the project site.


MyoSuite Team

MyoSuite is a physiologically accurate musculoskeletal simulation and task suite developed by researchers at Meta AI. https://github.com/facebookresearch/myosuite