We trained a controller for tendon-driven robots using Reinforcement Learning (RL), specifically Proximal Policy Optimization (PPO). Our code is available on GitHub. The foundation is set for spectacular follow-up RL research within the Roboy Robotics Lab!
The Roboy Lab
During the Winter Semester 2018/19, we had the opportunity to contribute to the Roboy Robotics Lab. Roboy 2.0 is a robot that provides an excellent testbed for challenging robotic control scenarios, because its locomotion is driven mainly by tendon actuators. Coordinating the tendon actuators into smooth whole-body movements remains an unsolved problem.
We started small and decided to use a single joint of the robot as our experimental testbed. The Roboy team had this joint available in simulation. We would take advantage of the simulator to train a controller and then transfer it to the hardware.
Choice of RL Algorithm
We focused our research on algorithms that have been deployed on hardware and that can take advantage of a simulation to gather experience. We decided to perform our experiments with Proximal Policy Optimization (PPO), an algorithm published by OpenAI. Their publication also includes two reference implementations on GitHub. You can read more about the algorithm in their blog post.
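PPO's central idea is a clipped surrogate objective that discourages the updated policy from straying too far from the policy that collected the data. As an illustration of that objective only (not OpenAI's reference code), a NumPy sketch:

```python
import numpy as np

def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped surrogate objective from the PPO paper (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    new and the old policy; advantages: estimated advantages of those actions.
    """
    ratio = np.exp(new_logp - old_logp)          # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Elementwise minimum: a pessimistic bound on the policy improvement.
    return np.mean(np.minimum(unclipped, clipped))
```

With the clip range at 0.2, a probability ratio of 1.5 on a positive-advantage action contributes only as if it were 1.2, which caps the incentive for large policy jumps in a single update.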
These implementations were forked by a group of scientists outside OpenAI and improved for reusability. The resulting repository, called Stable Baselines, contains the PPO implementation we used to train our RL agent.
Communication Channel: ROS
The Roboy 2.0 project relies in large part on the Robot Operating System (ROS) libraries for its communication. ROS1, however, mostly targets Python 2, while our RL library, Stable Baselines, only supports Python 3. We bridged the two sides using the ros1_bridge package from ROS2, which you can check out here.
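In practice, running the bridge means sourcing both ROS distributions and starting the dynamic bridge, which relays matching topics between the ROS1 and ROS2 sides. A rough setup sketch (the distribution names are examples; adapt them to your install):

```shell
# Terminal 1: start the ROS1 master
source /opt/ros/melodic/setup.bash
roscore

# Terminal 2: run the ROS1 <-> ROS2 bridge (both environments sourced)
source /opt/ros/melodic/setup.bash
source /opt/ros/crystal/setup.bash
ros2 run ros1_bridge dynamic_bridge --bridge-all-topics
```

With the bridge running, the Python 3 RL code publishes and subscribes via ROS2 while the simulator keeps its existing ROS1 interface.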
Training on the Cloud
We decided to run our RL training workloads on Google Cloud, mainly for flexibility. We ran the training experiments on VMs with Docker installed. Since our simulation has not been fully parallelized yet, the workloads can be handled by CPU-only VMs, which greatly reduces the hourly cost of each VM.
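Our training images were built along these lines. This is a hedged sketch rather than our exact Dockerfile; the base image, package versions, and `train.py` entry point are assumptions:

```dockerfile
# CPU-only image: no CUDA base needed, since the simulation runs on CPU.
FROM python:3.6-slim

# Stable Baselines (v2) builds on TensorFlow 1.x.
RUN pip install --no-cache-dir "stable-baselines<3" gym "tensorflow<2"

COPY . /app
WORKDIR /app

# Launch the PPO training script on container start (hypothetical name).
CMD ["python", "train.py"]
```

The same image runs identically on a laptop and on a cloud VM, which is what made moving the experiments to Google Cloud straightforward.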
We trained the controller to reach a target position with the joint. In the video below, the simulation is shown on the left, where the target position appears as a green shadow. On the right, the actual hardware is actuated by our controller.
This was the result of 6 hours of training. The controller manages to reach most of the goals, although its movements are sometimes jerky.
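The reach task can be captured by a simple distance-based reward. The following is a hypothetical sketch, not our actual reward shaping; the tolerance and success bonus are made-up values:

```python
def reach_reward(joint_angle, target_angle, tolerance=0.05):
    """Hypothetical reward for the single-joint reach task: negative
    distance to the target, plus a bonus once within tolerance."""
    distance = abs(joint_angle - target_angle)
    reward = -distance
    done = distance < tolerance   # episode ends when the target is reached
    if done:
        reward += 1.0             # success bonus
    return reward, done
```

A purely distance-based reward like this explains the jerkiness we observed: nothing penalizes abrupt motor commands, so adding an action-smoothness term would be a natural refinement.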
There are many exciting research directions that can build upon this proof-of-concept. Some of them are:
- Using Model-based Reinforcement Learning.
- Scaling up to the full Roboy 2.0 robot.
- Scaling training through parallelization.
All of them are very promising and can deliver spectacular applied research results! We have done the most engineering-heavy part by building up the infrastructure (Docker, ROS, etc.) to support more science-heavy RL research. Now it is up to you! Clone our repo and start experimenting!
We performed this project as part of the Roboy Robotics Lab. Our team was composed of Baris Yazici, Alexander Pakakis and me, Tomas Ruiz, all students at TUM. We were closely supported by Alona Kharchenko, a doctoral student at the Chair of Robotics, Artificial Intelligence and Real-time Systems at the Technical University of Munich (TUM), and by Simon Trendel, a member of the Roboy Robotics Lab.