Intro: Why should you care about chaos engineering for ROS2 systems?
In 2017/2018 I worked for Roboception GmbH, a company providing 3D vision hardware and software solutions for the robotics domain.
I was heavily involved in DevOps, in improving the software architecture and the software development process, and in test automation. Among other tasks, I implemented an update mechanism for an embedded single-board computer running U-Boot and Ubuntu, which is part of what was probably the world’s first 3D sensor for robotic applications, the rc_visard.
When I joined, a prototype of the rc_visard already existed. The development of the core software components was highly dynamic. It involved setting up the scheduling of component functionality and integrating components for, e.g., gathering and processing sensor data from an IMU and two cameras, object detection, etc. Many of these software components were ROS1 nodes at that time.
In early-stage system integration you will usually find that the system’s sub-components do not play well with each other at first. Often the cause is that the communication interfaces between sub-components (here, ROS1 nodes) do not match, for example because the interface definition is incomplete. The gap can concern “static” aspects of the interface, such as topic names, or “dynamic” aspects, such as the transmission rate of messages on a topic. Integration issues of the latter kind are usually harder to identify: sometimes everything plays together just fine, sometimes the overall system crashes and you have no idea about the root cause. In summary, early-stage software integration can be extremely time-consuming.
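To make the “dynamic” kind of mismatch concrete, here is a minimal sketch in plain Python with no ROS dependency. The function name check_topic_rate, the 20 % tolerance and the 100 Hz versus 50 Hz scenario are assumptions made for illustration only; they are not taken from the rc_visard software.

```python
from statistics import median


def check_topic_rate(arrival_times, expected_hz, tolerance=0.2):
    """Return (ok, measured_hz) for a list of message arrival times in seconds.

    The rate is estimated from the median gap between consecutive messages,
    so a single late message does not dominate the estimate.
    """
    if len(arrival_times) < 2:
        raise ValueError("need at least two messages to estimate a rate")
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    if any(g <= 0 for g in gaps):
        raise ValueError("arrival times must be strictly increasing")
    measured_hz = 1.0 / median(gaps)
    ok = abs(measured_hz - expected_hz) <= tolerance * expected_hz
    return ok, measured_hz


# Hypothetical contract: the subscriber expects IMU data at 100 Hz,
# but the publisher was configured for 50 Hz.
imu_stamps_50hz = [i * 0.02 for i in range(50)]
ok, measured = check_topic_rate(imu_stamps_50hz, expected_hz=100.0)
print(f"rate ok: {ok}, measured: {measured:.1f} Hz")
```

A check like this catches a mismatch that no topic name or message type comparison would reveal, because both sides agree on the static interface and differ only in timing.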
What is chaos engineering?
Chaos engineering is the discipline of experimenting on a software system in production in order to build confidence in the system’s capability to withstand turbulent and unexpected conditions.
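As a rough illustration of the idea, the following sketch simulates a tiny message pipeline inside one Python process and injects faults into it. It is not a production experiment and uses no ROS API. The names chaotic_channel, RunningAverage and run_experiment, as well as the drop and corruption probabilities, are hypothetical.

```python
import random


def chaotic_channel(messages, rng, drop_p=0.2, corrupt_p=0.1):
    """Yield messages while randomly dropping or corrupting some of them."""
    for msg in messages:
        roll = rng.random()
        if roll < drop_p:
            continue                      # message lost in transit
        if roll < drop_p + corrupt_p:
            yield None                    # corrupted payload
        else:
            yield msg


class RunningAverage:
    """Hypothetical consumer: keeps a running average of valid readings."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.rejected = 0

    def on_message(self, msg):
        if not isinstance(msg, (int, float)):
            self.rejected += 1            # error state, but not a crash
            return
        self.count += 1
        self.total += msg

    @property
    def value(self):
        return self.total / self.count if self.count else None


def run_experiment(seed, n_messages=1000):
    """Feed readings in [0, 10] through the chaotic channel to the consumer."""
    rng = random.Random(seed)
    consumer = RunningAverage()
    source = (rng.uniform(0.0, 10.0) for _ in range(n_messages))
    for msg in chaotic_channel(source, rng):
        consumer.on_message(msg)
    return consumer


consumer = run_experiment(seed=42)
print(consumer.count, consumer.rejected, consumer.value)
```

The hypothesis under test is that the consumer survives lost and corrupted messages and that its average stays within the range of the valid inputs. Rejected messages are counted rather than treated as fatal.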
I decided to create a project with the aim of adapting chaos engineering to ROS1. hypothesis-ros was born. Before you “apply chaos” to production-like systems, you usually look at single components first in a similar manner. A common tool for this in the Python world is Hypothesis, a property-based testing tool. In contrast to “usual testing”, where one expects specific outcomes for specific test inputs, property-based testing tests properties of the software. This means that you formulate hypotheses about the software. The principle can be applied at all levels of a system: function, class, ROS1 node, etc.
In the case of ROS1 nodes, one can for example state the hypothesis that a sensor data ROS1 node never consumes more than XYZ CPU or XYZ memory, independent of the configuration and input thrown at it. As soon as you “apply chaos” to a collection of components, here ROS1 nodes, you are doing chaos engineering, with the root hypothesis that the system must never end up crashed. An error state is undesired, but in this context it does not count as a failed system. I loved the tool and additionally created a package for ROS2 called hypothesis-ros2.
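The difference between example-based and property-based testing can be sketched as follows. To keep the snippet free of dependencies, it uses a hand-rolled random generator from the standard library instead of Hypothesis, and it does not use the hypothesis-ros API. The function sanitize_ranges stands in for a node callback and is purely hypothetical, as are the value ranges.

```python
import math
import random


def sanitize_ranges(ranges, range_min, range_max):
    """Hypothetical node callback: clamp range readings to sensor limits.

    NaN readings are replaced by range_max, i.e. treated as "nothing seen".
    """
    cleaned = []
    for r in ranges:
        if math.isnan(r):
            cleaned.append(range_max)
        else:
            cleaned.append(min(max(r, range_min), range_max))
    return cleaned


def random_reading(rng):
    """Draw a reading, deliberately including hostile values."""
    return rng.choice([
        rng.uniform(-1e6, 1e6),
        float("nan"),
        float("inf"),
        float("-inf"),
        0.0,
    ])


def check_property(trials=500, seed=0):
    """Property: same length, and every output lies within the limits."""
    rng = random.Random(seed)
    for _ in range(trials):
        ranges = [random_reading(rng) for _ in range(rng.randint(0, 50))]
        range_min = rng.uniform(0.0, 1.0)
        range_max = range_min + rng.uniform(0.1, 100.0)
        out = sanitize_ranges(ranges, range_min, range_max)
        assert len(out) == len(ranges)
        assert all(range_min <= r <= range_max for r in out)
    return trials


print(check_property())
```

No expected output is written down for any particular input; only the property is stated. With Hypothesis itself, the hand-written generator would be replaced by strategies and the loop by the given decorator, and a failing input would additionally be shrunk to a minimal counterexample, which this sketch does not do.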
From chaos engineering to robot cyber security
Because the functionality provided by hypothesis-ros/hypothesis-ros2 was sufficient, I shifted my priorities to other problems. However, the functionality was very basic and not generic. The methodology of property-based testing is related to fuzzing. Alias Robotics, a company pushing the state of the art in robot cyber security, built upon the package design and implementation, and ros1_fuzzer and ros2_fuzzer arose. Later they commercialized the ROS2 chaos testing tool as part of alurity. alurity contains many different tools for security engineering, including ROSCHAOS for chaos engineering in the security context. I’m really proud to have been part of pushing robot cyber security to the next level by contributing the design and implementation of the first ROS1 and ROS2 property-based testing libraries.
Chaos engineering for ROS2 is a powerful methodology for making the development and maintenance of ROS2-based systems more efficient with respect to system reliability. Property-based testing can be seen as the foundation of chaos engineering: it applies chaos to a single ROS2 node only, whereas chaos engineering applies chaos to an overall system or a sub-system consisting of several ROS2 nodes. Today there is a commercially supported tool, ROSCHAOS, which is part of a robot cyber security tool suite.