Chaos Engineering ROS2 Systems

About the history of ROS2 chaos engineering tooling.

Florian Kromer
Feb 1 · 4 min read
Photo by Tarik Haiga on Unsplash

In 2017/2018 I worked for Roboception GmbH, a company providing 3D vision hardware and software solutions for the robotics domain.

I was heavily involved in DevOps, test automation, and improving both the software architecture and the software development process. Among other tasks, I implemented an update mechanism for an embedded single-board computer running U-Boot and Ubuntu. That computer is part of the rc_visard, probably the world’s first 3D sensor for robotic applications.

https://roboception.com/en/rc_visard-en/

When I joined, a prototype of the rc_visard already existed. Development of the core software components was highly dynamic. It involved setting up the scheduling of component functionality and integrating components for, e.g., gathering and processing sensor data from an IMU and two cameras, object detection, etc. Many of these software components were ROS1 nodes at that time.

In early-stage system integration, one thing you’ll usually be confronted with is that the system’s sub-components do not play well with each other at first. Often the reason is that the communication interfaces between sub-components (here ROS1 nodes) do not match, for example because the interface definition is incomplete. The gap is either in the “static” aspects of the interface, e.g. topic names, or in the “dynamic” aspects, e.g. the transmission rate of messages on a topic. The latter kind of integration issue is usually the harder one to identify. Sometimes everything plays together just fine, sometimes the overall system crashes and you have no idea about the root cause. In summary, early-stage software integration can be extremely time-consuming.
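To make the “dynamic” aspect a bit more concrete, here is a minimal sketch in plain Python (no ROS installation needed) that checks whether the message rate observed on a topic matches what a consumer expects. The function names, the 25 Hz figure and the 10 % tolerance are assumptions I made up for illustration; none of them come from the rc_visard code base.

```python
def observed_rate_hz(stamps):
    """Average message rate computed from receive timestamps in seconds."""
    if len(stamps) < 2:
        raise ValueError("need at least two timestamps")
    span = stamps[-1] - stamps[0]
    if span <= 0:
        raise ValueError("timestamps must be increasing")
    # N timestamps delimit N - 1 intervals.
    return (len(stamps) - 1) / span


def rate_matches(stamps, expected_hz, tolerance=0.1):
    """True if the observed rate is within +/- tolerance of expected_hz."""
    rate = observed_rate_hz(stamps)
    return abs(rate - expected_hz) <= tolerance * expected_hz


if __name__ == "__main__":
    # Hypothetical publisher sending one message every 40 ms for one second.
    stamps = [i * 0.04 for i in range(26)]
    print("consumer expecting 25 Hz satisfied:", rate_matches(stamps, 25.0))
    print("consumer expecting 50 Hz satisfied:", rate_matches(stamps, 50.0))
```

A check like this only catches a mismatch that somebody thought of in advance, which is exactly why the dynamic issues tend to stay hidden for so long.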

Anyway, we needed a way to speed up the software component integration process. In the web development domain, Netflix’s Chaos Monkey was a commonly known tool for chaos engineering.

Chaos engineering is the discipline of experimenting on a software system in production in order to build confidence in the system’s capability to withstand turbulent and unexpected conditions.

I decided to create a project with the aim of adapting chaos engineering to ROS1, and hypothesis-ros was born. Before you “apply chaos” to production-like systems, you usually look at single components first in a similar manner. A common tool for this in the Python world is Hypothesis, a property-based testing tool. In contrast to “usual testing”, where one expects specific outcomes for specific test inputs, property-based testing checks properties of the software. This means that you state hypotheses about the software and let the tool generate inputs that try to falsify them. The principle can be applied at all levels of a system: function, class, ROS1 node, etc. For a ROS1 node, one can e.g. state the hypothesis that a sensor data node must never consume more than XYZ CPU or XYZ memory, independent of the configuration and input thrown at it. As soon as you “apply chaos” to a collection of components, here ROS1 nodes, you are doing chaos engineering, with the root hypothesis that the system must not crash. In this context an error state is undesired, but it does not count as a failed system. I loved the tool and additionally created a package for ROS2 called hypothesis-ros2.
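The idea of a property can be sketched in a few lines. The example below is not hypothesis-ros and it does not use Hypothesis itself; to keep it runnable without any dependencies it generates inputs with Python’s random module, which is a much cruder stand-in (Hypothesis additionally shrinks failing inputs to minimal examples). SensorBuffer is a hypothetical placeholder for the message queue inside a sensor node, and the hypothesis is that its memory use stays bounded no matter what is thrown at it.

```python
import random
from collections import deque


class SensorBuffer:
    """Hypothetical stand-in for a sensor node's internal message queue."""

    def __init__(self, capacity):
        self.capacity = capacity
        # A deque with maxlen silently discards the oldest entry when full.
        self._queue = deque(maxlen=capacity)

    def push(self, msg):
        self._queue.append(msg)

    def __len__(self):
        return len(self._queue)


def check_bounded_memory(trials=200, seed=0):
    """Try to falsify: 'the buffer never holds more than capacity messages'."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Randomly generated configuration and input, not hand-picked cases.
        capacity = rng.randint(1, 50)
        burst = [rng.random() for _ in range(rng.randint(0, 500))]
        buf = SensorBuffer(capacity)
        for msg in burst:
            buf.push(msg)
            assert len(buf) <= capacity, (capacity, len(burst))
    return trials


if __name__ == "__main__":
    print("property held for", check_bounded_memory(), "random inputs")
```

Note that the test never states what the buffer should contain for a given input. It only states what must hold for every input.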

Because the functionality provided by hypothesis-ros/hypothesis-ros2 was sufficient, I shifted my priorities to other problems. However, the functionality was very basic and not generic. The methodology of property-based testing is related to fuzzing. Alias Robotics, a company involved in pushing the state of the art in robot cyber security, built upon the package design and implementation, and ros1_fuzzer and ros2_fuzzer arose. Finally, they commercialized the ROS2 chaos testing tool as part of alurity. alurity contains a lot of different tools for security engineering, including ROSCHAOS for chaos engineering in the security context. I’m really proud that I’ve been part of pushing robot cyber security to the next level by contributing the design and implementation of the first ROS1 and ROS2 property-based testing library.

Chaos engineering for ROS2 is a powerful methodology for making the development and maintenance of ROS2-based systems more efficient with respect to system reliability. Property-based testing can be seen as the foundation for chaos engineering: it applies chaos to a single ROS2 node only. Chaos engineering applies chaos to an overall system or a sub-system consisting of several ROS2 nodes. Nowadays there is a commercially supported tool, ROSCHAOS, which is part of a robot cyber security tool suite.
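The step from a single node to a system can be sketched in the same spirit. The following is a toy simulation in plain Python, not ROS2 code and not how ROSCHAOS works. ChaosChannel and DepthNode are hypothetical names, and the drop and corruption probabilities are arbitrary assumptions. The root hypothesis is the one described above: the consumer may enter an error state, but it must never crash.

```python
import random


class ChaosChannel:
    """Hypothetical topic stand-in that randomly drops or corrupts messages."""

    def __init__(self, rng, drop_prob=0.2, corrupt_prob=0.2):
        self.rng = rng
        self.drop_prob = drop_prob
        self.corrupt_prob = corrupt_prob

    def transmit(self, msg):
        roll = self.rng.random()
        if roll < self.drop_prob:
            return None  # message lost in transit
        if roll < self.drop_prob + self.corrupt_prob:
            return "garbage"  # payload of the wrong type
        return msg


class DepthNode:
    """Hypothetical consumer that reacts to bad input with an error state."""

    def __init__(self):
        self.state = "ok"
        self.processed = 0

    def on_message(self, msg):
        if not isinstance(msg, (int, float)):
            self.state = "error"  # undesired, but the node keeps running
            return
        self.state = "ok"
        self.processed += 1


def run_experiment(steps=1000, seed=0):
    """Push messages through the chaotic channel and watch the consumer."""
    rng = random.Random(seed)
    channel = ChaosChannel(rng)
    node = DepthNode()
    for i in range(steps):
        delivered = channel.transmit(float(i))
        if delivered is not None:
            # An unhandled exception raised here would falsify the hypothesis.
            node.on_message(delivered)
        assert node.state in ("ok", "error")
    return node


if __name__ == "__main__":
    node = run_experiment()
    print("final state:", node.state, "| valid messages processed:", node.processed)
```

In a real system the channel would be replaced by fault injection between actual nodes, and “did not crash” would be observed from the outside, e.g. by checking that all processes are still alive.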

The Startup

Florian Kromer

Written by

Software Developer for rapid prototype or high quality software with interest in distributed systems and high performance on premise server applications.
