ROS2 vs. ROS1: key differences and which one is better?

Aug 1, 2022

ROS Refresher

If you haven’t already heard of ROS or don’t have much experience with it, here is a quick refresher to get you up to speed. If you’re already a ROS veteran, you can skip this section and move on to the ROS2 vs. ROS1 section.

ROS stands for Robot Operating System. In short, ROS is a framework for building scalable robotic applications, for robots ranging from floor vacuums to ones that operate in space (NASA’s Astrobee robots on the International Space Station run ROS, for example). The purpose of ROS is to decompose complex software into smaller and more manageable pieces. ROS simplifies the development process by providing a unified interface for communication between the different processes (or nodes) on which the robot’s software components run. In addition, ROS has a large ecosystem of sensor, control, and algorithmic packages made available by community contributions, enabling a small team to build complex robotics applications fast.

To illustrate the utility of ROS, let’s consider a simple robot that just needs to navigate between any two points in space. To achieve this, the robot must perform two primary functions: 1) path planning (figuring out a path through the environment while avoiding obstacles), and 2) path tracking (following the planned path to move from point A to point B). If we want to use ROS to build this robot, we could create two ROS nodes, one for path planning and one for path tracking. Each ROS node is an independent process in the ROS stack, and nodes can communicate with each other using three primary modes:

ROS topics (publisher/subscriber): Enable nodes to broadcast (or publish) messages to any other nodes that choose to listen (or subscribe) to the topic.
ROS services (request/response): Enable one node (the client) to send a request to another node (the server), instructing it to perform a certain task. Once the task is complete, the server node sends a response to the client node carrying the result of the request.
ROS actions (goal/feedback/result): Enable one node (the client) to send a request to another node (the action server) to perform a specific action. While the action is being performed, the server keeps sending feedback to the client with details about its progress. Once the action is completed, the server sends a result to the client with the final outcome. ROS actions are similar to ROS services, except that an action server sends feedback along the way, while a ROS service does not.
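The difference between a service and an action is easiest to see side by side. Below is a minimal sketch in plain Python, not the ROS API: the function names, the feedback callback, and the "fraction done" feedback value are all made up for illustration. The service returns one response, while the action reports progress before returning its result.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]


def plan_path_service(goal: Point) -> str:
    """Service style: one request in, one response out, nothing in between."""
    return f"path to {goal} ready"


def drive_to_action(goal: Point, on_feedback: Callable[[float], None], steps: int = 4) -> str:
    """Action style: report progress while working, then return a final result."""
    for step in range(1, steps + 1):
        # A real action server would do one slice of the work here.
        on_feedback(step / steps)  # hypothetical feedback: fraction of work done
    return f"reached {goal}"


if __name__ == "__main__":
    print(plan_path_service((2.0, 3.0)))
    outcome = drive_to_action((2.0, 3.0), lambda done: print(f"progress: {done:.0%}"))
    print(outcome)
```

In real ROS code the feedback would travel over the network to the client node; here the callback plays that role.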

ROS also has a concept called the parameter server, on which global parameters can be stored and accessed by all nodes.

Here is a little diagram showing all the ROS components I just discussed.

Diagram illustrating basic components of ROS

So in our simple robot example, the path planner runs on one node and the path tracker runs on another, and the two communicate using a ROS topic and a ROS service. For example, the path generated by the path planner node can be published over a topic called “/path”. Our path tracker node listens to this topic and waits for a path to be published before getting the robot to move. Once the tracker has received a path and started moving, a new path may be needed for any number of reasons (the robot has reached the goal, or the current path is blocked by an obstacle, for example). In that case, the tracker can communicate with the path planner node via a ROS service called “/plan_path”: it sends a request for a new plan and waits for a response saying either that planning has started or that the planner couldn’t start (because it is currently planning another path, or has an internal issue, for example).
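Here is a rough sketch of that wiring in plain Python. It runs in a single process and does not use ROS at all: the Bus class, the straight-line "planner", and the response strings are my own stand-ins. Only the names “/path” and “/plan_path” come from the example above.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Point = Tuple[float, float]


class Bus:
    """Toy in-process stand-in for ROS topics and services (not a ROS API)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Any], None]]] = defaultdict(list)
        self._services: Dict[str, Callable[[Any], Any]] = {}

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> None:
        for callback in list(self._subscribers[topic]):
            callback(message)

    def advertise_service(self, name: str, handler: Callable[[Any], Any]) -> None:
        self._services[name] = handler

    def call_service(self, name: str, request: Any) -> Any:
        return self._services[name](request)


class PathPlanner:
    def __init__(self, bus: Bus) -> None:
        self._bus = bus
        self.busy = False
        bus.advertise_service("/plan_path", self.handle_plan_request)

    def handle_plan_request(self, goal: Point) -> str:
        if self.busy:
            return "planner couldn't start"
        # Placeholder planner: a straight line from the origin via the midpoint.
        path = [(0.0, 0.0), (goal[0] / 2, goal[1] / 2), goal]
        self._bus.publish("/path", path)
        return "planning started"


class PathTracker:
    def __init__(self, bus: Bus) -> None:
        self._bus = bus
        self.current_path: List[Point] = []
        bus.subscribe("/path", self.on_path)

    def on_path(self, path: List[Point]) -> None:
        self.current_path = list(path)  # a real tracker would start moving here

    def request_new_path(self, goal: Point) -> str:
        return self._bus.call_service("/plan_path", goal)


if __name__ == "__main__":
    bus = Bus()
    planner, tracker = PathPlanner(bus), PathTracker(bus)
    print(tracker.request_new_path((4.0, 2.0)), tracker.current_path)
```

Notice that the tracker never holds a reference to the planner. It only knows the topic and service names, which is exactly what makes the two swappable.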

So one might ask, why did I choose to make the path planner and the path tracker run on two different nodes? Why can’t I make everything run on the same node and call it a day?

The answer is simple: running the path planner and the path tracker on different nodes allows you to swap either one without impacting the other. Since the communication interface (ROS topics and ROS services) is standardized, you won’t have to redo all the plumbing required to make the swapped component communicate with the rest of the stack. This also opens the door to using multiple planners or trackers, each tailored to a specific task and all communicating over a common interface. In addition, you can re-use open source ROS planners and trackers created by the community and plug them into your application.

ROS2 vs. ROS1

Ok, so we’ve talked a lot about ROS and how it is useful. Now let’s dive into ROS2, the new version of ROS, which improves on ROS1 in many ways. Much of the information I will show here was extracted from the paper Robot Operating System 2: Design, architecture, and uses in the wild by Steve Macenski et al., so if you’d like more information, I recommend reading the paper.

Let’s start with a history lesson. ROS1 development began in 2007, led by Willow Garage, to accelerate robotics research. Since then, ROS1 has gained a lot of popularity in the robotics community and has become the de facto platform for research and experimentation. However, ROS1 was not built with commercial use in mind, so things like security, network topology, and system uptime were not prioritized. As ROS saw growing adoption in the commercial space, its major flaws became more and more evident. Hence, there was a need to rebuild ROS from the ground up with commercial use in mind, i.e. ROS2.

ROS2 was built from the ground up with the following design requirements (extracted from the aforementioned paper) in order to make it ready for commercial use and adoption:

Security: It needs to be safe, with proper encryption where needed.
Embedded systems: ROS2 needs to be able to run on embedded systems.
Diverse networks: It needs to run and communicate across a wide range of networks, from LAN to multi-satellite hops, to accommodate the variety of environments in which robots operate and need to communicate.
Real-time computing: It needs to perform computation reliably in real time, since runtime efficiency is crucial in robotics.
Product readiness: It needs to conform to relevant safety and industrial standards so that it is ready for market.

ROS2 and the decision to use DDS

To satisfy the above design requirements, the ROS2 designers decided to use DDS as the middleware for all communication that happens internally in ROS2. DDS stands for Data Distribution Service, and it is widely used in critical infrastructure such as battleships and space systems. It provides security features as well as the reliability controls needed to maintain good communication over weak or lossy connections (Wi-Fi or satellite links). This is in contrast with the custom TCP- and UDP-based transports developed for ROS1 (TCPROS and UDPROS), which lacked many of the security and reliability guarantees provided by DDS.

So let’s go over the key differences between ROS2 and ROS1, and see how ROS2 has addressed the major ROS1 flaws.

ROS2 & DDS provide security guarantees, ROS1 doesn’t

To understand why security is a concern, first consider that robotics applications are very diverse, and in a lot of scenarios robots need to communicate over insecure networks, whether via ROS topics, services, or actions. This is an enormous concern for critical commercial applications. There were lots of attempts in the community to address it for ROS1, but the solutions developed didn’t address the fundamental limitations in the design, as ROS1 was primarily built as a research tool.

Robots might need to communicate over insecure networks such as the internet

The decision to switch to DDS in ROS2 addresses most of these concerns. DDS is an established and tested standard, and its security specification (DDS-Security) covers authentication, access control, and encryption, none of which were built into ROS1. Note that these features have to be enabled and configured; they are not on by default. With that done, commercial companies using ROS2 can have much more peace of mind if their robots have to communicate over insecure networks (such as the Internet).
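To make the concern concrete, here is a minimal sketch of one thing a secured transport has to do: reject a message that was altered in transit. It uses Python’s standard hmac module with a shared key, which is a deliberately simpler stand-in and not how DDS secures ROS2 traffic. The key, the envelope layout, and the function names are all hypothetical.

```python
import hashlib
import hmac
import json
from typing import Optional

SHARED_KEY = b"demo-key-not-for-production"  # hypothetical key, illustration only


def sign(message: dict, key: bytes = SHARED_KEY) -> dict:
    """Wrap a message in an envelope carrying an authentication tag."""
    payload = json.dumps(message, sort_keys=True)
    tag = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify(envelope: dict, key: bytes = SHARED_KEY) -> Optional[dict]:
    """Return the message if the tag matches, or None if it was tampered with."""
    expected = hmac.new(key, envelope["payload"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, envelope["tag"]):
        return None
    return json.loads(envelope["payload"])


if __name__ == "__main__":
    envelope = sign({"topic": "/cmd_vel", "linear": 0.5})
    print(verify(envelope))
    envelope["payload"] = envelope["payload"].replace("0.5", "5.0")  # attacker edits it
    print(verify(envelope))
```

Without a check like this, any machine on the network could inject a velocity command and the robot would have no way to tell.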

ROS1 has a single point of failure (ROS master), ROS2 doesn’t

In ROS1, you have something called ROS master, which provides naming and registration services for ROS nodes. To best understand how ROS master works, consider an example where node 1 is launched and has a publisher and a subscriber. The first thing node 1 does is communicate with ROS master and say: “Hey, here are the topics I’ll be publishing and subscribing to. If another node (let’s call it node 2) comes online and publishes or subscribes to the same topics, tell it to communicate with me over a specific port.” In that way, ROS master acts as the mediator for establishing connections between nodes. When node 2 comes up, it does the same with ROS master, and a direct connection is then established between node 1 and node 2 over the designated ports without needing ROS master anymore.

Diagram illustrating the role ROS master plays in establishing communication between nodes

If ROS master dies, node 1 and node 2 will keep communicating with each other. However, if a new node comes up and attempts to communicate with either node 1 or node 2, it will have no way of doing so, because ROS master, which is supposed to mediate the connection, has died. This can result in orphaned nodes in your stack that have no way of communicating with other nodes.

If ROS master dies, Node 3 can no longer communicate with Node 1 and Node 2

In ROS2, ROS master was axed. That was possible because ROS2 uses DDS, which allows nodes to communicate with each other in a peer-to-peer fashion without the need for a mediator. Without getting into the details, you can think of it this way: every node announces itself on the network and listens for announcements from others, so any node that comes up can find the other nodes on its own. This increases the fault tolerance of the system, as you no longer have a single point of failure.
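The two approaches can be contrasted with a toy model in plain Python. None of these classes exist in ROS or DDS; they are names I made up. Master is a central registry that can go down, while PeerNetwork only stands in for the network medium over which nodes hear each other’s announcements.

```python
from typing import Dict, Iterable, List, Set


class MasterDown(Exception):
    """Raised when the central registry cannot be reached."""


class Master:
    """ROS1-style: every lookup goes through one central registry."""

    def __init__(self) -> None:
        self.alive = True
        self._publishers: Dict[str, List[str]] = {}

    def _require_alive(self) -> None:
        if not self.alive:
            raise MasterDown("master is unreachable")

    def register_publisher(self, topic: str, node_name: str) -> None:
        self._require_alive()
        self._publishers.setdefault(topic, []).append(node_name)

    def lookup_publishers(self, topic: str) -> List[str]:
        self._require_alive()
        return list(self._publishers.get(topic, []))


class PeerNode:
    """ROS2-style: each node keeps its own table of who publishes what."""

    def __init__(self, name: str, topics: Iterable[str]) -> None:
        self.name = name
        self.topics: Set[str] = set(topics)
        self.known_publishers: Dict[str, Set[str]] = {}

    def hear(self, peer_name: str, peer_topics: Iterable[str]) -> None:
        for topic in peer_topics:
            self.known_publishers.setdefault(topic, set()).add(peer_name)


class PeerNetwork:
    """Stand-in for the network medium, not a process that can crash."""

    def __init__(self) -> None:
        self._nodes: List[PeerNode] = []

    def join(self, node: PeerNode) -> None:
        for peer in self._nodes:
            peer.hear(node.name, node.topics)  # existing peers learn about the newcomer
            node.hear(peer.name, peer.topics)  # the newcomer learns about existing peers
        self._nodes.append(node)
```

In the first model, setting master.alive to False makes every later lookup fail. In the second, there is no equivalent switch to flip: a late-joining node fills its table directly from its peers.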

ROS2 performs better than ROS1 in weak or lossy networking situations (like Wifi or satellite connections)

ROS1 performs well in reliable networking situations, as it is built on TCP, which is sufficient there. In unreliable networking situations, however, TCP struggles to deliver good performance because lost data has to be re-transmitted, which delays everything queued behind it.

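ROS2 uses DDS, which by default runs over UDP and lets each publisher and subscriber choose between reliable delivery and best-effort delivery through its quality-of-service (QoS) settings. With best-effort delivery, lost data is simply not re-transmitted, so for data that goes stale quickly (such as sensor readings) ROS2 performs better in unreliable networking situations.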
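The trade-off can be illustrated with a toy simulation. This is a sketch under strong assumptions: the link drops each transmission independently with a fixed probability, and neither function models real TCP or DDS behavior. Reliable delivery re-sends every lost sample until it gets through, while best-effort delivery sends each sample once and moves on.

```python
import random


def _check_loss_rate(loss_rate: float) -> None:
    if not 0.0 <= loss_rate < 1.0:
        raise ValueError("loss_rate must be in [0, 1)")


def send_reliable(num_samples: int, loss_rate: float, rng: random.Random) -> int:
    """Re-send each sample until it arrives; return the total transmissions used."""
    _check_loss_rate(loss_rate)
    transmissions = 0
    for _ in range(num_samples):
        while True:
            transmissions += 1
            if rng.random() >= loss_rate:  # this attempt got through
                break
    return transmissions


def send_best_effort(num_samples: int, loss_rate: float, rng: random.Random) -> int:
    """Send each sample exactly once; return how many samples arrived."""
    _check_loss_rate(loss_rate)
    return sum(1 for _ in range(num_samples) if rng.random() >= loss_rate)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so that runs are repeatable
    print("reliable transmissions:", send_reliable(1000, 0.3, rng))
    print("best-effort samples delivered:", send_best_effort(1000, 0.3, rng))
```

Reliable delivery gets every sample across at the cost of extra transmissions (and, on a real link, extra delay). Best-effort loses some samples but never falls behind, which is usually the better deal for a stream of sensor readings where only the latest value matters.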

ROS2 has multi-platform support

ROS1 is officially supported only on Linux, whereas ROS2 is supported on Linux, Windows, and macOS. It is also easier to integrate ROS2 with cloud resources such as AWS.

ROS2 nodes can run in the same process and their lifecycle can be managed

In ROS1, each ROS node was originally designed to run in its own process. Later on, ROS1 nodelets were introduced, which allow multiple nodes to run in the same process (each nodelet is compiled as a shared library and loaded at runtime by a container process).

Running nodes in a single process is good if you’re moving large amounts of data, for example copying images or point clouds from one node to another, because nodes that run in the same process share memory, which speeds up the communication between them.

ROS2 nodes inherently work similarly to nodelets in ROS1, allowing multiple nodes to run in the same process and thereby reducing the latency of communication between them. The concept of running nodes in the same process is therefore deeply entrenched in ROS2 and is not an add-on feature as in ROS1. In addition, unlike in ROS1, the lifecycle of nodes in ROS2 can be managed: a node can model a state machine (with states like unconfigured, inactive, active, and finalized), and an external system can move it between states based on the overall state of the system. For more about ROS2 node lifecycle management, I recommend checking the ROS documentation.
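A managed node boils down to a small state machine. The sketch below is plain Python, not the ROS2 lifecycle API. The state and transition names follow the ROS2 managed-node design (unconfigured, inactive, active, finalized), but the intermediate transition states and error handling are left out, and the ManagedNode class and its trigger method are my own invention.

```python
from enum import Enum


class State(Enum):
    UNCONFIGURED = "unconfigured"
    INACTIVE = "inactive"
    ACTIVE = "active"
    FINALIZED = "finalized"


# (current state, requested transition) -> next state
TRANSITIONS = {
    (State.UNCONFIGURED, "configure"): State.INACTIVE,
    (State.INACTIVE, "activate"): State.ACTIVE,
    (State.ACTIVE, "deactivate"): State.INACTIVE,
    (State.INACTIVE, "cleanup"): State.UNCONFIGURED,
    (State.UNCONFIGURED, "shutdown"): State.FINALIZED,
    (State.INACTIVE, "shutdown"): State.FINALIZED,
    (State.ACTIVE, "shutdown"): State.FINALIZED,
}


class ManagedNode:
    """Toy node whose state can only change through allowed transitions."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = State.UNCONFIGURED

    def trigger(self, transition: str) -> State:
        key = (self.state, transition)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {transition!r} while {self.state.value}")
        self.state = TRANSITIONS[key]
        return self.state


if __name__ == "__main__":
    camera = ManagedNode("camera_driver")  # hypothetical node name
    for step in ("configure", "activate", "deactivate", "shutdown"):
        print(step, "->", camera.trigger(step).value)
```

The point is that an external manager can bring nodes up in a known order (configure everything first, then activate) and can refuse nonsensical requests, such as activating a node that was never configured.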

ROS2 client libraries share a common underlying implementation

In ROS1, each client library (such as roscpp or rospy) was written entirely in its own language. This meant that, for example, the underlying implementation of a publisher in rospy was different from the one in roscpp, which introduced variation in performance depending on the language being used. In ROS2, all client libraries share a common implementation written in C called rcl, which sits between the client libraries and the DDS interfaces required for communication. This makes each client library lighter, as it just wraps the common rcl implementation, and it generally gives more consistent performance across languages since they all use rcl under the hood. It also makes it easier for developers to create client libraries for new languages.

ROS2 and backward compatibility with ROS1

So one might say, ROS2 sounds too good, but I already have built my application in ROS1, and it’s simply not feasible to switch my entire application from ROS1 to ROS2.

If that’s a concern you have, then the ROS2 developers have got your back: you can bridge communication between your newly created ROS2 nodes and your existing ROS1 nodes using the ROS1 bridge (the ros1_bridge package). You can therefore refactor your code and switch from ROS1 to ROS2 in bite-sized chunks without breaking the communication between the different parts of your stack.
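Conceptually, a bridge subscribes on one side and republishes on the other. Here is a minimal sketch of that idea in plain Python. TopicBus and bridge_topic are hypothetical names, and the sketch only relays messages unchanged, whereas the real bridge also has to translate between ROS1 and ROS2 message types.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class TopicBus:
    """Toy stand-in for one ROS graph (either the ROS1 side or the ROS2 side)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> None:
        for callback in list(self._subscribers[topic]):
            callback(message)


def bridge_topic(source: TopicBus, destination: TopicBus, topic: str) -> None:
    """Relay one topic in one direction, from source to destination."""
    source.subscribe(topic, lambda message: destination.publish(topic, message))


if __name__ == "__main__":
    ros1_side, ros2_side = TopicBus(), TopicBus()
    ros2_side.subscribe("/path", lambda path: print("new ROS2 tracker got", path))
    bridge_topic(ros1_side, ros2_side, "/path")
    ros1_side.publish("/path", [(0.0, 0.0), (1.0, 1.0)])  # old ROS1 planner publishes
```

Keeping each relay one-directional avoids a message bouncing back and forth between the two sides forever.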

Conclusion

ROS1 has helped power the robotics revolution and has provided much-needed utility in the field of robotics research. However, it had some major flaws that had to be addressed to make it suitable for commercial use. ROS2 is the new version of ROS, built from the ground up with commercial use in mind, and it provides lots of enhancements in performance and security. The most critical change in ROS2 was perhaps the switch from the custom TCP/UDP transport to DDS. DDS is an open standard used in critical infrastructure; it enables ROS2 messages to be secured, and it paved the way for the elimination of ROS master, which was a single point of failure in ROS1. Other new features introduced in ROS2 include multi-platform support (ROS can now run on Windows and macOS) and a common underlying implementation for client libraries, which provides consistency in performance. If you decide to start using ROS2 for your robotic application, you can make your newly created ROS2 nodes communicate with your old ROS1 nodes using the ROS1 bridge, so you don’t have to refactor all of your existing ROS1 nodes at once to start using ROS2.

References

Robot Operating System 2: Design, architecture, and uses in the wild by Steve Macenski et al.

ROS Foxy Documentation

Ros2 vs. Ros1 by Gavin Suddrey