Stitching Vehicle Software: Inter-process Communication

Divyansh Singh
Ather Engineering
Feb 3, 2023

The Ather 450X and Ather 450 Plus are smart scooters: the dashboard and the mobile apps together empower our vehicles to do more, be more. The dashboard might come off as a simple screen that displays information and responds to user inputs via touch or keys. However, multiple services power the dashboard to elevate the entire ride experience, and we are constantly working to improve them. How does it do that? The dashboard ensures that the rider’s inputs are processed while connecting the Ather to the cloud, which means more space for all the math.

Inter-process communication: How to route data

Every Ather on the road produces thousands of messages per second, all of which are parsed, sent to the cloud, and then displayed on the dashboard (as required). Apart from that, our smart features, such as the auto indicator-off, use parts of this data to produce more data as output. On an edge device, which the dashboard is, resources are limited. Our priority in the vehicle software world is to provide a seamless experience to the rider, which demands that all these simultaneous, complex functions run smoothly despite the hardware constraints of edge devices.

All the data generated is based either on events, such as a change in the battery state of charge, or on user inputs, such as a change in the riding mode. A publisher-subscriber pattern is required to handle the data generated by events, and a request-reply pattern is required to handle the data requested by any of our features.

This is where ZeroMQ comes in. ZeroMQ is a lightweight asynchronous messaging library that supports in-process and Transmission Control Protocol (TCP) communication. ZeroMQ has built-in support for the publisher-subscriber and request-reply patterns; however, we had to write another layer over it to solve three problems:

  1. The dynamic nature of our software
  2. Diversity of datatypes
  3. Preparing for inevitable errors

Let’s dive into the how and the why of our solutions.

1. Software is not constant

The software stack running in the vehicle is very dynamic. We improve it every day, because we want our scooters to do more. New functionalities are added, and existing functionalities are optimized and streamlined. The publisher-subscriber pattern requires an address that services can publish to or subscribe to. If a new service is added, existing services have to connect to a new address in order to subscribe or publish.

We solve this problem by creating a proxy using ZeroMQ. Every service in the vehicle publishes to this proxy and subscribes from this proxy, hence eliminating the need to add new addresses.

Figure 1: ZeroMQ Proxy

And this proxy performs incredibly well! We have run it through stress tests of up to 1 million messages per second, and it is able to handle this load at a latency of 3 seconds on an edge device. On faster hardware, we observed that this latency drops to as low as 0.6 seconds.
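To make the single-address idea concrete, here is a minimal sketch in plain Python. It is not the proxy described above and does not use ZeroMQ at all: `InMemoryProxy`, the topic names, and the payloads are hypothetical stand-ins, and delivery happens through in-process callbacks rather than sockets. The only behaviour borrowed from ZeroMQ is that a subscription matches every topic that starts with the subscribed prefix.

```python
from typing import Callable, List, Tuple

# A handler receives (topic, payload); both are raw bytes.
Handler = Callable[[bytes, bytes], None]


class InMemoryProxy:
    """Toy stand-in for a pub-sub proxy: the one hub every service talks to."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[bytes, Handler]] = []

    def subscribe(self, topic_prefix: bytes, handler: Handler) -> None:
        # Subscribers register with the proxy, never with a publisher.
        self._subscriptions.append((topic_prefix, handler))

    def publish(self, topic: bytes, payload: bytes) -> int:
        """Deliver to every handler whose prefix matches; return the count."""
        delivered = 0
        for prefix, handler in self._subscriptions:
            if topic.startswith(prefix):
                handler(topic, payload)
                delivered += 1
        return delivered


# Hypothetical usage: a dashboard widget listens to battery topics only.
proxy = InMemoryProxy()
received: List[Tuple[bytes, bytes]] = []
proxy.subscribe(b"battery/", lambda topic, payload: received.append((topic, payload)))
proxy.publish(b"battery/soc", b"87")   # delivered to the widget
proxy.publish(b"ride/mode", b"sport")  # no subscriber, silently ignored
```

A service added later only needs to know the proxy; neither the existing publishers nor the existing subscribers change. With real ZeroMQ sockets, the library provides `zmq_proxy` together with XSUB and XPUB socket types for this kind of forwarding, though the article does not say which building blocks its proxy uses.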

2. Different components use different datatypes

What if there is a need to add a new service that uses Apache Avro, while all the other services use JSON? This implementation of ZeroMQ helps keep the system agnostic to data structures. It takes bytes as input and produces bytes as output; the receiving service then converts those bytes into the data structure it needs.
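The bytes-in, bytes-out contract can be sketched as follows. Everything here is illustrative: `route` is a hypothetical stand-in for the messaging layer, and a fixed binary layout built with Python's `struct` module takes the place of Avro so that the example needs no third-party packages.

```python
import json
import struct
from typing import Tuple


def route(payload: bytes) -> bytes:
    """Stand-in for the transport: it carries bytes and never inspects them."""
    if not isinstance(payload, (bytes, bytearray)):
        raise TypeError("the transport only carries bytes")
    return bytes(payload)


# A service that speaks JSON.
def encode_json(message: dict) -> bytes:
    return json.dumps(message, separators=(",", ":")).encode("utf-8")


def decode_json(payload: bytes) -> dict:
    return json.loads(payload.decode("utf-8"))


# A service that speaks a compact binary format (standing in for Avro).
# Layout: little-endian uint16 sequence number, then uint8 charge percent.
_SOC_FORMAT = "<HB"


def encode_soc(sequence: int, percent: int) -> bytes:
    return struct.pack(_SOC_FORMAT, sequence, percent)


def decode_soc(payload: bytes) -> Tuple[int, int]:
    return struct.unpack(_SOC_FORMAT, payload)


# Both formats pass through the same transport untouched.
mode = decode_json(route(encode_json({"mode": "eco"})))
soc = decode_soc(route(encode_soc(7, 87)))
```

The design choice this illustrates is that serialization lives at the edges: sender and receiver must agree on a format, but the layer in between never needs to learn it.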

3. Software faults are inevitable

There are a number of reasons why a service might crash. What if a message is sent to a service that is crashing or rebooting? A messaging broker is required, but on an edge device, using a heavy message broker like Kafka is not feasible.

This is why we have developed a messaging broker with ZeroMQ. This broker supports the following features:

Optimistic broker

A client cannot tell whether another service is running or whether the requested service is currently unavailable. If the service is unavailable, the broker creates a synthetic instance of it and starts queuing messages up to a configurable limit. Since we are dealing with a vehicle, older messages are usually not relevant, which is why this queue is small (first in, last out) and expires quickly. We call the broker optimistic because it saves messages and waits for future communication from the clients.

The keen ones among you will have noticed that this approach depends on the assumption that the broker knows which services are connected to it, which inspired us to develop service discovery.

Figure 2: Optimistic Broker
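The queue held by a synthetic instance might look like the sketch below. The class name `PendingQueue`, its parameters, and the decision to discard the oldest entry when the queue is full are assumptions made for illustration; the article only states that the queue is small, first in, last out, and short-lived. The clock is passed in as `now` so that the behaviour is easy to follow without real waiting.

```python
from collections import deque
from typing import Deque, List, Tuple


class PendingQueue:
    """Small holding area for messages addressed to an unavailable service."""

    def __init__(self, max_size: int, ttl_seconds: float) -> None:
        # deque(maxlen=...) discards the oldest entry once the limit is hit.
        self._items: Deque[Tuple[float, bytes]] = deque(maxlen=max_size)
        self._ttl = ttl_seconds

    def push(self, message: bytes, now: float) -> None:
        self._items.append((now, message))

    def drain(self, now: float) -> List[bytes]:
        """Return unexpired messages, newest first, and empty the queue."""
        fresh = [
            message
            for stamp, message in reversed(self._items)
            if now - stamp <= self._ttl
        ]
        self._items.clear()
        return fresh


# Hypothetical usage: room for two messages, each valid for two seconds.
pending = PendingQueue(max_size=2, ttl_seconds=2.0)
pending.push(b"a", now=0.0)
pending.push(b"b", now=1.0)
pending.push(b"c", now=2.0)           # queue is full, so b"a" is discarded
delivered = pending.drain(now=2.5)    # the service came back at t = 2.5
```

When the service reappears, `drain` hands over whatever is still fresh, with the most recent message first, which matches the idea that stale vehicle data is rarely worth delivering.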

Service discovery

The ZeroMQ broker maintains an expirable heartbeat with every service connected to it. Once the heartbeat for a service reaches its expiry, the instance of the service is deleted and a new heartbeat is sent, the acknowledgement of which is stored as a new connection for the service. To an extent, this ensures that the broker stays aware of the services that are connected to it at any given point.

This implementation has another benefit. We can maintain multiple instances for a service, which means that the broker can fire two messages at once if the service registers itself with two instances, resulting in fast message transmission. Please note that this is not parallel transmission, since the underlying socket connection remains the same, but we can route multiple requests at the same time.

Figure 3: Fast Request Response
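The bookkeeping behind service discovery can be sketched as a table of heartbeat timestamps. `ServiceRegistry`, its method names, and the service and instance names are hypothetical; the sketch only models what the broker remembers, not how heartbeats travel over the wire.

```python
from typing import Dict, List, Tuple

InstanceKey = Tuple[str, str]  # (service name, instance id)


class ServiceRegistry:
    """Tracks which service instances have a live heartbeat."""

    def __init__(self, ttl_seconds: float) -> None:
        self._ttl = ttl_seconds
        # Maps each instance to the time its last heartbeat was acknowledged.
        self._last_seen: Dict[InstanceKey, float] = {}

    def acknowledge(self, service: str, instance: str, now: float) -> None:
        """Record a heartbeat acknowledgement as a (new) live connection."""
        self._last_seen[(service, instance)] = now

    def expire(self, now: float) -> List[InstanceKey]:
        """Delete instances whose heartbeat has expired and return them.

        The caller would send a fresh heartbeat to each returned instance.
        """
        stale = [key for key, seen in self._last_seen.items() if now - seen > self._ttl]
        for key in stale:
            del self._last_seen[key]
        return stale

    def instances(self, service: str) -> List[str]:
        return sorted(inst for (name, inst) in self._last_seen if name == service)


# Hypothetical usage: one service registered with two instances.
registry = ServiceRegistry(ttl_seconds=5.0)
registry.acknowledge("navigation", "a", now=0.0)
registry.acknowledge("navigation", "b", now=3.0)
expired = registry.expire(now=6.0)  # only instance "a" is older than 5 s
```

Because the key includes an instance id, a service that registers twice shows up as two routable targets, which is what lets the broker hand out two requests at once.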

Dynamic throttling

We know that hardware is scarce on an edge device. The broker has the potential to consume the entire memory and CPU bandwidth if the total number of messages sent exceeds a certain threshold, given the capability of the hardware. To combat this, we have introduced dynamic throttling, which slows down the transmission of messages when services produce too many of them. Beyond a limit, the broker starts dropping packets of data instead of routing them to the recipient, so that it does not eat up resources and cause a crash in any part of the system.
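One way to picture the slow-down-then-drop behaviour is a counter over a fixed time window with two thresholds. This is a sketch under assumptions: the article does not say how the broker measures load, so the fixed window, the `soft_limit` and `hard_limit` parameters, and the `Throttle` class are all hypothetical.

```python
from enum import Enum


class Action(Enum):
    FORWARD = "forward"  # route the message immediately
    DELAY = "delay"      # route it, but only after slowing down
    DROP = "drop"        # discard it to protect the system


class Throttle:
    """Fixed-window message counter with a soft and a hard limit."""

    def __init__(self, soft_limit: int, hard_limit: int, window_seconds: float) -> None:
        if not 0 < soft_limit <= hard_limit:
            raise ValueError("need 0 < soft_limit <= hard_limit")
        self._soft = soft_limit
        self._hard = hard_limit
        self._window = window_seconds
        self._window_start = 0.0
        self._count = 0

    def decide(self, now: float) -> Action:
        # Start a new window once the current one has run its course.
        if now - self._window_start >= self._window:
            self._window_start = now
            self._count = 0
        self._count += 1
        if self._count > self._hard:
            return Action.DROP
        if self._count > self._soft:
            return Action.DELAY
        return Action.FORWARD


# Hypothetical usage: four messages arrive in the same one-second window.
throttle = Throttle(soft_limit=2, hard_limit=3, window_seconds=1.0)
burst = [throttle.decide(now=0.0) for _ in range(4)]
after_reset = throttle.decide(now=1.0)
```

In this model the first two messages are forwarded, the third is delayed, and the fourth is dropped; once the window rolls over, traffic flows normally again.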

C Worker API: client side library

The broker is smart: it can identify the clients that are connected to it, estimate data transmission speeds, and more. But what if it crashes? The clients import a C Worker API compiled for multiple frameworks, which is responsible for serializing and sending messages to the broker. This API also maintains a heartbeat with the broker. If the heartbeat expires and the broker does not respond, the API tries to initiate a new connection, assuming that the broker is crashing and will soon reboot.

No acknowledgement or NACK

Not all requests will be replied to, because the broker is designed to drop packets if it is unable to route them. But what if a request is time critical? Enter humanity’s best invention: clocks. The C Worker API has a provision to time out a request and return a NACK if no response is received.
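The timeout-to-NACK behaviour can be sketched with a blocking queue from Python's standard library. This is not the C Worker API: `wait_for_reply` and the `NACK` sentinel are hypothetical names, and a `queue.Queue` stands in for the socket on which replies would arrive.

```python
import queue

# Hypothetical sentinel meaning "no reply arrived in time".
NACK = object()


def wait_for_reply(replies: "queue.Queue", timeout_seconds: float):
    """Return the next reply, or NACK if none arrives before the timeout."""
    try:
        return replies.get(timeout=timeout_seconds)
    except queue.Empty:
        return NACK


# Hypothetical usage: the first wait times out, the second finds a reply.
inbox: "queue.Queue" = queue.Queue()
first = wait_for_reply(inbox, timeout_seconds=0.01)
inbox.put(b"ok")
second = wait_for_reply(inbox, timeout_seconds=0.01)
```

Returning a distinct value instead of blocking forever lets a time-critical caller decide right away whether to retry, fall back, or give up.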

Asynchronous request response

There are scenarios in the dashboard where a request is made and the response is not required immediately. Why wait for a reply and waste time though? This is why the C Worker API at the client end is asynchronous; it keeps sending requests and receiving replies at the same time. Replies can be mapped to their original request via a request ID. Each client is designed to use the replies according to the scenario, and decides whether to wait for a reply or not.
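Mapping replies back to requests through a request ID can be sketched as below. The names are hypothetical and the transport is reduced to a callable; `concurrent.futures.Future` is used only as a convenient standard-library placeholder for a reply that has not arrived yet. The sketch is single-threaded and does no locking.

```python
import itertools
from concurrent.futures import Future
from typing import Callable, Dict


class PendingRequests:
    """Matches incoming replies to outstanding requests by request ID."""

    def __init__(self, send: Callable[[int, bytes], None]) -> None:
        self._send = send                # hypothetical transport hook
        self._ids = itertools.count(1)   # request IDs: 1, 2, 3, ...
        self._pending: Dict[int, Future] = {}

    def request(self, payload: bytes) -> Future:
        """Send a request and return a placeholder for its eventual reply."""
        request_id = next(self._ids)
        future: Future = Future()
        self._pending[request_id] = future
        self._send(request_id, payload)
        return future

    def on_reply(self, request_id: int, payload: bytes) -> bool:
        """Resolve the matching request; False if the ID is not outstanding."""
        future = self._pending.pop(request_id, None)
        if future is None:
            return False
        future.set_result(payload)
        return True


# Hypothetical usage: two requests go out, replies come back out of order.
sent = []
client = PendingRequests(send=lambda rid, data: sent.append((rid, data)))
range_request = client.request(b"get-range")
mode_request = client.request(b"get-mode")
client.on_reply(2, b"eco")  # the second request is answered first
```

The caller decides what to do with each placeholder: block on `result()` when the answer is needed now, or carry on and check `done()` later.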

What’s next?

We developed this messaging library with the intention of running it on an edge device with limited hardware resources. However, after stress testing this solution, we have learnt that the library can be used on the cloud as well, because it scales up to match the number of vehicles we are putting on the road every day.

Cloud services that publish data after processing it can use this ZeroMQ proxy for publishing, and the same goes for request-response. In addition to the features mentioned above, more intricate procedures such as data retention can also be implemented. Since this is an in-house library built from scratch, there is vast potential for customization on the basis of our ever-expanding cloud needs.

Special mention to architect Abhilash Gopalakrishna, and editor Ardhra S.

P.S. If you’d like to work on this next generation of problems, check out our careers page to find your fit: careers.atherenergy.com
