AV Part 3 — Engineering Low-Latency Peer Discovery for Autonomous Vehicles

Freedom Preetham · Published in Autonomous Agents · Dec 21, 2024 · 9 min read

The ability to coordinate autonomous vehicle fleets within geo-fenced regions is a critical requirement for enabling next-generation transportation systems. Achieving this requires solving several deeply intertwined challenges, including reliable peer discovery, low-latency state synchronization, and geo-fence validation. The complexity is amplified by the need to balance decentralized vehicle operations with centralized control for tasks like dynamic geo-fence updates.

The central premise of this series is that relying on centralized servers for traffic coordination and fleet synchronization results in high latency, single points of failure, limited resilience, and increased infrastructure costs.

This blog is part of the Autonomous Vehicle series.

In this blog, I explain the engineering intricacies of building such a system, focusing on three core architectural layers:

  1. Peer Discovery Layer: Enables vehicles to discover and validate nearby peers dynamically in real time.
  2. State Synchronization Layer: Manages real-time exchange of vehicle states, ensuring consistency and scalability.
  3. Geo-Fence Management Layer: Ensures vehicles adhere to geo-fence boundaries while efficiently propagating boundary updates.

Each layer integrates carefully chosen tools, protocols, and mathematical techniques to ensure the system meets stringent requirements for resilience, scalability, and efficiency.

System Overview

The architecture is built around a layered design, where each layer is optimized to handle a distinct aspect of the system’s functionality:

  • Peer Discovery Layer: Facilitates decentralized registration, lookup, and liveness validation of peers.
  • State Synchronization Layer: Ensures real-time exchange of critical state information, such as position and velocity, with mechanisms for conflict resolution and convergence.
  • Geo-Fence Management Layer: Enforces strict adherence to geo-fence boundaries while managing dynamic updates.

The system carefully separates on-car components that handle real-time, low-latency tasks from server-side components that manage centralized control functions such as geo-fence updates.

Here is a high-level component overview:


Here is a more detailed systems view of the same:

Overall schematic, followed by enlarged views of its left and right halves.

Do not worry if the text in the image above is not visible even after enlarging it. I have provided snapshots of each box further down that clarify the components and their relationships. Use this as a schematic only, as Medium has its limitations.

1. Peer Discovery Layer (On-Car)

The Peer Discovery Layer is the foundation of the architecture, enabling vehicles to dynamically locate nearby peers within a geo-fence. This layer must operate in a fully decentralized manner to eliminate single points of failure and support the real-time nature of autonomous vehicle operations.


Challenges Addressed

  1. Dynamic Network Topology: Vehicles join and leave the network unpredictably, requiring adaptive mechanisms for maintaining peer tables.
  2. Fault Tolerance: Node failures or unreliable network conditions must not disrupt peer discovery.
  3. Scalability: The system must scale to support large fleets without compromising performance.

On-Car Components

mDNS/DNS-SD for Local Discovery

  • Vehicles use Multicast DNS (mDNS) to advertise their presence within a local subnet.
  • DNS-SD (Service Discovery) extends mDNS by associating metadata (e.g., IP, position) with each vehicle.
  • Decentralized Operation: mDNS eliminates the need for centralized DNS servers, ensuring low-latency peer discovery.
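The metadata that DNS-SD attaches to an advertised service is carried in TXT-style key=value records. As an illustrative sketch (the field names and encoding helpers here are assumptions, not the article's implementation), a vehicle's metadata record might be encoded and decoded like this:

```python
def encode_txt_records(metadata: dict) -> list[bytes]:
    """Encode vehicle metadata as DNS-SD TXT-style key=value records.

    Each record is a short byte string, mirroring how DNS-SD attaches
    metadata (IP, position, etc.) to an advertised service instance.
    """
    return [f"{k}={v}".encode("utf-8") for k, v in metadata.items()]

def decode_txt_records(records: list[bytes]) -> dict:
    """Recover the metadata dict from TXT-style records."""
    out = {}
    for rec in records:
        key, _, value = rec.decode("utf-8").partition("=")
        out[key] = value
    return out

# A vehicle entering the geo-fence might advertise (illustrative values):
vehicle_meta = {"id": "av-042", "ip": "192.168.1.17",
                "lat": "37.7749", "lon": "-122.4194"}
records = encode_txt_records(vehicle_meta)
```

In practice a library such as python-zeroconf would handle the actual mDNS advertisement; the point here is only the shape of the metadata payload.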

Kademlia DHT for Global Lookup

  • Peer metadata is stored in a Kademlia Distributed Hash Table (DHT), a decentralized, fault-tolerant structure that supports efficient lookup with O(log N) complexity.
  • Vehicles act as both clients (querying peers) and servers (storing peer data), ensuring scalability and fault tolerance.
  • Metadata registration is stored as a (key, value) pair, with the key derived by hashing the vehicle's identifier.

Key Concepts of Kademlia DHT

  • Distributed Storage: The DHT is not centralized. Instead, it is distributed across all participating nodes (in this case, vehicles). Each vehicle contributes storage and computational resources to maintain parts of the DHT.
  • Metadata Storage: When a vehicle registers its metadata (e.g., position, velocity, IP address), this information is hashed into a unique identifier and stored on specific vehicles (nodes) in the DHT.
  • Efficient Lookup: The Kademlia DHT provides a mechanism for efficiently finding stored data. The O(log N) complexity means that finding a piece of data requires contacting only a logarithmic number of nodes relative to the total number of nodes in the network.
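The concepts above hinge on Kademlia's XOR distance metric. A minimal sketch (not a full routing-table implementation; the helper names are illustrative) looks like this:

```python
import hashlib

def node_id(vehicle_id: str) -> int:
    """Hash a vehicle identifier into a 160-bit Kademlia key (SHA-1, as in the original design)."""
    return int.from_bytes(hashlib.sha1(vehicle_id.encode()).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: bitwise XOR of two node IDs."""
    return a ^ b

def k_closest(target: int, peers: list[int], k: int = 3) -> list[int]:
    """Return the k peers closest to `target` under the XOR metric.

    Each lookup step contacts only the closest known peers, which is
    what yields the O(log N) lookup complexity.
    """
    return sorted(peers, key=lambda p: xor_distance(target, p))[:k]
```

The XOR metric is symmetric and unidirectional, so every node converges on the same "closest" nodes for a given key, which is what lets metadata land deterministically on specific vehicles.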

UDP Heartbeats for Liveness Validation

  • Vehicles periodically send UDP heartbeat messages to validate peer availability.
  • Heartbeat failures are used to dynamically update peer tables, ensuring that inactive nodes are evicted.

Server-Side Components

  • None: Peer discovery, registration, and liveness validation are entirely decentralized, requiring no server involvement.

Process Flow

  • Registration: When a vehicle enters the geo-fence, it registers its presence in the Kademlia DHT. The DHT replicates this metadata across multiple nodes to ensure resilience.
  • Peer Lookup: Vehicles query the DHT to discover peers whose estimated communication latency falls within a maximum bound d_max.
  • Validation: Peer liveness is validated via periodic UDP heartbeat exchanges.

This fully decentralized approach ensures that peer discovery remains robust, scalable, and low-latency.

2. State Synchronization Layer (On-Car)

Once peers are discovered, vehicles must exchange real-time information about their state, including position, velocity, and heading. The State Synchronization Layer focuses on optimizing bandwidth usage, resolving conflicts, and ensuring convergence of shared state information across peers.


Challenges Addressed

  1. Bandwidth Constraints: The network must accommodate frequent updates without saturating available bandwidth.
  2. Conflict Resolution: Inconsistencies due to asynchronous updates must be resolved deterministically.
  3. Low Latency: Updates must propagate quickly to ensure real-time situational awareness.

On-Car Components

Delta Compression for Bandwidth Efficiency

  • Vehicles transmit only the delta (change) between the current and previous states: Δs_t = s_t − s_{t−1}.
  • This reduces payload size significantly, especially in scenarios where only a subset of state variables changes.
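As a minimal sketch of this delta scheme (state represented as a flat dict of named fields, an assumption for illustration):

```python
def compute_delta(prev: dict, curr: dict) -> dict:
    """Return only the state fields that changed between two snapshots."""
    return {k: v for k, v in curr.items() if prev.get(k) != v}

def apply_delta(state: dict, delta: dict) -> dict:
    """Reconstruct the full state from the previous state plus a delta."""
    return {**state, **delta}

# If only position-x and velocity change, only those two fields travel:
prev = {"x": 10.0, "y": 5.0, "v": 12.3, "heading": 90}
curr = {"x": 10.5, "y": 5.0, "v": 12.8, "heading": 90}
delta = compute_delta(prev, curr)  # {"x": 10.5, "v": 12.8}
```

A receiver that misses a delta must request a full snapshot, so real protocols pair this with sequence numbers or periodic keyframes.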

Vector Clocks for Conflict Detection

  • Each vehicle maintains a vector clock VC to track the version history of updates.
  • Conflicting updates are detected and resolved based on the logical timestamps recorded in the vector clocks.
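The comparison rule for vector clocks can be sketched directly (clocks as dicts mapping vehicle ID to counter; the tie-break policy mentioned in the comment is an illustrative assumption):

```python
def compare_vector_clocks(a: dict, b: dict) -> str:
    """Compare two vector clocks.

    Returns 'before' if a happened-before b, 'after' if b happened-before a,
    'equal' if identical, and 'concurrent' if neither dominates — the
    conflict case, which must then be resolved deterministically
    (e.g., by a peer-ID tie-break).
    """
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"
```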

State Application with Convergence Guarantees

  • Vehicles apply received updates incrementally, using a smoothing factor α: s_new = Π_Ω(s_local + α(s_received − s_local)).
  • The projection operator Π_Ω ensures that the updated state remains valid within the geo-fence boundary.
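A minimal sketch of smoothing plus projection, with Ω approximated as an axis-aligned bounding box purely for illustration (a real geo-fence would be a convex polygon, and the projection correspondingly more involved):

```python
def smooth_update(local: tuple[float, float], received: tuple[float, float],
                  alpha: float = 0.3) -> tuple[float, float]:
    """Blend a received peer state into the local estimate: s + α(s_received − s)."""
    return tuple(l + alpha * (r - l) for l, r in zip(local, received))

def project_to_fence(pos: tuple[float, float],
                     bounds: tuple[float, float, float, float]) -> tuple[float, float]:
    """Projection operator Π_Ω, here onto an axis-aligned box
    (x_min, x_max, y_min, y_max) standing in for the geo-fence Ω."""
    x_min, x_max, y_min, y_max = bounds
    x, y = pos
    return (min(max(x, x_min), x_max), min(max(y, y_min), y_max))
```

Small α values suppress jitter from out-of-order updates at the cost of slower convergence; the projection guarantees the blended estimate never drifts outside Ω.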

Local State Cache

  • Vehicles maintain a local cache of their state and the state of their peers to enable rapid access during updates.

Server-Side Components

  • None: State synchronization is entirely decentralized, with vehicles directly exchanging state updates.

Process Flow

  • State Updates: Vehicles broadcast state deltas to peers at a fixed frequency f_sync.
  • Conflict Resolution: Vector clocks are compared to resolve inconsistencies, ensuring that the latest updates are applied.
  • Convergence: Updates are applied with smoothing and validation to ensure consistency and compliance with the geo-fence.

This decentralized approach allows vehicles to maintain real-time situational awareness without relying on centralized servers.

3. Geo-Fence Management Layer

The Geo-Fence Management Layer ensures vehicles operate strictly within defined boundaries while efficiently propagating updates to the geo-fence.


Challenges Addressed

  1. Local Boundary Enforcement: Vehicles must validate their positions locally to avoid dependence on servers.
  2. Dynamic Updates: Geo-fence boundaries may change in response to traffic conditions or operational requirements.
  3. Low Overhead: Updates must be incremental to minimize communication costs.

On-Car Components

Local Validation Engine

  • Vehicles validate their positions using the cached geo-fence boundary, represented as a convex polygon Ω = {x ∈ ℝ² : aᵢᵀx ≤ bᵢ, i = 1, …, m}.
  • Position validation is performed locally by checking that the vehicle's position p satisfies aᵢᵀp ≤ bᵢ for every boundary half-plane.
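The half-plane containment test is cheap enough to run on every position update. A sketch, taking the polygon as a counter-clockwise vertex list (an illustrative representation):

```python
def inside_convex_polygon(point: tuple[float, float],
                          vertices: list[tuple[float, float]]) -> bool:
    """Half-plane test: `point` lies inside a convex polygon whose vertices
    are listed counter-clockwise iff it is on the left of (or on) every
    directed edge."""
    x, y = point
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Cross product of edge vector and point vector; negative => outside.
        if (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1) < 0:
            return False
    return True
```

The test is O(m) in the number of polygon edges, with no server round-trip, which is what makes purely local enforcement feasible.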

Incremental Update Engine

  • Vehicles receive geo-fence updates as JSON deltas, which are applied to the cached boundary representation.
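A sketch of applying such a delta to the cached boundary. The delta schema here ("replace_vertices" index map plus a "version" field) is an assumption for illustration, not the article's wire format:

```python
import json

def apply_boundary_delta(boundary: dict, delta_json: str) -> dict:
    """Apply an incremental geo-fence update to a cached boundary.

    Assumed delta format: a JSON object with an optional "replace_vertices"
    map (vertex index -> new coordinate pair) and a "version" number.
    Returns a new boundary dict; the cached original is left untouched.
    """
    delta = json.loads(delta_json)
    updated = {**boundary, "vertices": list(boundary["vertices"])}
    for idx, vertex in delta.get("replace_vertices", {}).items():
        updated["vertices"][int(idx)] = tuple(vertex)
    updated["version"] = delta.get("version", boundary.get("version", 0) + 1)
    return updated
```

Versioning lets a vehicle detect a missed delta and fall back to requesting the full boundary definition.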

Server-Side Components

Centralized Geo-Fence Updates

  • The server defines and updates geo-fence boundaries based on operational requirements.
  • Updates are distributed as incremental JSON deltas to minimize network overhead.

Dynamic Geo-Fence Adjustments

  • The server dynamically modifies boundaries (e.g., for traffic rerouting) and pushes the updates to vehicles.

Process Flow

  • Initialization: Vehicles cache the geo-fence definition upon entering the region.
  • Validation: Vehicles perform local checks to ensure their position remains within the boundary.
  • Updates: Geo-fence changes are propagated as incremental updates from the server.

This hybrid approach leverages centralized updates for global control while enabling local validation for real-time enforcement.

On-Car vs. Server Components

click image to expand

A Combined Scenario ~ Balancing Charging, Congestion, and Customer Demand

Assume that in a geo-fenced downtown area, an autonomous taxi fleet faces peak demand, limited charging stations, and heavy traffic congestion. Effective fleet management requires balancing these challenges to ensure seamless operations (today, all of this is handled by centralized servers). In the decentralized case, each vehicle dynamically shares and retrieves metadata such as battery state of charge (SoC), customer demand (ride requests, pickup/drop-off locations), traffic conditions, and task intent (e.g., serving customers or heading to charge) through the decentralized DHT.

Please note that this is a “toy” scenario. In the real world, there are additional constraints such as surge pricing, fraud, security, safety, ratings, and service type.

Real-Time Scenario:

Dynamic Context:
During a surge in ride requests near a stadium after an event, traffic congestion builds up, and many taxis approach critical low battery levels.

Metadata Query and Decision:
A taxi with low SoC (Car A) queries the DHT for nearby charging stations, traffic density, and ride requests. The response reveals:

  • The closest charging station is full, but another station 2 kilometers away will have an open slot in 10 minutes.
  • Several taxis with sufficient charge are en route to the stadium, capable of handling the immediate surge.

Decentralized Coordination:
Based on the metadata:

  • Car A skips the stadium pickup and prioritizes charging to ensure availability for future rides.
  • Taxis with adequate SoC (Car B and Car C) adjust their trajectories to serve the stadium surge efficiently, avoiding routes with high traffic density.
  • Idle taxis in less congested zones reposition to underserved areas, balancing fleet distribution across the geo-fence.

Mathematical Framework for Optimization:

This system can be modeled as a multi-objective optimization problem, balancing charging needs, congestion, and customer demand, with the cost function J(x) = αT(x) + βC(x) − γU(x).

Where:

  • T(x) represents the task time (e.g., charging, pickup, drop-off).
  • C(x) quantifies congestion cost based on traffic density.
  • U(x) measures the utility of meeting customer demand in underserved areas.
  • α,β,γ balance these competing priorities.

The optimization includes constraints for:

  • Minimum SoC required for task completion: SoC(x) ≥ SoC_min.
  • Congestion limits: C(x) ≤ C_max.
  • Efficient ride assignment: minimize the total distance between taxis and their assigned ride requests.

Using Lagrange multipliers λ, these constraints are incorporated into the overall objective, allowing decentralized gradient-based optimization.

Outcome:

This system ensures:

  • Taxis with low SoC recharge without disrupting service.
  • Customer demand spikes are handled efficiently without exacerbating congestion.
  • Fleet distribution remains balanced across the geo-fence, reducing idle times and optimizing resource utilization.

By leveraging real-time metadata and decentralized optimization, this approach enables autonomous taxi fleets to dynamically adapt to the complexities of urban operations while maintaining efficiency and reliability.

In the next blog in the series, I will explore the remaining critical components of the architecture: the Security and Trust Layer, Fault Tolerance Layer, and Edge Compute Nodes. These layers are essential for ensuring secure communication, system resilience, and real-time optimization, further enhancing the robustness and scalability of the autonomous fleet architecture. Stay tuned for a deep dive into these advanced engineering implementations.
