Building Multimodal Sensor Fusion for Real-Time Object Tracking
A Deep Dive into LiDAR, Radar, and the Unscented Kalman Filter (From Scratch in C++)
“Let me say quite categorically that there is no such thing as a fuzzy concept… We do talk about fuzzy things but they are not scientific concepts. Some people in the past have discovered certain interesting things, formulated their findings in a non-fuzzy way, and therefore we have progressed in science.”
— Rudolf E. Kalman (1972)
In the fast-evolving world of autonomous vehicles, accurate and safe navigation hinges on how well a car can perceive and understand its surroundings. After a decade of working on safety-critical systems at companies like Tesla, Cruise, and NASA, I’ve seen firsthand how essential robust perception is to real-world autonomy.
Modern self-driving systems rely on a sophisticated blend of sensors — each with distinct strengths and limitations — to build a coherent picture of the environment. Stereo cameras capture visual depth and textures. Radars penetrate through rain and fog. LiDARs generate precise 3D point clouds of the world around the vehicle. Together, these sensors contribute to perception, localization, and decision-making.
Let’s briefly explore the roles of each sensor.
Radar: Resilient, Velocity-Aware Sensing
Typically embedded in a car’s bumper, radars excel at measuring direct velocity using the Doppler effect — something cameras and LiDARs can’t do natively. This allows radars to converge on object motion faster and more accurately, making them ideal for tracking vehicles, detecting blind spots, and mapping under challenging weather conditions.
However, radar’s low spatial resolution can make it vulnerable to false positives. Highly reflective objects, like a soda can on the road, might be misclassified as larger hazards.
LiDAR: High-Resolution 3D Mapping
LiDAR uses pulsed infrared lasers to scan the environment and produce detailed 3D point clouds. Its vertical scan layers and rotating design provide rich spatial data, crucial for precise object detection and localization.
But LiDAR has limitations too. It can’t directly measure velocity, is more susceptible to dirt and weather, and its size can complicate integration into modern vehicle designs.
Why Sensor Fusion?
Each sensor offers a partial view of the world. Alone, none is enough. But fused together intelligently, they form a complete and reliable understanding of the driving environment.
This is where the Unscented Kalman Filter (UKF) comes in.
Unlike the standard Kalman Filter, which assumes linear relationships between state and measurements, the UKF is designed for systems with nonlinear dynamics and nonlinear measurement models, exactly the situation we face when fusing multiple sensor modalities in autonomous driving.
In practice, the UKF combines radar's polar measurements (range, angle, velocity) with LiDAR's Cartesian coordinates (x, y) far better than the vanilla or extended Kalman Filter. Instead of linearizing the equations with derivatives, the UKF uses a set of carefully chosen sigma points to approximate the distribution of the state. These points are propagated through the system's nonlinear functions, allowing the filter to maintain a more accurate estimate of uncertainty.
By combining LiDAR’s spatial precision with radar’s dynamic motion data, the UKF achieves faster convergence and more accurate object tracking — both critical for autonomous vehicles navigating in real-world traffic.
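To make the sigma point idea concrete, here is a minimal sketch (using Eigen, with a hypothetical GenerateSigmaPoints helper rather than the project's exact code) of how the 2n+1 sigma points are drawn from a state mean and covariance:

```cpp
#include <cmath>
#include <Eigen/Dense>

// Hypothetical helper: build the 2n+1 sigma points for an n-dimensional state.
// Each column of the returned matrix is one sigma point.
Eigen::MatrixXd GenerateSigmaPoints(const Eigen::VectorXd& x,
                                    const Eigen::MatrixXd& P,
                                    double lambda) {
  const int n = static_cast<int>(x.size());
  Eigen::MatrixXd Xsig(n, 2 * n + 1);

  // Matrix square root of the covariance via Cholesky decomposition.
  Eigen::MatrixXd A = P.llt().matrixL();

  Xsig.col(0) = x;  // the mean itself is the first sigma point
  for (int i = 0; i < n; ++i) {
    Xsig.col(i + 1)     = x + std::sqrt(lambda + n) * A.col(i);
    Xsig.col(i + 1 + n) = x - std::sqrt(lambda + n) * A.col(i);
  }
  return Xsig;
}
```

Each column is later pushed through the nonlinear motion or measurement model, and the weighted results reconstruct the predicted mean and covariance.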
Let’s now dive into how to implement a UKF from scratch. To keep this tutorial accessible and focused, we’ll implement it using simulated LiDAR and radar measurements to estimate an object’s position and velocity in a simple 2D environment.
Keep in mind that real-world autonomous systems are far more complex — often integrating camera vision, advanced sensor fusion pipelines, and even vision-language models to achieve rich, semantic understanding of the world. But starting with this simplified case would give us a solid foundation for understanding how multimodal perception works and how it can scale.
So let’s dive right in, shall we?
Tutorial Overview
This tutorial is structured in three key parts. First, we initialize all necessary parameters to set up our system. Next, we implement the Unscented Kalman Filter to estimate the state (position and velocity) of multiple cars on a highway using noisy LiDAR and radar measurements. Finally, we evaluate the accuracy of our filter by computing the Root Mean Square Error (RMSE) against ground truth data. All the code used in this project is included and explained in the sections below.
To get the project up and running, I first adapted the original codebase to work with modern libraries and toolchains — updating the build system, resolving dependency issues, and ensuring compatibility with the latest versions of PCL and C++ compilers.
Environment Setup
We simulate a dynamic highway environment by building a straight, three-lane road featuring three traffic cars and one ego car positioned in the center lane. The entire scene is designed to follow the ego car (shown in green) while the surrounding traffic vehicles (shown in blue) move realistically by accelerating and changing lanes through steering inputs. The coordinate system is centered on the ego car, keeping all sensor data and tracking relative to its position. This implementation is defined in highway.h and run through main.cpp.
Here, each traffic car is equipped with its own UKF instance. These filters are updated independently at every time step, allowing us to track the position and velocity of each vehicle in real-time as they navigate the road.
To visualize the sensor data, red spheres represent LiDAR detections (in the 2D x/y plane), while purple lines illustrate radar measurements, including both the angle and velocity magnitude of detected objects. Since we're only tracking in two dimensions (x and y), the z-axis is ignored in this project. You can also check out tools.cpp to change how these measurements are taken; for instance, the LiDAR markers could be the (x, y) centers of bounding boxes obtained by scanning the PCD environment and performing clustering.
Customize the Simulation
The highway.h file offers several parameters you can adjust to control how the simulation behaves — especially useful for debugging or testing different configurations. Here's a breakdown of the key ones:
The trackCars list determines which cars on the highway will be tracked using a UKF. Each value in the vector represents a traffic car (setting a value to false means that particular car won't be tracked):
std::vector<bool> trackCars = {true, true, true};
The visualize_lidar, visualize_radar, and visualize_pcd toggles control what kind of sensor data is visualized. Feel free to enable or disable each to focus on specific sensor types during simulation.
bool visualize_lidar = true; // Show LiDAR measurements (red spheres)
bool visualize_radar = true; // Show radar measurements (purple lines)
bool visualize_pcd = false; // Show raw LiDAR point cloud data
The projectedTime and projectedSteps parameters control the prediction horizon for visualizing each vehicle's estimated future path using the UKF. Increase these values if you want to see where a vehicle is headed based on its current state.
double projectedTime = 0; // Total time ahead to project (in seconds)
int projectedSteps = 0; // Number of future steps to predict
UKF Implementation
Instead of using mathematical approximations (like linearization), the UKF approach samples the uncertainty space with carefully chosen “sigma points”, runs each sample through the exact nonlinear motion model, and statistically combines the results to get the predicted mean and covariance. This gives much more accurate predictions for nonlinear systems compared to traditional Kalman filters.
In the following steps, we will see that the ProcessMeasurement function takes in new sensor data (a LiDAR or radar measurement) and initiates the Prediction() function. Prediction() uses the motion model to estimate where the object should be at the current time, with no sensor data involved; this is a purely physics-based prediction. Finally, ProcessMeasurement calls the appropriate update function (UpdateLidar or UpdateRadar). This is the complete measurement processing cycle.
The Kalman Filter Full Cycle
Think of UKF like predicting where a car will be:
1. We have a “best guess” of where the car is now (STATE)
2. We’re uncertain about this guess (COVARIANCE)
3. We predict where it will be next (PREDICTION)
4. We get sensor measurements and update our guess (UPDATE)
Similarly, the ProcessMeasurement() function captures two primary steps: Prediction(dt) ("where should the object be now?", using physics) followed by UpdateLidar()/UpdateRadar() ("correct the prediction with sensor data"). This implementation aligns with the standard Kalman filter pattern, sketched after the list below:
- Predict: Use motion model to forecast state
- Update: Use sensor measurement to correct the forecast
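Here is a condensed sketch of that flow; the overall structure mirrors the description above, but the InitializeState() helper and the exact details are illustrative rather than the project's verbatim code:

```cpp
void UKF::ProcessMeasurement(const MeasurementPackage& meas_package) {
  if (!is_initialized_) {
    // First measurement: set up x_, P_ and the timestamp, then return.
    InitializeState(meas_package);   // hypothetical helper, shown for brevity
    time_us_ = meas_package.timestamp_;
    is_initialized_ = true;
    return;
  }

  // Elapsed time since the previous measurement, converted to seconds.
  double delta_t = (meas_package.timestamp_ - time_us_) / 1e6;
  time_us_ = meas_package.timestamp_;

  // 1. Predict: pure motion model, no sensor data involved.
  Prediction(delta_t);

  // 2. Update: correct the prediction with the new measurement.
  if (meas_package.sensor_type_ == MeasurementPackage::LASER) {
    UpdateLidar(meas_package);
  } else {
    UpdateRadar(meas_package);
  }
}
```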
UKF Parameters Initialization
To build our Unscented Kalman Filter (UKF), we start by setting up several core variables that define how the filter tracks and predicts object motion.
- State Vector (x_): Our “best guess” of 5 things:
[x_pos, y_pos, speed, heading, turn_rate]
- Covariance (P_): How uncertain we are about each guess
- Sigma Points: 15 sample points around our guess to handle curved motion
- Weights: How much to trust each sample point
Because UKF works by simulating multiple possible state outcomes (called sigma points), we use an augmented state vector with extra dimensions to account for process noise (uncertainties in acceleration and yaw acceleration). The size of this larger vector is stored in n_aug_, and we use it to create a matrix of predicted sigma points Xsig_pred_, which represents different possible outcomes of the motion model.
Each sigma point is weighted to contribute to the final prediction, and these weights are stored in the weights_ vector. These weights ensure that the predicted mean and covariance are correctly reconstructed from the set of sigma points.
Key Parameters Explained:
- n_x_ = 5: We track 5 things about the moving object
- n_aug_ = 7: Add 2 noise terms for prediction uncertainty
- lambda_: Controls how spread out sample points are, which influences the filter’s sensitivity to non-linearities in the system
- std_a_: How unpredictably acceleration might change — the standard deviation of longitudinal acceleration, representing unexpected speed changes (like sudden braking or acceleration)
- std_yawdd_: How unpredictably turning might change — the standard deviation of yaw acceleration, capturing unexpected changes in steering or orientation
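Putting these together, a rough sketch of the constructor-level setup might look like this (the noise values shown are illustrative tuning choices, not the project's exact numbers):

```cpp
// Illustrative constructor-level setup of the core UKF members (Eigen types).
n_x_   = 5;                        // state: [px, py, v, yaw, yawd]
n_aug_ = 7;                        // + longitudinal and yaw acceleration noise
lambda_ = 3 - n_aug_;              // common heuristic for the spreading parameter

std_a_     = 2.0;                  // longitudinal acceleration noise (m/s^2)
std_yawdd_ = 1.0;                  // yaw acceleration noise (rad/s^2)

x_ = Eigen::VectorXd::Zero(n_x_);            // state mean
P_ = Eigen::MatrixXd::Identity(n_x_, n_x_);  // state covariance

Xsig_pred_ = Eigen::MatrixXd::Zero(n_x_, 2 * n_aug_ + 1);

// Sigma-point weights: one weight for the mean point, equal weights for the rest.
weights_ = Eigen::VectorXd(2 * n_aug_ + 1);
weights_(0) = lambda_ / (lambda_ + n_aug_);
for (int i = 1; i < 2 * n_aug_ + 1; ++i) {
  weights_(i) = 0.5 / (lambda_ + n_aug_);
}
```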
Tuning Guide:
- Too much position error? → Reduce std_a_ and std_yawdd_
- Too much velocity error? → Increase std_a_, or improve initialization
- Filter too slow to adapt? → Increase process noise
- Filter too jumpy? → Decrease process noise
Understanding Process Measurement
The UKF::ProcessMeasurement()
function is responsible for managing how the filter processes incoming data from both lidar and radar sensors. Its behavior changes depending on whether it’s handling the very first measurement or subsequent ones.
On the first measurement, the filter performs initialization. It sets up the state vector x_, which includes five key elements: position in x and y (px, py), velocity (v), yaw angle (yaw), and yaw rate (yawd). It also initializes the covariance matrix P_, which reflects the uncertainty in each state component, and records the initial timestamp time_us_.
The source of the measurement determines how initialization happens. For LiDAR, which gives direct Cartesian coordinates, we use the raw (x, y) position. For radar, which gives polar coordinates (range, angle, and range rate), we convert these to Cartesian coordinates before initializing the state.
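For instance, the radar branch of that initialization might convert the polar measurement (range rho, bearing phi, range rate rho_dot) into the Cartesian state roughly like the fragment below. It assumes the measurement package exposes raw_measurements_ as in typical Udacity-style starter code and that <cmath> is included; treat it as a sketch, not the project's exact code.

```cpp
// Radar branch of the first-measurement initialization (sketch).
// raw_measurements_ holds [rho, phi, rho_dot] for radar.
double rho     = meas_package.raw_measurements_(0);  // range
double phi     = meas_package.raw_measurements_(1);  // bearing
double rho_dot = meas_package.raw_measurements_(2);  // range rate

double px = rho * std::cos(phi);
double py = rho * std::sin(phi);
// rho_dot is only the radial velocity component, so it serves as a rough
// initial guess for the object's speed rather than its true velocity.
double v  = rho_dot;

x_ << px, py, v, 0.0, 0.0;  // yaw and yaw rate start at zero
```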
For all subsequent measurements, the filter proceeds through two main phases:
- Prediction: This step uses the motion model (without any sensor input) to estimate where the object should be at the current timestamp. The time difference (delta_t) is calculated from the current and previous timestamps, and converted from microseconds to seconds.
- Update: After prediction, the filter incorporates the actual sensor measurement to correct its forecast. Depending on the sensor type, it calls either UpdateLidar() or UpdateRadar().
This cycle — predict, then update — continues with every new sensor reading, creating a smooth and consistent estimate of the object's motion over time. LiDAR contributes accurate positional data, while radar contributes velocity and angle information even in poor visibility. Combining data from different modalities lets us track a moving object robustly, especially in noisy, nonlinear environments.
Prediction Step
The main purpose of the UKF::Prediction() function is to predict where an object will be at the next time step while properly accounting for uncertainty and nonlinear motion. That is, given the current state estimate and the elapsed time (delta_t), it predicts the object's future position, velocity, heading, and turn rate. The prediction process is the same for both LiDAR and radar measurements, and it consists of the following steps:
- Create an augmented mean vector x_aug and augmented state covariance matrix P_aug
- Generate the sigma point matrix Xsig_aug from the previously estimated state vector
- Predict the sigma point matrix Xsig_pred_ for the current time step
- Predict the state mean x_ and covariance P_ using the weights and the predicted sigma points
These steps are required to handle nonlinear motion (unlike linear Kalman filters), such as objects following curved trajectories while turning. The filter doesn't just predict a single "best guess"; it also tracks how uncertain we should be about that prediction and how that uncertainty grows over time. The prediction step additionally accounts for random accelerations and disturbances we can't model exactly.
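The heart of this step is pushing each augmented sigma point through the CTRV (constant turn rate and velocity) process model. Below is a simplified sketch for a single sigma point (column i of Xsig_aug); it omits the surrounding loop and matrix bookkeeping, and the small-yaw-rate threshold is an illustrative choice:

```cpp
// Propagate one augmented sigma point [px, py, v, yaw, yawd, nu_a, nu_yawdd]
// through the CTRV motion model over delta_t seconds.
double px   = Xsig_aug(0, i), py   = Xsig_aug(1, i), v        = Xsig_aug(2, i);
double yaw  = Xsig_aug(3, i), yawd = Xsig_aug(4, i);
double nu_a = Xsig_aug(5, i), nu_yawdd = Xsig_aug(6, i);

double px_p, py_p;
if (std::fabs(yawd) > 1e-3) {   // turning: integrate along a circular arc
  px_p = px + v / yawd * ( std::sin(yaw + yawd * delta_t) - std::sin(yaw));
  py_p = py + v / yawd * (-std::cos(yaw + yawd * delta_t) + std::cos(yaw));
} else {                        // (nearly) straight-line motion
  px_p = px + v * delta_t * std::cos(yaw);
  py_p = py + v * delta_t * std::sin(yaw);
}

// Add the process-noise contribution, then write the predicted sigma point.
double dt2 = 0.5 * delta_t * delta_t;
Xsig_pred_(0, i) = px_p + dt2 * std::cos(yaw) * nu_a;
Xsig_pred_(1, i) = py_p + dt2 * std::sin(yaw) * nu_a;
Xsig_pred_(2, i) = v + delta_t * nu_a;
Xsig_pred_(3, i) = yaw + yawd * delta_t + dt2 * nu_yawdd;
Xsig_pred_(4, i) = yawd + delta_t * nu_yawdd;
```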
Updates on Lidar and Radar data
The steps to update LiDAR and radar measurements (in UKF::UpdateLidar() and UKF::UpdateRadar()) are similar, except that LiDAR points are in Cartesian coordinates while radar points are in polar coordinates. Therefore, the two updates differ in the measurement dimension n_z, the dimensions of the matrices, and the transformation equations, but they follow the same general sequence:
- Transform the predicted sigma points Xsig_pred_ into measurement space Zsig, based on the sensor type
- Calculate the predicted measurement mean z_ and covariance matrix S, with measurement noise included
- Calculate the cross-correlation matrix Tc between the state space and the measurement space
- Calculate the Kalman gain K
- Update the state vector x_ and covariance P_ (sketched below)
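As a compact illustration of the last two steps for the radar case, the gain-and-update portion might look like this sketch (variable names follow the list above; z_pred stands for the predicted measurement mean, and the exact code in the project may differ):

```cpp
// z      : the actual radar measurement [rho, phi, rho_dot]
// z_pred : predicted measurement mean (the z_ from the step list above)
// S, Tc  : measurement covariance and cross-correlation computed earlier

Eigen::MatrixXd K = Tc * S.inverse();   // Kalman gain
Eigen::VectorXd z_diff = z - z_pred;    // measurement residual

// Radar only: keep the bearing residual inside [-pi, pi].
while (z_diff(1) >  M_PI) z_diff(1) -= 2.0 * M_PI;
while (z_diff(1) < -M_PI) z_diff(1) += 2.0 * M_PI;

x_ = x_ + K * z_diff;                   // corrected state mean
P_ = P_ - K * S * K.transpose();        // corrected state covariance
```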
Sensor Fusion Evaluation: Measuring Accuracy with RMSE
In object tracking applications — especially in autonomous driving — it’s crucial to not only know where an object is, but also how it’s moving. This is why we calculate RMSE (Root Mean Square Error) across four key state variables: x, y, vx, and vy. The variables x and y represent the object’s estimated position in 2D space, while vx and vy represent the velocity components along the x and y axes, respectively.
The simulation collects the position and velocity values output by the algorithm above and compares them to the ground truth data. The px, py, vx, and vy RMSE values must stay at or below [0.30, 0.16, 0.95, 0.70] respectively once the simulator has run for longer than 1 second. The simulator also displays a warning if any RMSE value exceeds its threshold.
The screenshot above is taken from our simulation frames. The ego car is green while the other traffic cars are blue. The red spheres above the cars represent the (x, y) LiDAR detections, and the purple lines show the radar measurements with the velocity magnitude along the detected angle. The green spheres above the cars represent the predicted path each car will follow in the near future.
Measuring accuracy in position (x, y) ensures the tracking system knows where each object is located on the road — this is essential for tasks like lane keeping, obstacle avoidance, and collision detection. On the other hand, evaluating velocity accuracy (vx, vy) tells us how fast and in what direction the object is moving. Even if the object’s current position is estimated accurately, a poor velocity estimate could lead to faulty future predictions. For instance, if the system mistakenly thinks a car is slowing down when it’s actually speeding up, it could make unsafe decisions in path planning or braking.
By computing RMSE across all four components, we gain a comprehensive understanding of how well the filter is performing. RMSE combines both bias and variance into a single score, offering a clear view of the average deviation between the estimated states and the ground truth data. This allows us to evaluate the overall performance of the Unscented Kalman Filter and ensure that the system is not only precise in the moment, but also reliable in predicting future states.
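For reference, an RMSE helper in the spirit of what tools.cpp provides might look like the following sketch (the function name and structure are assumed, not taken verbatim from the project):

```cpp
#include <vector>
#include <Eigen/Dense>

// Element-wise RMSE over [px, py, vx, vy] between estimates and ground truth.
Eigen::VectorXd CalculateRMSE(const std::vector<Eigen::VectorXd>& estimations,
                              const std::vector<Eigen::VectorXd>& ground_truth) {
  Eigen::VectorXd rmse = Eigen::VectorXd::Zero(4);
  if (estimations.empty() || estimations.size() != ground_truth.size()) {
    return rmse;  // invalid input: return zeros
  }

  for (std::size_t i = 0; i < estimations.size(); ++i) {
    Eigen::VectorXd residual = estimations[i] - ground_truth[i];
    rmse += residual.cwiseProduct(residual);   // accumulate squared errors
  }
  rmse /= static_cast<double>(estimations.size());
  return rmse.cwiseSqrt();                     // element-wise square root
}
```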
Concluding Thoughts
This project demonstrates the power of the Unscented Kalman Filter in bridging the gap between theoretical sensor fusion concepts and practical autonomous vehicle applications. By successfully fusing LiDAR and radar data to track multiple vehicles on a simulated highway, we’ve shown how the UKF’s sigma point approach elegantly handles the nonlinear dynamics inherent in real-world driving scenarios. The achievement of RMSE values within our target thresholds — 0.30, 0.16, 0.95, and 0.70 for position and velocity components respectively — validates that accurate multi-object tracking is not only possible but reliable enough for safety-critical applications.
The complementary nature of our sensor modalities proved essential to the system’s success. While LiDAR provided precise spatial awareness through its high-resolution 3D point clouds, radar contributed direct velocity measurements and environmental resilience that LiDAR alone cannot match. This sensor fusion approach highlights a fundamental principle in autonomous driving: no single sensor technology is sufficient for safe navigation. The UKF framework makes combining these diverse data sources both mathematically rigorous and computationally efficient, creating a robust perception system that performs well even in challenging conditions.
Perhaps most importantly, this implementation moves beyond academic exercise to demonstrate real-world applicability. The dynamic highway simulation, complete with lane changes, multiple vehicle interactions, and realistic motion patterns, shows that the UKF can handle the complexity of actual driving environments. The modular architecture allows for easy parameter tuning and extension to different scenarios, making it a solid foundation for production autonomous vehicle systems.
As we edge closer to fully autonomous vehicles, the UKF shows how powerful sensor fusion can turn noisy data into clear insights. By combining the strengths of LiDAR and radar, UKF offers a powerful way to interpret a complex, noisy world with precision and confidence. Hopefully, this walkthrough helped you demystify how UKF works and why it matters. And with this deeper understanding, I hope we all take one small step closer to building safer, more reliable autonomous systems for all.
Resources & Technical References
- Code Repository: moorissa/highway-kalman-filter — Complete implementation with build instructions and parameter configurations
- Kalman Filter Theory: Kalman and Bayesian Filters in Python by Roger Labbe — Comprehensive guide to understanding filter mathematics
- UKF Implementation: The Unscented Kalman Filter for Nonlinear Estimation by Wan & Van Der Merwe — Original paper introducing the sigma point approach
- Sensor Fusion: Probabilistic Robotics by Thrun, Burgard & Fox — Essential reading for multi-modal sensor integration
Development Tools
- PCL (Point Cloud Library): For LiDAR data processing and visualization
- Eigen3: Linear algebra library for matrix operations
- CMake: Build system for cross-platform compilation
Further Learning
- Udacity Self-Driving Car Nanodegree: Comprehensive program covering sensor fusion and autonomous vehicle systems
- MATLAB Sensor Fusion Toolbox: For prototyping and algorithm development
- ROS (Robot Operating System): Industry-standard framework for robotics applications including autonomous vehicles
If you found this post helpful, feel free to clap 50x or leave a comment — I'd love to hear your thoughts! If you'd like to build more sensor-related projects, here is a next article for you: