Untethered Stereo Vision Answers The Demands Of Mainstream Autonomy
The era of autonomy is upon us, and at NODAR, we’re excited. The promise of a near-0% automobile accident rate, reduced traffic, improved warehouse efficiency, 10-minute unmanned air-taxi commutes to work, and 24/7 autonomous farming are just a few of the reasons why.
However, regardless of how quickly society embraces autonomous vehicles and robots, or how enticing the carrot of trillions in revenue may be, the reality is that considerable technical, legal, and ethical challenges must still be solved before a car will take you to work without human involvement, or before a harvester can work a wheat field on its own.
One technical challenge at the core of all autonomy is 3D vision. Every autonomous vehicle, whether car, truck, forklift, or factory robot, must be able to sense its surroundings. No doubt there are many ways to skin the 3D vision “cat”, but one point everyone can agree on is that in almost all mainstream use cases, safety, performance, and cost are of paramount concern.
At NODAR, we are commercializing a 3D vision system that promises not only to provide unparalleled performance and safety but also to do so at a price point significantly lower than that of competing technologies.
We accomplish this with a novel approach to an age-old concept, stereo vision, and have coined the term “untethered stereo vision” to describe it. We say “untethered” because, through the use of patent-pending software algorithms, we have unshackled the stereo cameras from the historical requirement that they be bound to a rigid structure to ensure alignment. By freeing the stereo camera pair from mechanical linkage, we can both extend the distance between the cameras, and thus the range they can “see”, and mount them virtually anywhere: meters apart, arranged horizontally or vertically, or even pointed in different directions (as long as they have overlapping fields of view).
Stereo vision cameras obtain dense 3D information about the world by triangulation — much like binocular human vision. Untethering the stereo camera pair may seem like a simple change at first glance, but it has profound benefits:
- Long-range operation out to hundreds of meters. Cameras can be placed far apart, which extends the sensor’s range: the longer the baseline (i.e., the distance between the left and right cameras), the farther the system can see. For example, a target at 1.8-km range produces a measurable disparity of 1 pixel with a 1.5-m baseline (approximately the width of an automobile), a 60-deg field-of-view lens, and an imager with 1280 horizontal pixels (see the geometry sketch after this list). Standard stereo vision baselines are limited to approximately 20 cm by the stability of the mechanical stiffener between the cameras, and hence have limited range.
- Placement of cameras on non-rigid structures lets the designer improve the aesthetics of the product and lets the engineer mount the cameras at the optimal vantage point for perception.
- Use of virtually any camera (pixel format, RGB, NIR, LWIR, sensor size, pixel size, frame rate, sensitivity, dynamic range, etc.). This offers a wide selection of manufacturers for performance and cost optimization: there are thousands of commercially available camera models but only tens of stereo vision camera models.
- Riding the CMOS cost curve. As opposed to the ultra-expensive, high-bandwidth electronics required in time-of-flight systems like LiDAR, NODAR utilizes off-the-shelf CMOS cameras that are low-cost today, that match the photon sensitivity of the human eye, and of which 7 billion (yes, billion) will ship this year at an average cost of $2.37/unit.
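For readers who want to check the arithmetic in the first bullet, here is a minimal sketch of the underlying pinhole-stereo geometry. The parameters are the illustrative ones above, not a specific NODAR product configuration:

```python
import math

def focal_length_px(h_fov_deg: float, h_pixels: int) -> float:
    """Focal length in pixels for a pinhole camera with the given horizontal FOV."""
    return (h_pixels / 2.0) / math.tan(math.radians(h_fov_deg) / 2.0)

def disparity_px(baseline_m: float, range_m: float, f_px: float) -> float:
    """Stereo disparity in pixels: d = f * B / Z."""
    return f_px * baseline_m / range_m

f = focal_length_px(h_fov_deg=60.0, h_pixels=1280)            # ~1109 px
print(disparity_px(baseline_m=1.5, range_m=1800.0, f_px=f))   # ~0.92 px, i.e., about the 1-px figure above
print(disparity_px(baseline_m=0.2, range_m=1800.0, f_px=f))   # ~0.12 px
```

The same arithmetic shows why a 20-cm baseline cannot compete: at the same range it yields barely a tenth of a pixel of disparity, well below what a matching algorithm can reliably measure.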
How does it work?
NODAR’s untethered stereo vision technology obtains reliable distance measurements to every object in the scene using standard CMOS cameras and automotive System on a Chip (SoC) processors. NODAR software continuously calibrates the cameras — even in the presence of road shock and vibration, temperature changes, material fatigue, or deformations from collisions — so that the cameras can be placed virtually anywhere on the car.
Old-school stereo vision products are typically constructed on a single printed circuit board (PCB) that contains the left and right CMOS imagers (cameras), processor, and interface electronics. Often, the PCB and image sensors are fastened to a single frame to prevent the undesired deviation that would cause miscalibration. For example, the image below shows an Intel RealSense D435i stereo vision camera. Even though the baseline is only 55 mm, great effort was spent making sure that the left and right cameras do not experience any relative rotational or translational shifts. There are multiple structures holding the cameras in place: the outer aluminum shell provides a rigid package, the inner diecast aluminum structure is a whopping 13 mm thick, the outer sheet-metal structure is 8.5 mm thick to further stiffen the mount, and the PCB holding the cameras is itself printed on a hard aluminum substrate. All of this effort for only a 55-mm baseline!
Traditional stereo vision systems are in use by major OEMs in production vehicles today. Land Rover (stereo vision supplied by Bosch) and Subaru EyeSight (supplied by Hitachi) are examples of short-baseline, short-range stereo vision systems constrained by a rigid I-beam mount between the cameras.
At NODAR, we realized that increasing the baseline to 1.5 meters, a factor of 27 greater than that of the RealSense camera, could not be done mechanically while maintaining alignment for 15 years (the nominal lifetime of a car), no matter the temperature and no matter the vibrational environment.
Instead, we turned a mechanical engineering problem into a software problem. Rather than building a huge I-beam between the cameras to force them into position (an untenable requirement for most industries), we simply allow the cameras to move and use information from the video feed to track the relative camera positions in real time. The relative displacements of the cameras are compensated for in software. The result is beautiful depth maps in all conditions, for all time, with ranges of hundreds of meters, beyond anything that could be done mechanically.
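To make the idea concrete, here is a toy per-frame self-calibration step built from standard open-source computer-vision primitives: feature matching, robust epipolar-geometry estimation, and uncalibrated rectification. It is only a simplified sketch of the general concept, not our patent-pending production algorithms, which are considerably more involved:

```python
import cv2
import numpy as np

def autocalibrate_pair(left_gray: np.ndarray, right_gray: np.ndarray):
    """Estimate rectifying homographies for a drifting, untethered camera pair."""
    # 1. Find and match features between the live left/right frames.
    orb = cv2.ORB_create(nfeatures=2000)
    kp_l, des_l = orb.detectAndCompute(left_gray, None)
    kp_r, des_r = orb.detectAndCompute(right_gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_l, des_r)

    pts_l = np.float32([kp_l[m.queryIdx].pt for m in matches])
    pts_r = np.float32([kp_r[m.trainIdx].pt for m in matches])

    # 2. Robustly estimate the current epipolar geometry (RANSAC rejects outliers).
    F, mask = cv2.findFundamentalMat(pts_l, pts_r, cv2.FM_RANSAC, 1.0, 0.999)
    inliers = mask.ravel() == 1

    # 3. Compute homographies that re-rectify this frame pair for stereo matching.
    h, w = left_gray.shape
    ok, H_l, H_r = cv2.stereoRectifyUncalibrated(
        pts_l[inliers], pts_r[inliers], F, (w, h))
    return H_l, H_r  # apply with cv2.warpPerspective before disparity matching
```

Because the estimate is refreshed from the live video, a bump or a thermal drift simply shows up as a new solution on the next frames rather than as a permanent miscalibration.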
The lack of untethered stereo vision systems has forced many autonomous vehicle companies and universities to build their own custom stereo vision rigs. These are the cameras that you might see bolted to 80/20 extruded-aluminum frames on the roof of a car or truck, typically with 0.5-m baselines. Teams of engineers spend countless hours calibrating the stereo vision sensors, only to repeat the calibration when the car hits a bump or when the temperature changes (thus warping the mechanical mounts). When the relative orientation of the cameras needs to be known to 0.001–0.01 degrees, even thermally induced mechanical changes are enough to ruin the alignment of a stereo camera pair! These homebrew systems are fine for research and development but are not sufficient for production, unless you ship the calibration engineer with the car…
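To see why such tight tolerances matter, a quick back-of-the-envelope calculation (using the same illustrative 60-deg, 1280-pixel camera as before) converts angular misalignment into pixel error:

```python
import math

f_px = 1108.5  # focal length in pixels for a 60-deg FOV, 1280-px-wide imager
for err_deg in (0.001, 0.01, 0.1):
    shift_px = f_px * math.tan(math.radians(err_deg))
    print(f"{err_deg:>6} deg of relative rotation -> {shift_px:.2f} px of image shift")
# 0.001 deg -> 0.02 px, 0.01 deg -> 0.19 px, 0.1 deg -> 1.93 px
```

Since stereo matchers typically assume sub-pixel vertical alignment between the rectified images, even a tenth of a degree of relative rotation is enough to break a rigidly calibrated rig.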
Demonstration
To demonstrate our technology, we applied our autocalibration algorithms to the Ford Autonomous Vehicle Dataset, which was collected in 2017–2018 by a fleet of Ford autonomous vehicles that were manually driven, on different days and at different times, along a route in Michigan that included the Detroit Airport, freeways, city centers, a university campus, and suburban neighborhoods. The dataset was released in March 2020.
The video clip below (1 min 11 sec) shows a sample that includes freeway driving, construction zones, a tunnel, the airport, and partly cloudy conditions. The top frame is the front-left camera (the front-right camera is also used but not shown, to reduce clutter), the middle frame is the depth map constructed from Ford’s calibration parameters, and the bottom frame is the depth map constructed from NODAR’s autocalibration algorithm running in real time. Distance in the depth map is encoded as color: blue indicates objects that are close and red indicates objects that are farther away. For example, red corresponds to 100-m range and brown corresponds to 700-m range.
Both the middle and bottom frames were produced with the same variant of the semi-global block matching algorithm, but for the bottom frame, NODAR’s autocalibration recovered the stereo calibration parameters more accurately from live scenes, with no need for calibration targets. Surprisingly, the Ford data was not well calibrated, presumably because it used stale calibration values from a previous day, which illustrates the fundamental problem with standard stereo vision approaches. Furthermore, the NODAR depth map looks much cleaner because of our pre- and post-processing depth-map filters, such as a sky filter, which removes regions that confuse the semi-global block matching algorithm.
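For the curious, semi-global block matching is available off the shelf in OpenCV. Below is a minimal, hypothetical example of computing a disparity map from an already-rectified pair and color-coding depth the way the video does (blue = near, red = far); the file names and tuning parameters are illustrative, not the settings used for the video above:

```python
import cv2
import numpy as np

# Hypothetical rectified input images.
left = cv2.imread("left_rect.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right_rect.png", cv2.IMREAD_GRAYSCALE)

sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,   # search range in pixels; must be divisible by 16
    blockSize=5,
    P1=8 * 5 * 5,         # penalty for small disparity changes (smoothness)
    P2=32 * 5 * 5,        # penalty for large disparity changes
    uniquenessRatio=10,
    speckleWindowSize=100,
    speckleRange=2,
)
# OpenCV returns fixed-point disparities scaled by 16.
disparity = sgbm.compute(left, right).astype(np.float32) / 16.0

# Depth from disparity, Z = f * B / d, with the illustrative f and B from earlier.
f_px, baseline_m = 1108.5, 1.5
depth_m = np.where(disparity > 0,
                   f_px * baseline_m / np.maximum(disparity, 1e-6), 0.0)

# Color-code for display: near = blue, far = red (as in the video).
depth_u8 = (np.clip(depth_m, 0.0, 200.0) / 200.0 * 255.0).astype(np.uint8)
vis = cv2.applyColorMap(depth_u8, cv2.COLORMAP_JET)
cv2.imwrite("depth_vis.png", vis)
```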
Conclusion
At NODAR, we can see our collective autonomous future as clear as day (pun intended!) and are working hard to shorten the timeline to adoption for everyone. We believe that driver-assisted and fully autonomous vehicles should and will be significantly safer than human-driven vehicles, and that NODAR’s camera-array-based approach will become the cornerstone for enabling this vision.
Whether you work on consumer, commercial, or industrial vehicles, if you are evaluating 3D vision systems, we would love to hear from you at contact@nodarsensor.com.
We are also hiring! Please reach out at hiring@nodarsensor.com with your resume or LinkedIn.
And last, please check us out at www.nodarsensor.com.
Written by Dr. Leaf Jiang, CEO/Founder, NODAR