Visual depth estimation by two different sensors
Stereo disparity map and point cloud from low-cost passive and active vision sensors
Hello there! Here I’ll show an experiment with two low-cost sensors for depth estimation. One is a Passive Vision sensor, the HP Webcam HD-4110, which is used for our Stereo Vision approach. The other is an Active Vision sensor, a LIDAR Lite V2, which measures distance by illuminating a target with laser light.
The original paper on which this post is based is available on ResearchGate [1]. The GitHub repository with the code used for this experiment is available here:
Passive and Active Vision
In Passive Vision, the sensor does not need to send a signal to the environment. Cameras, for example, capture an image only by opening the shutter and exposing the sensor to the ambient light. In this case, to get a depth estimate, we need at least two cameras. Active Vision, on the other hand, emits a signal to the environment through some source of energy emission, as is the case with lasers.
Both sensors captured the same scene at the same time, with the LIDAR positioned between the two cameras.
LIDAR depth estimation
To ensure that the LIDAR scans the entire scene rather than pointing at a single region, two SG90 micro servos were used: one responsible for the horizontal scan and the other for the vertical scan.
To read the LIDAR depth measurements and to control both micro servos, an Arduino Uno is used. First of all, let’s add some includes and constant definitions.
Now let’s create an object for each micro servo and the variables that hold the current position of each.
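Since the original snippets are not embedded here, below is a minimal sketch of those declarations, assuming the LIDAR-Lite v2 is read over I2C with the Wire library. The register addresses follow the sensor’s documentation, while the pin numbers and scan limits are illustrative assumptions.

```cpp
// Arduino sketch fragment: includes, constants and global declarations
#include <Servo.h>   // micro servo control
#include <Wire.h>    // I2C communication with the LIDAR-Lite v2

// LIDAR-Lite v2 I2C constants (from the sensor documentation)
#define LIDAR_ADDR     0x62   // default I2C address
#define LIDAR_CMD_REG  0x00   // command register
#define LIDAR_MEASURE  0x04   // "take measurement with bias correction"
#define LIDAR_DIST_REG 0x8f   // high/low distance bytes

// Scan limits in degrees (illustrative values)
const int H_MIN = 0,  H_MAX = 180;   // horizontal sweep
const int V_MIN = 60, V_MAX = 120;   // vertical sweep

// One object per micro servo and the current position of each
Servo servoH;        // horizontal scan servo
Servo servoV;        // vertical scan servo
int posH = H_MIN;
int posV = V_MIN;
```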
In the setup() function, some variables are initialized and the pins to which the servos are attached are defined. This function runs only once, when the Arduino is powered on or reset.
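A sketch of what that setup() might look like, continuing the declarations above; the servo pins (9 and 10) and the serial baud rate are assumptions.

```cpp
void setup() {
  Serial.begin(115200);   // send readings to the computer (assumed baud rate)
  Wire.begin();           // join the I2C bus as master

  servoH.attach(9);       // horizontal servo signal pin (assumed)
  servoV.attach(10);      // vertical servo signal pin (assumed)

  // Move both servos to the starting position of the scan
  servoH.write(posH);
  servoV.write(posV);
  delay(500);             // give the servos time to reach the position
}
```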
Next, in the loop() function, both micro servos are moved and the function that collects the LIDAR values is called at each step. As the name suggests, this function repeats over and over while the Arduino is on.
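The scan could be something like the nested sweep below, which sends one “horizontal,vertical,distance” line per reading over the serial port; the step size and delays are assumptions.

```cpp
void loop() {
  // Sweep vertically; for each vertical position, sweep horizontally
  for (posV = V_MIN; posV <= V_MAX; posV++) {
    servoV.write(posV);
    for (posH = H_MIN; posH <= H_MAX; posH++) {
      servoH.write(posH);
      delay(20);                 // let the servo settle (assumed)
      int dist = GetDist();      // read the LIDAR at this position
      // One reading per line: "horizontal,vertical,distance"
      Serial.print(posH); Serial.print(',');
      Serial.print(posV); Serial.print(',');
      Serial.println(dist);
    }
  }
}
```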
The GetDist() function, called inside loop() at every change of servo position, is responsible for the LIDAR depth reading.
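A sketch of GetDist() following the I2C sequence described in the LIDAR-Lite v2 documentation (trigger a measurement, then read the two distance bytes); the official LIDARLite Arduino library could be used instead.

```cpp
// Trigger a measurement and read the distance (in cm) from the LIDAR-Lite v2
int GetDist() {
  // Write 0x04 to register 0x00: acquire with receiver bias correction
  Wire.beginTransmission(LIDAR_ADDR);
  Wire.write(LIDAR_CMD_REG);
  Wire.write(LIDAR_MEASURE);
  Wire.endTransmission();
  delay(20);   // wait for the acquisition to finish

  // Read the two distance bytes from register 0x8f
  Wire.beginTransmission(LIDAR_ADDR);
  Wire.write(LIDAR_DIST_REG);
  Wire.endTransmission();
  Wire.requestFrom(LIDAR_ADDR, 2);

  int distance = 0;
  if (Wire.available() >= 2) {
    distance = Wire.read() << 8;   // high byte
    distance |= Wire.read();       // low byte
  }
  return distance;                 // centimeters
}
```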
With the LIDAR values written to the serial port, a point cloud image can be created with Processing. As in Arduino, the setup() function runs once when the program starts; here we make our initial environment definitions.
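A minimal Processing setup() along those lines is sketched below; the window size, the serial port index and the baud rate are assumptions.

```processing
// Processing sketch fragment: serial input and drawing surface
import processing.serial.*;

Serial lidarPort;   // serial connection to the Arduino
float xVal, yVal;   // screen coordinates of the current point
float grayVal;      // gray tone computed from the LIDAR distance

void setup() {
  size(800, 600);                                           // drawing surface (assumed size)
  background(0);                                            // start with a black canvas
  lidarPort = new Serial(this, Serial.list()[0], 115200);   // first serial port, 115200 baud (assumed)
  lidarPort.bufferUntil('\n');                              // call serialEvent() once per line
}
```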
The draw() function runs right after setup() and keeps running until the program is stopped; the screen is updated after each call to draw(). The stroke() function sets the color of our drawings, with RGB as the default color space. The point() function draws a point at a specific coordinate (xVal, yVal). Finally, saveFrame() does what the name suggests. 🙃
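Putting those functions together, draw() might look like the sketch below; saving the frame only when the ‘s’ key is pressed is an assumption.

```processing
void draw() {
  stroke(grayVal);                // gray tone from the distance: light = near, dark = far
  point(xVal, yVal);              // plot the latest reading at (xVal, yVal)
  if (keyPressed && key == 's') {
    saveFrame("scan-####.png");   // save the current point cloud image on demand
  }
}
```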
In serialEvent(), the data captured by the LIDAR is read, enabling us to convert the values into gray tones for our point cloud disparity map.
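The parsing could be something like this, assuming the Arduino sends one “horizontal,vertical,distance” line per reading and that the distance is mapped to a gray tone (near = light, far = dark); the 40 m upper bound is the sensor’s nominal range.

```processing
void serialEvent(Serial p) {
  String line = p.readStringUntil('\n');
  if (line == null) return;

  // Expected format from the Arduino: "horizontal,vertical,distance"
  float[] values = float(split(trim(line), ','));
  if (values.length != 3) return;

  // Servo angles become screen coordinates (ranges follow the Arduino sketch above)
  xVal = map(values[0], 0, 180, 0, width);
  yVal = map(values[1], 60, 120, 0, height);

  // Distance in cm becomes a gray tone: near = light, far = dark (40 m nominal range)
  grayVal = map(constrain(values[2], 0, 4000), 0, 4000, 255, 0);
}
```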
The resulting image is similar to a disparity map, with dark gray tones representing more distant points and light gray tones representing nearby points.
Camera depth estimation
Now let’s talk about the Stereo Vision process. Here we made use of the OpenCV library, including the post-filtering available in the contrib modules.
If you already have a calibrated pair of cameras, or images that are already rectified, you can skip the calibration and rectification steps and go directly to the disparity map calculation.
Calibration
But the cameras we used here are two separate single cameras, so we need to set up our stereo pair. First of all, we try to align the cameras physically, both horizontally and vertically. This is followed by a calibration step, which is necessary so we can estimate the spatial relations between the two cameras: focal length, image center, rotation and translation. All these data will be used to rectify our images (one from each camera) and put them on the same y-axis, in the same plane. Why do we need this, you may ask? It makes our lives easier when calculating the Disparity Map: the search for corresponding points between the images of each camera is done only along the x-axis. A known geometric pattern, such as a chessboard, is used in the calibration step.
We must not forget to include calib3d to use the OpenCV functions for our calibration and rectification.
We took 13 chessboard images from each camera, with the chessboard in different positions. Using the OpenCV function findChessboardCorners() we get the board corners and save them to use in the calibration.
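A sketch of that corner-collection loop is shown below, assuming a 9x6 inner-corner board and image pairs named left01.jpg/right01.jpg and so on; the board size, square size and file names are illustrative assumptions.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/calib3d.hpp>   // calibration and rectification functions
#include <cstdio>
#include <vector>

// Board geometry (assumed): 9x6 inner corners, 2.5 cm squares
const cv::Size boardSize(9, 6);
const float squareSize = 2.5f;

std::vector<std::vector<cv::Point3f>> objectPoints;              // 3D chessboard points
std::vector<std::vector<cv::Point2f>> imagePointsL, imagePointsR;

void collectCorners() {
  // Ideal 3D coordinates of the chessboard corners (z = 0 plane)
  std::vector<cv::Point3f> obj;
  for (int i = 0; i < boardSize.height; i++)
    for (int j = 0; j < boardSize.width; j++)
      obj.emplace_back(j * squareSize, i * squareSize, 0.0f);

  for (int k = 1; k <= 13; k++) {                                // the 13 image pairs
    char nameL[64], nameR[64];
    std::snprintf(nameL, sizeof(nameL), "left%02d.jpg", k);      // assumed file names
    std::snprintf(nameR, sizeof(nameR), "right%02d.jpg", k);
    cv::Mat grayL = cv::imread(nameL, cv::IMREAD_GRAYSCALE);
    cv::Mat grayR = cv::imread(nameR, cv::IMREAD_GRAYSCALE);
    if (grayL.empty() || grayR.empty()) continue;

    std::vector<cv::Point2f> cornersL, cornersR;
    bool foundL = cv::findChessboardCorners(grayL, boardSize, cornersL);
    bool foundR = cv::findChessboardCorners(grayR, boardSize, cornersR);
    if (!foundL || !foundR) continue;                            // skip pairs without a full board

    // Refine the corner locations to sub-pixel accuracy
    cv::TermCriteria crit(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 30, 0.01);
    cv::cornerSubPix(grayL, cornersL, cv::Size(11, 11), cv::Size(-1, -1), crit);
    cv::cornerSubPix(grayR, cornersR, cv::Size(11, 11), cv::Size(-1, -1), crit);

    objectPoints.push_back(obj);
    imagePointsL.push_back(cornersL);
    imagePointsR.push_back(cornersR);
  }
}
```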
Next we use stereoCalibrate() to do the calibration step, where CM1 and CM2 are the 3x3 floating-point camera matrices, D1 and D2 are the distortion coefficient vectors of each camera, R is the rotation matrix between the first and second camera coordinate systems, T is the translation vector between the camera coordinate systems, E is the essential matrix and F is the fundamental matrix.
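Continuing the previous snippet, the calibration call could look like this; the image resolution and the flag choice are assumptions.

```cpp
// Outputs of the stereo calibration (names as described above)
cv::Mat CM1 = cv::Mat::eye(3, 3, CV_64F), CM2 = cv::Mat::eye(3, 3, CV_64F);
cv::Mat D1, D2;                  // distortion coefficient vectors of each camera
cv::Mat R, T, E, F;              // rotation, translation, essential and fundamental matrices
cv::Size imageSize(640, 480);    // resolution of the calibration images (assumed)

void calibrateStereoPair() {
  double rms = cv::stereoCalibrate(
      objectPoints, imagePointsL, imagePointsR,
      CM1, D1, CM2, D2, imageSize,
      R, T, E, F,
      cv::CALIB_SAME_FOCAL_LENGTH,   // flag choice is an assumption
      cv::TermCriteria(cv::TermCriteria::COUNT + cv::TermCriteria::EPS, 100, 1e-5));
  std::printf("stereoCalibrate RMS reprojection error: %f\n", rms);
}
```

An RMS reprojection error of around one pixel or below is usually a sign that the calibration went well.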
Rectification
After the calibration we can use the stereoRectify() OpenCV function, where R1 is the 3x3 rectification transform (rotation matrix) for the first camera and R2 for the second, P1 is the 3x4 projection matrix in the new, rectified coordinate system of the first camera and P2 for the second, and Q is the 4x4 disparity-to-depth mapping matrix. initUndistortRectifyMap() is called for the images from both cameras (so, twice); here the undistortion and rectification transforms are computed. Next we can use the remap() function to apply the geometric transformation to the images.
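A sketch of that sequence, reusing the calibration outputs from the snippet above:

```cpp
cv::Mat R1, R2, P1, P2, Q;           // rectification outputs described above
cv::Mat map1x, map1y, map2x, map2y;  // per-camera remapping tables

void rectifyPair(const cv::Mat& leftImg, const cv::Mat& rightImg,
                 cv::Mat& leftRect, cv::Mat& rightRect) {
  // Rectification transforms computed from the calibration results
  cv::stereoRectify(CM1, D1, CM2, D2, imageSize, R, T, R1, R2, P1, P2, Q);

  // Undistortion + rectification maps, once per camera
  cv::initUndistortRectifyMap(CM1, D1, R1, P1, imageSize, CV_32FC1, map1x, map1y);
  cv::initUndistortRectifyMap(CM2, D2, R2, P2, imageSize, CV_32FC1, map2x, map2y);

  // Apply the geometric transformation to each image
  cv::remap(leftImg,  leftRect,  map1x, map1y, cv::INTER_LINEAR);
  cv::remap(rightImg, rightRect, map2x, map2y, cv::INTER_LINEAR);
}
```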
Disparity Map
What is disparity? If we take a point in the real world, we’ll have a pixel in each image (one from each camera) representing this point; the disparity is the difference in the coordinate position of this pixel between the two images. Since our images are rectified, that is just the difference along the x-axis. 😎
Here we need to take care to include the post-filtering available in the contrib modules, ximgproc/disparity_filter. The rest of this section is based on the post-filter example.
Let’s declare some variables and prepare some things we will use later. Take care when using VideoCapture: if you already have an internal camera on your machine and are trying to use two external cameras, you need to use VideoCapture(1) and VideoCapture(2). VideoCapture(0) is the internal camera.
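A minimal sketch of the includes and the capture setup; the camera indices follow the note above, and the structure of main() is an assumption.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/videoio.hpp>
#include <opencv2/calib3d.hpp>
#include <opencv2/ximgproc/disparity_filter.hpp>   // post-filtering from the contrib modules

int main() {
  // External cameras are 1 and 2, because 0 is the machine's internal camera
  cv::VideoCapture capLeft(1), capRight(2);
  if (!capLeft.isOpened() || !capRight.isOpened()) return -1;

  cv::Mat frameLeft, frameRight;
  capLeft  >> frameLeft;    // grab one frame from each camera
  capRight >> frameRight;

  // ... rectification (previous section) and disparity computation go here ...
  return 0;
}
```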
Next we define some values for our stereo correspondence.
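The exact values used in the experiment are not reproduced here, so the ones below are illustrative and follow the typical SGBM parameterization from the OpenCV disparity-filter example.

```cpp
// Stereo correspondence parameters (illustrative values)
int minDisparity      = 0;
int numDisparities    = 160;   // must be divisible by 16
int blockSize         = 5;     // matched block size (odd)
int sgbmP1            = 8  * 3 * blockSize * blockSize;   // smoothness penalties
int sgbmP2            = 32 * 3 * blockSize * blockSize;
int disp12MaxDiff     = 1;
int preFilterCap      = 63;
int uniquenessRatio   = 10;
int speckleWindowSize = 100;
int speckleRange      = 32;

// Post-filter parameters
double lambda = 8000.0;   // regularization strength of the WLS filter
double sigma  = 1.5;      // sensitivity to edges in the guidance image
```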
Then we continue to follow the excellent material from the OpenCV disparity filter, which says:
The filter instance is created by providing the instance of the StereoMatcher. Another instance is returned by createRightMatcher. These two instances are used to calculate the disparity maps for the left and right images; this is necessary for the filtering afterwards.
Here we’ve used the SGBM algorithm.
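With the parameters above, the matcher and filter instances could be created like this (a sketch based on the OpenCV disparity-filter example; the SGBM mode is an assumption).

```cpp
// Left matcher: semi-global block matching (SGBM)
cv::Ptr<cv::StereoSGBM> left_matcher = cv::StereoSGBM::create(
    minDisparity, numDisparities, blockSize,
    sgbmP1, sgbmP2, disp12MaxDiff, preFilterCap,
    uniquenessRatio, speckleWindowSize, speckleRange,
    cv::StereoSGBM::MODE_SGBM_3WAY);

// WLS filter built from the left matcher, plus the matching right matcher
cv::Ptr<cv::ximgproc::DisparityWLSFilter> wls_filter =
    cv::ximgproc::createDisparityWLSFilter(left_matcher);
cv::Ptr<cv::StereoMatcher> right_matcher =
    cv::ximgproc::createRightMatcher(left_matcher);

wls_filter->setLambda(lambda);
wls_filter->setSigmaColor(sigma);
```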
The disparity maps calculated by the respective matcher instances, as well as the left image, are passed to the filter. Note that we are using the original image to guide the filtering process.
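Continuing the sketch, with leftRect and rightRect being the rectified left and right frames:

```cpp
cv::Mat left_disp, right_disp, filtered_disp, disp_vis;

// Disparity maps from both matcher instances (rectified images as input)
left_matcher->compute(leftRect, rightRect, left_disp);
right_matcher->compute(rightRect, leftRect, right_disp);

// The left image guides the filtering of the left disparity map
wls_filter->filter(left_disp, leftRect, filtered_disp, right_disp);

// Scale to an 8-bit image for visualization (light = near, dark = far)
cv::ximgproc::getDisparityVis(filtered_disp, disp_vis, 1.0);
cv::imshow("Filtered disparity", disp_vis);
cv::waitKey(0);
```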
As in the LIDAR result, light gray values represent objects closer to the cameras and dark gray values represent distant objects.
Results
In the next image both results are shown: the disparity map (camera) in column B and the point cloud (LIDAR) in column C. Each row (from 1 to 5) shows the person increasingly distant from the sensors.
As you can see, for better differentiation, a new coloring was applied to the disparity map. More about the color map options can be found in the OpenCV documentation.
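The coloring can be done with OpenCV’s applyColorMap(); the specific color map used here isn’t stated in the post, so COLORMAP_JET below is just an assumption.

```cpp
// Apply a false-color map to the 8-bit disparity image for easier reading
cv::Mat disp_color;
cv::applyColorMap(disp_vis, disp_color, cv::COLORMAP_JET);   // the color map choice is an assumption
cv::imshow("Colored disparity", disp_color);
cv::waitKey(0);
```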
Discussion
Both sensors used here would have problems measuring depth in outdoor environments. The LIDAR, because it is really simple, can be convenient for small projects where it only needs to point at the same location or where there is not much movement, as it is quite dependent on the servo motors for movement.
These cameras also suffer from the issue of movement. They can be very useful indoors and with little movement, but in an environment with a lot of fast movement (fast and furious motion… ha!), in addition to constant changes in lighting, the sensors of lower-cost cameras will not be useful for Stereo Vision. This is due to the rolling shutter, which is how the sensor works in most low-cost webcams. A rolling shutter does not take a snapshot of the entire scene in a single instant, but captures it sequentially, in horizontal or vertical steps. This can produce a gelatinous (jello) effect or other types of distortion. Cameras with a global shutter sensor are more suitable in this case, since a global shutter captures the entire scene at once, avoiding unwanted distortions that can compromise the disparity map calculation.
Deeper Discussion
These experiments were carried out to evaluate the possibility of using low-cost sensors in intelligent vehicle applications, and perhaps to try an application with only low-cost cameras, avoiding LIDAR. One of our concerns was the use of LIDAR: there are materials that classify this LIDAR as a Class 1 laser, the safest class for the human eye [2]. But there is also material [3] that places lasers with a wavelength between 780nm and 1400nm, which is the range of this LIDAR, among those that can cause cataracts and burn the retina when in constant contact with the human eye. 😱
Obviously, today, with few vehicles using LIDAR on the streets, and only in testing, this is not a concern. But in a future where autonomous vehicles are a reality, a large number of vehicles emitting lasers could become relevant. Imagine, for example, a traffic jam on a large avenue: nearby pedestrians could suffer from this. 🤔
It should also be taken into account that LIDAR can provide more reliable responses when detecting an obstacle and possibly avoiding an accident, whereas a camera may fail for some reason related to lighting, for example sun glare preventing the scene from being identified, leaving mostly very light/white regions in the image. 😯
Well, I hope you liked this post and see you next time. 🤘
Acknowledgements
This experiment is part of a project on visual perception for vehicular navigation from LAPiX (Image Processing and Computer Graphics Lab).
If you are going to talk about this approach, please cite as:
@article{chiarella:2020,
author = {Vito F. Chiarella and Thiago Rateke and Karla A. Justen and Antonio C. Sobieranski and Sylvio L. Mantelli and Eros Comunello and Aldo von Wangenheim},
title = {Comparison between low-cost passive and active vision for obstacle depth},
journal = {Revista de Ciência e Tecnologia (RCT)},
volume = {6},
year = {2020},
}
References
[1] Chiarella, V. F.; Rateke, T.; Justen, K. A.; Sobieranski, A. C.; Mantelli, S. L.; Comunello, E.; von Wangenheim, A. Comparison between low-cost passive and active vision for obstacle depth, (2020), Revista de Ciência e Tecnologia (RCT).
[2] American National Standard for Safe Use of Lasers Outdoors, (2005).
[3] International Electrotechnical Commission. Safety of Laser Products. Part 1: Equipment classification, requirements and user’s guide, (2001), Group Safety Publication, International Standard.