Visual depth estimation by two different sensors
Stereo disparity map and point cloud from low-cost passive and active vision sensors
Hello there! Here I’ll show an experiment with two low-cost sensors for depth estimation. One is a Passive Vision sensor, the HP Webcam HD-4110, which is used for our Stereo Vision approach. The other is an Active Vision sensor, a LIDAR Lite V2, which measures distance by illuminating a target with laser light.
The original paper on which this post is based is available on ResearchGate [1]. The GitHub repository with the code used for this experiment is available here:
Passive and Active Vision
In Passive Vision, the sensor does not need to send a signal to the environment. Cameras, for example, capture an image only by opening the shutter and exposing the sensor to the ambient light. In this case, to get a depth estimate, we need at least two cameras. Active Vision, on the other hand, emits a signal to the environment through some source of energy emission, as is the case with lasers.
Both sensors captured the same scene at the same time, with the LIDAR positioned between the two cameras.
LIDAR depth estimation
To ensure that the LIDAR scans the entire scene rather than pointing at a single region, two SG90 micro servos were used: one responsible for the horizontal scan and the other for the vertical scan.
To read the LIDAR depth measurements and to control both micro servos, an Arduino Uno is used. First of all, let’s add some includes and constant definitions.
Now let’s create an object for each micro servo and the variables that hold the current position of each.
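Since the original snippets are not embedded here, below is a minimal sketch of those declarations, assuming the LIDAR-Lite v2 is read over I2C with the Wire library. The register addresses follow the sensor’s documentation, while the pin numbers and scan limits are illustrative assumptions.

```cpp
// Arduino sketch fragment: includes, constants and global declarations
#include <Servo.h>   // micro servo control
#include <Wire.h>    // I2C communication with the LIDAR-Lite v2

// LIDAR-Lite v2 I2C constants (from the sensor documentation)
#define LIDAR_ADDR     0x62   // default I2C address
#define LIDAR_CMD_REG  0x00   // command register
#define LIDAR_MEASURE  0x04   // "take measurement with bias correction"
#define LIDAR_DIST_REG 0x8f   // high/low distance bytes

// Scan limits in degrees (illustrative values)
const int H_MIN = 0,  H_MAX = 180;   // horizontal sweep
const int V_MIN = 60, V_MAX = 120;   // vertical sweep

// One object per micro servo and the current position of each
Servo servoH;        // horizontal scan servo
Servo servoV;        // vertical scan servo
int posH = H_MIN;
int posV = V_MIN;
```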
In the setup() function, some variables are initialized and the pins to which the servos are attached are defined. This function runs only once, when the Arduino is powered on or reset.
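A sketch of what that setup() might look like, continuing the declarations above; the servo pins (9 and 10) and the serial baud rate are assumptions.

```cpp
void setup() {
  Serial.begin(115200);   // send readings to the computer (assumed baud rate)
  Wire.begin();           // join the I2C bus as master

  servoH.attach(9);       // horizontal servo signal pin (assumed)
  servoV.attach(10);      // vertical servo signal pin (assumed)

  // Move both servos to the starting position of the scan
  servoH.write(posH);
  servoV.write(posV);
  delay(500);             // give the servos time to reach the position
}
```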
Next, in the loop() function, both micro servos are moved and the function that collects the LIDAR values is called at each step. As the name suggests, this function repeats over and over while the Arduino is on.
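The scan could be something like the nested sweep below, which sends one “horizontal,vertical,distance” line per reading over the serial port; the step size and delays are assumptions.

```cpp
void loop() {
  // Sweep vertically; for each vertical position, sweep horizontally
  for (posV = V_MIN; posV <= V_MAX; posV++) {
    servoV.write(posV);
    for (posH = H_MIN; posH <= H_MAX; posH++) {
      servoH.write(posH);
      delay(20);                 // let the servo settle (assumed)
      int dist = GetDist();      // read the LIDAR at this position
      // One reading per line: "horizontal,vertical,distance"
      Serial.print(posH); Serial.print(',');
      Serial.print(posV); Serial.print(',');
      Serial.println(dist);
    }
  }
}
```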
The GetDist() function, called inside loop() at every change of servo position, is responsible for the LIDAR depth reading.
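A sketch of GetDist() following the I2C sequence described in the LIDAR-Lite v2 documentation (trigger a measurement, then read the two distance bytes); the official LIDARLite Arduino library could be used instead.

```cpp
// Trigger a measurement and read the distance (in cm) from the LIDAR-Lite v2
int GetDist() {
  // Write 0x04 to register 0x00: acquire with receiver bias correction
  Wire.beginTransmission(LIDAR_ADDR);
  Wire.write(LIDAR_CMD_REG);
  Wire.write(LIDAR_MEASURE);
  Wire.endTransmission();
  delay(20);   // wait for the acquisition to finish

  // Read the two distance bytes from register 0x8f
  Wire.beginTransmission(LIDAR_ADDR);
  Wire.write(LIDAR_DIST_REG);
  Wire.endTransmission();
  Wire.requestFrom(LIDAR_ADDR, 2);

  int distance = 0;
  if (Wire.available() >= 2) {
    distance = Wire.read() << 8;   // high byte
    distance |= Wire.read();       // low byte
  }
  return distance;                 // centimeters
}
```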
With the LIDAR values written to the serial port, a point cloud image can be created with Processing. As in Arduino, the setup() function runs once when the program starts; here we make our initial environment definitions.
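A minimal Processing setup() along those lines is sketched below; the window size, the serial port index and the baud rate are assumptions.

```processing
// Processing sketch fragment: serial input and drawing surface
import processing.serial.*;

Serial lidarPort;   // serial connection to the Arduino
float xVal, yVal;   // screen coordinates of the current point
float grayVal;      // gray tone computed from the LIDAR distance

void setup() {
  size(800, 600);                                           // drawing surface (assumed size)
  background(0);                                            // start with a black canvas
  lidarPort = new Serial(this, Serial.list()[0], 115200);   // first serial port, 115200 baud (assumed)
  lidarPort.bufferUntil('\n');                              // call serialEvent() once per line
}
```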
The draw() function runs right after setup() and keeps running until the program is stopped; the screen is updated after each call to draw(). The stroke() function sets the color of our drawings, with RGB as the default color space. The point() function draws a point at a specific coordinate (xVal, yVal). Finally, saveFrame() does what the name suggests. 🙃
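Putting those functions together, draw() might look like the sketch below; saving the frame only when the ‘s’ key is pressed is an assumption.

```processing
void draw() {
  stroke(grayVal);                // gray tone from the distance: light = near, dark = far
  point(xVal, yVal);              // plot the latest reading at (xVal, yVal)
  if (keyPressed && key == 's') {
    saveFrame("scan-####.png");   // save the current point cloud image on demand
  }
}
```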
In serialEvent(), the data captured by the LIDAR is read, enabling us to convert the values into gray tones for our point cloud disparity map.
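The parsing could be something like this, assuming the Arduino sends one “horizontal,vertical,distance” line per reading and that the distance is mapped to a gray tone (near = light, far = dark); the 40 m upper bound is the sensor’s nominal range.

```processing
void serialEvent(Serial p) {
  String line = p.readStringUntil('\n');
  if (line == null) return;

  // Expected format from the Arduino: "horizontal,vertical,distance"
  float[] values = float(split(trim(line), ','));
  if (values.length != 3) return;

  // Servo angles become screen coordinates (ranges follow the Arduino sketch above)
  xVal = map(values[0], 0, 180, 0, width);
  yVal = map(values[1], 60, 120, 0, height);

  // Distance in cm becomes a gray tone: near = light, far = dark (40 m nominal range)
  grayVal = map(constrain(values[2], 0, 4000), 0, 4000, 255, 0);
}
```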
The resulting image is similar to a disparity map, with dark gray tones representing more distant points and light gray tones representing nearby points.
Camera depth estimation
Now let’s talk about the Stereo Vision process. Here we made use of the OpenCV library, including the post-filtering available in the contrib modules.
If you already have a calibrated pair of cameras, or images that are already rectified, you can skip the calibration and rectification steps and go directly to the disparity map calculation.
Calibration
But the cameras we used here are two separate single cameras, so we need to set up our stereo pair. First of all, we try to align the cameras physically, both horizontally and vertically. This is followed by a calibration step, which is necessary so we can estimate the spatial relations between the two cameras: focal length, image center, rotation and translation. All these data will be used to rectify our images (one from each camera) and put them on the same y-axis, in the same plane. Why do we need this, you may ask? It makes our lives easier when calculating the Disparity Map: the search for corresponding points between the images of each camera is done only along the x-axis. A known geometric pattern, such as a chessboard, is used in the calibration step.
We must not forget to include calib3d to use the OpenCV functions for our calibration and rectification.
We took 13 chessboard images from each camera, with the chessboard in different positions. Using the OpenCV function findChessboardCorners() we get the board corners and save them to use in the calibration.
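A sketch of that corner-collection loop is shown below, assuming a 9x6 inner-corner board and image pairs named left01.jpg/right01.jpg and so on; the board size, square size and file names are illustrative assumptions.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/calib3d.hpp>   // calibration and rectification functions
#include <cstdio>
#include <vector>

// Board geometry (assumed): 9x6 inner corners, 2.5 cm squares
const cv::Size boardSize(9, 6);
const float squareSize = 2.5f;

std::vector<std::vector<cv::Point3f>> objectPoints;              // 3D chessboard points
std::vector<std::vector<cv::Point2f>> imagePointsL, imagePointsR;

void collectCorners() {
  // Ideal 3D coordinates of the chessboard corners (z = 0 plane)
  std::vector<cv::Point3f> obj;
  for (int i = 0; i < boardSize.height; i++)
    for (int j = 0; j < boardSize.width; j++)
      obj.emplace_back(j * squareSize, i * squareSize, 0.0f);

  for (int k = 1; k <= 13; k++) {                                // the 13 image pairs
    char nameL[64], nameR[64];
    std::snprintf(nameL, sizeof(nameL), "left%02d.jpg", k);      // assumed file names
    std::snprintf(nameR, sizeof(nameR), "right%02d.jpg", k);
    cv::Mat grayL = cv::imread(nameL, cv::IMREAD_GRAYSCALE);
    cv::Mat grayR = cv::imread(nameR, cv::IMREAD_GRAYSCALE);
    if (grayL.empty() || grayR.empty()) continue;

    std::vector<cv::Point2f> cornersL, cornersR;
    bool foundL = cv::findChessboardCorners(grayL, boardSize, cornersL);
    bool foundR = cv::findChessboardCorners(grayR, boardSize, cornersR);
    if (!foundL || !foundR) continue;                            // skip pairs without a full board

    // Refine the corner locations to sub-pixel accuracy
    cv::TermCriteria crit(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 30, 0.01);
    cv::cornerSubPix(grayL, cornersL, cv::Size(11, 11), cv::Size(-1, -1), crit);
    cv::cornerSubPix(grayR, cornersR, cv::Size(11, 11), cv::Size(-1, -1), crit);

    objectPoints.push_back(obj);
    imagePointsL.push_back(cornersL);
    imagePointsR.push_back(cornersR);
  }
}
```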
Next we use stereoCalibrate() to do the calibration step, where CM1 and CM2 are the 3x3 floating-point camera matrices, D1 and D2 are the distortion coefficient vectors of each camera, R is the rotation matrix between the first and second camera coordinate systems, T is the translation vector between the camera coordinate systems, E is the essential matrix and F is the fundamental matrix.
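Continuing the previous snippet, the calibration call could look like this; the image resolution and the flag choice are assumptions.

```cpp
// Outputs of the stereo calibration (names as described above)
cv::Mat CM1 = cv::Mat::eye(3, 3, CV_64F), CM2 = cv::Mat::eye(3, 3, CV_64F);
cv::Mat D1, D2;                  // distortion coefficient vectors of each camera
cv::Mat R, T, E, F;              // rotation, translation, essential and fundamental matrices
cv::Size imageSize(640, 480);    // resolution of the calibration images (assumed)

void calibrateStereoPair() {
  double rms = cv::stereoCalibrate(
      objectPoints, imagePointsL, imagePointsR,
      CM1, D1, CM2, D2, imageSize,
      R, T, E, F,
      cv::CALIB_SAME_FOCAL_LENGTH,   // flag choice is an assumption
      cv::TermCriteria(cv::TermCriteria::COUNT + cv::TermCriteria::EPS, 100, 1e-5));
  std::printf("stereoCalibrate RMS reprojection error: %f\n", rms);
}
```

An RMS reprojection error of around one pixel or below is usually a sign that the calibration went well.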
Rectification
After the calibration we can use the stereoRectify() OpenCV function, where R1 is the 3x3 rectification transform (rotation matrix) for the first camera and R2 for the second, P1 is the 3x4 projection matrix in the new, rectified coordinate system of the first camera and P2 for the second, and Q is the 4x4 disparity-to-depth mapping matrix. initUndistortRectifyMap() is called for the images from both cameras (so, twice); here the undistortion and rectification transforms are computed. Next we can use the remap() function to apply the geometric transformation to the images.
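A sketch of that sequence, reusing the calibration outputs from the snippet above:

```cpp
cv::Mat R1, R2, P1, P2, Q;           // rectification outputs described above
cv::Mat map1x, map1y, map2x, map2y;  // per-camera remapping tables

void rectifyPair(const cv::Mat& leftImg, const cv::Mat& rightImg,
                 cv::Mat& leftRect, cv::Mat& rightRect) {
  // Rectification transforms computed from the calibration results
  cv::stereoRectify(CM1, D1, CM2, D2, imageSize, R, T, R1, R2, P1, P2, Q);

  // Undistortion + rectification maps, once per camera
  cv::initUndistortRectifyMap(CM1, D1, R1, P1, imageSize, CV_32FC1, map1x, map1y);
  cv::initUndistortRectifyMap(CM2, D2, R2, P2, imageSize, CV_32FC1, map2x, map2y);

  // Apply the geometric transformation to each image
  cv::remap(leftImg,  leftRect,  map1x, map1y, cv::INTER_LINEAR);
  cv::remap(rightImg, rightRect, map2x, map2y, cv::INTER_LINEAR);
}
```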
Disparity Map
What is disparity? If we take a point in the real world, we’ll have a pixel in each image (one from each camera) representing this point; the disparity is the difference in the coordinate position of this pixel between the two images. Since our images are rectified, that is just the difference along the x-axis. 😎
Here we need to take care to include the post-filtering available in the contrib modules, ximgproc/disparity_filter. The rest of this section is based on the post-filter example.
Let’s declare some variables and prepare some things we will use later. Take care when using VideoCapture: if you already have an internal camera on your machine and are trying to use two external cameras, you need to use VideoCapture(1) and VideoCapture(2). VideoCapture(0) is the internal camera.
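A minimal sketch of the includes and the capture setup; the camera indices follow the note above, and the structure of main() is an assumption.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/videoio.hpp>
#include <opencv2/calib3d.hpp>
#include <opencv2/ximgproc/disparity_filter.hpp>   // post-filtering from the contrib modules

int main() {
  // External cameras are 1 and 2, because 0 is the machine's internal camera
  cv::VideoCapture capLeft(1), capRight(2);
  if (!capLeft.isOpened() || !capRight.isOpened()) return -1;

  cv::Mat frameLeft, frameRight;
  capLeft  >> frameLeft;    // grab one frame from each camera
  capRight >> frameRight;

  // ... rectification (previous section) and disparity computation go here ...
  return 0;
}
```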
Next we define some values for our stereo correspondence.
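The exact values used in the experiment are not reproduced here, so the ones below are illustrative and follow the typical SGBM parameterization from the OpenCV disparity-filter example.

```cpp
// Stereo correspondence parameters (illustrative values)
int minDisparity      = 0;
int numDisparities    = 160;   // must be divisible by 16
int blockSize         = 5;     // matched block size (odd)
int sgbmP1            = 8  * 3 * blockSize * blockSize;   // smoothness penalties
int sgbmP2            = 32 * 3 * blockSize * blockSize;
int disp12MaxDiff     = 1;
int preFilterCap      = 63;
int uniquenessRatio   = 10;
int speckleWindowSize = 100;
int speckleRange      = 32;

// Post-filter parameters
double lambda = 8000.0;   // regularization strength of the WLS filter
double sigma  = 1.5;      // sensitivity to edges in the guidance image
```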
Then we continue to follow the excellent material from the OpenCV disparity filter, which says:
The filter instance is created by providing the instance of the StereoMatcher. Another instance is returned by createRightMatcher. These two instances are used to calculate the disparity maps for the left and right images; this is necessary for the filtering afterwards.
Here we’ve used the SGBM algorithm.
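With the parameters above, the matcher and filter instances could be created like this (a sketch based on the OpenCV disparity-filter example; the SGBM mode is an assumption).

```cpp
// Left matcher: semi-global block matching (SGBM)
cv::Ptr<cv::StereoSGBM> left_matcher = cv::StereoSGBM::create(
    minDisparity, numDisparities, blockSize,
    sgbmP1, sgbmP2, disp12MaxDiff, preFilterCap,
    uniquenessRatio, speckleWindowSize, speckleRange,
    cv::StereoSGBM::MODE_SGBM_3WAY);

// WLS filter built from the left matcher, plus the matching right matcher
cv::Ptr<cv::ximgproc::DisparityWLSFilter> wls_filter =
    cv::ximgproc::createDisparityWLSFilter(left_matcher);
cv::Ptr<cv::StereoMatcher> right_matcher =
    cv::ximgproc::createRightMatcher(left_matcher);

wls_filter->setLambda(lambda);
wls_filter->setSigmaColor(sigma);
```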
The disparity maps calculated by the respective matcher instances, as well as the left image, are passed to the filter. Note that we are using the original image to guide the filtering process.
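Continuing the sketch, with leftRect and rightRect being the rectified left and right frames:

```cpp
cv::Mat left_disp, right_disp, filtered_disp, disp_vis;

// Disparity maps from both matcher instances (rectified images as input)
left_matcher->compute(leftRect, rightRect, left_disp);
right_matcher->compute(rightRect, leftRect, right_disp);

// The left image guides the filtering of the left disparity map
wls_filter->filter(left_disp, leftRect, filtered_disp, right_disp);

// Scale to an 8-bit image for visualization (light = near, dark = far)
cv::ximgproc::getDisparityVis(filtered_disp, disp_vis, 1.0);
cv::imshow("Filtered disparity", disp_vis);
cv::waitKey(0);
```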
As in the LIDAR result, light gray values represent objects closer to the cameras and dark gray values represent distant objects.
Results
In the next image both results are shown: the disparity map (camera) in column B and the point cloud (LIDAR) in column C. Each row (from 1 to 5) shows the person increasingly distant from the sensors.
As you can see, for better differentiation, a new coloring was applied to the disparity map. More about the color map options can be found in the OpenCV documentation.
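The coloring can be done with OpenCV’s applyColorMap(); the specific color map used here isn’t stated in the post, so COLORMAP_JET below is just an assumption.

```cpp
// Apply a false-color map to the 8-bit disparity image for easier reading
cv::Mat disp_color;
cv::applyColorMap(disp_vis, disp_color, cv::COLORMAP_JET);   // the color map choice is an assumption
cv::imshow("Colored disparity", disp_color);
cv::waitKey(0);
```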
Discussion
Both sensors used here would have problems measuring depth in outdoor environments. The LIDAR, because it is really simple, can be convenient for small projects where it only needs to point at the same location or where there is not much movement, as it is quite dependent on the servo motors for movement.
These cameras also suffer from the issue of movement. They can be very useful indoors and with little movement, but in an environment with a lot of fast movement (fast and furious motion… ha!), in addition to constant changes in lighting, the sensors of lower-cost cameras will not be useful for Stereo Vision. This is due to the rolling shutter, which is how the sensor works in most low-cost webcams. A rolling shutter does not take a snapshot of the entire scene in a single instant, but captures it sequentially, in horizontal or vertical steps. This can produce a gelatinous (jello) effect or other types of distortion. Cameras with a global shutter sensor are more suitable in this case, since a global shutter captures the entire scene at once, avoiding unwanted distortions that can compromise the disparity map calculation.
Deeper Discussion
These experiments were carried out to evaluate the possibility of using low-cost sensors in intelligent vehicle applications, and perhaps to try an application with only low-cost cameras, avoiding LIDAR. One of our concerns was the use of LIDAR: there are materials that classify this LIDAR as a Class 1 laser, the safest class for the human eye [2]. But there is also material [3] that places lasers with a wavelength between 780nm and 1400nm, which is the range of this LIDAR, among those that can cause cataracts and burn the retina when in constant contact with the human eye. 😱
Obviously, today, with few vehicles using LIDAR on the streets, and only in testing, this is not a concern. But in a future where autonomous vehicles are a reality, a large number of vehicles emitting lasers could become relevant. Imagine, for example, a traffic jam on a large avenue: nearby pedestrians could suffer from this. 🤔
It should also be taken into account that LIDAR can provide more reliable responses when detecting an obstacle and possibly avoiding an accident, whereas a camera may fail for some reason related to lighting, for example sun glare preventing the scene from being identified, leaving mostly very light/white regions in the image. 😯
Well, I hope you liked this post and see you next time. 🤘
Acknowledgements
This experiment is part of a project on visual perception for vehicular navigation from LAPiX (Image Processing and Computer Graphics Lab).
If you are going to talk about this approach, please cite as:
@article{chiarella:2020,
author = {Vito F. Chiarella and Thiago Rateke and Karla A. Justen and Antonio C. Sobieranski and Sylvio L. Mantelli and Eros Comunello and Aldo von Wangenheim},
title = {Comparison between low-cost passive and active vision for obstacle depth},
journal = {Revista de Ciência e Tecnologia (RCT)},
volume = {6},
year = {2020},
}
References
[1] Chiarella, V. F.; Rateke, T.; Justen, K. A.; Sobieranski, A. C.; Mantelli, S. L.; Comunello, E.; von Wangenheim, A. Comparison between low-cost passive and active vision for obstacle depth, (2020), Revista de Ciência e Tecnologia (RCT).
[2] American National Standard for Safe Use of Lasers Outdoors, (2005).
[3] International Electrotechnical Commission. Safety of Laser Products. Part 1: Equipment classification, requirements and user’s guide, (2001), Group Safety Publication, International Standard.