Computer Vision: Stereo Vision and Volume Measurement

Max Dimanov
4 min read · Jan 30, 2019


Stereo Vision

Stereo vision is the idea of letting AI perceive the depth of an image and the distance to objects using a pair of cameras. Most 3D camera models are based on stereo vision theory and technology. Two cameras are set at some distance from each other, so they “see” objects from different angles. By assessing the correspondence between the two images, AI determines the distance to objects, analyzes them, and builds the 3D structure of the scene.

The “eyes” of AI

With stereo vision, there is no need for distance-measuring sensors such as infrared sensors, sound locators, or laser radars, which significantly reduces the cost of technical solutions.

Source: “Learning OpenCV 3: Computer Vision in C++ with the OpenCV Library” by Adrian Kaehler and Gary Bradski, published by O’Reilly Media, Inc., p. 706

Main fields of use:

1. Recognition of human postures and gestures;

2. Creation of 3D models and 3D scenes;

3. Obstacle detection and positioning for autonomous systems.

Each of these fields can be adapted to a particular task. For instance, 3D scene reconstruction can be used to measure the volume of an object or a product.

In practice, stereo vision is realized in four stages with the help of two cameras:

1. Distortion correction: mathematical removal of the lens’s radial and tangential distortions to obtain an undistorted image.

2. Rectification: adjusting for the angles and distances between the cameras. The result is a pair of row-aligned, rectified images, meaning the two image planes are made coplanar and corresponding rows point in the same direction and share the same y-coordinate.

Source: “Learning OpenCV 3: Computer Vision in C++ with the OpenCV Library” by Adrian Kaehler and Gary Bradski, published by O’Reilly Media, Inc., p. 735

3. Point matching: a search for correspondences between points seen by the left and right cameras. The result is a disparity map, whose values are the differences in the x-coordinate between the left and right images for the same scene point.

4. Finally, knowing the geometric arrangement of the cameras, we triangulate the disparity map. This is the reprojection stage, which produces a depth map, that is, the desired 3D scene.
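The reprojection math in stage 4 can be sketched in a few lines of NumPy. The focal length, baseline, and principal point below are made-up illustrative values, not a real calibration:

```python
import numpy as np

# Sketch of the reprojection stage, assuming rectified cameras with focal
# length f (pixels), baseline B (metres), and principal point (cx, cy).
# All parameter values here are illustrative assumptions.
def disparity_to_depth(disparity, f=700.0, B=0.12):
    """Depth Z = f * B / d for each pixel with positive disparity d."""
    depth = np.full_like(disparity, np.inf, dtype=float)
    valid = disparity > 0
    depth[valid] = f * B / disparity[valid]
    return depth

def reproject(u, v, Z, f=700.0, cx=320.0, cy=240.0):
    """Back-project pixel (u, v) with depth Z to a 3D point (X, Y, Z)."""
    X = (u - cx) * Z / f
    Y = (v - cy) * Z / f
    return np.array([X, Y, Z])

disparity = np.array([[14.0, 0.0], [7.0, 28.0]])
depth = disparity_to_depth(disparity)  # f*B = 84, so 14 px -> 6 m, 7 px -> 12 m
point = reproject(320.0, 240.0, depth[0, 0])  # principal point -> X = Y = 0
```

Note that depth is inversely proportional to disparity, which is why stereo precision degrades quickly with distance.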

For the first two stages, you must first calculate the configuration parameters of the camera pair. This can be done automatically with various binary markers, such as ArUco or ChArUco. The main advantage of these markers is that even a single marker provides enough correspondences to estimate the camera pose. In addition, their internal binary codification makes them especially reliable, since it allows the use of error detection and correction methods. You can also use the markers to determine the geometry of the area under the camera.

Source: https://docs.opencv.org/3.4/d5/dae/tutorial_aruco_detection.html

Volume Measurement

To measure volume, you also need to add the following steps to the processing:

1. Accumulate a set of consecutive frames, which increases the robustness of the reconstructed 3D scene to errors. The set is used for averaging or refining the 3D scene;

2. Select only the points of the target product in the scene. This is done with color segmentation, template matching, or neural-network semantic segmentation. The fastest method is color segmentation, which is recommended for real-time solutions. Its downside is that the settings are tied to a specific product, which can give poor results when the background and object colors are not clearly distinguishable;

If GPGPU optimization is possible, high performance and segmentation accuracy can be achieved with U-shaped convolutional neural networks such as U-Net and Advanced U-Net, or with fully convolutional networks;

3. Cluster the scene’s 3D points belonging to the target product, treating each cluster as a single object;

Source: https://www.semanticscholar.org/paper/Explaining-Point-Cloud-Segments-in-Terms-of-Object-Lang-Piater/7e3b859feae6b09a3da7201e74b553d79060885e
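The clustering step can be illustrated with a naive distance-threshold grower. Real pipelines typically use DBSCAN or Euclidean cluster extraction; this greedy NumPy version, with an assumed radius, only shows the idea that one cluster corresponds to one object:

```python
import numpy as np

def cluster_points(points, radius=0.05):
    """Greedy clustering: points within `radius` of a cluster join it."""
    points = np.asarray(points, dtype=float)
    labels = np.full(len(points), -1, dtype=int)
    current = 0
    for i in range(len(points)):
        if labels[i] != -1:
            continue  # already assigned to a cluster
        labels[i] = current
        frontier = [i]
        while frontier:
            j = frontier.pop()
            near = np.linalg.norm(points - points[j], axis=1) < radius
            for k in np.nonzero(near)[0]:
                if labels[k] == -1:
                    labels[k] = current
                    frontier.append(k)
        current += 1
    return labels

# Two tight pairs of 3D points far from each other -> two clusters.
pts = [[0, 0, 0], [0.01, 0, 0], [1, 1, 1], [1.01, 1, 1]]
labels = cluster_points(pts)  # [0, 0, 1, 1]
```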

4. Form a convex polygon for each cluster, which eliminates edge defects in the 3D scene’s objects;

5. Restore missing 3D points using linear interpolation;
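Step 5 can be sketched per scanline with `np.interp`, assuming missing depth values are marked as NaN:

```python
import numpy as np

def fill_row(row):
    """Fill NaN gaps in one depth-map row by linear interpolation
    between the nearest valid neighbours."""
    row = np.asarray(row, dtype=float)
    valid = ~np.isnan(row)
    idx = np.arange(len(row))
    return np.interp(idx, idx[valid], row[valid])

row = [1.0, np.nan, 3.0, np.nan, 5.0]
filled = fill_row(row)  # [1.0, 2.0, 3.0, 4.0, 5.0]
```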

6. Calculate the volume of each object by integrating over the cluster’s area, based on the scene geometry;

Source: “Thomas’ Calculus” by Weir, Hass & Giordano, 2008, Pearson Education Inc.
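The integral in step 6 can be approximated numerically as a Riemann sum of height-above-surface values over the grid cells. The cell size and height map below are illustrative numbers, not real measurements:

```python
import numpy as np

def volume_from_heightmap(heights, cell_area):
    """Approximate V = integral of h(x, y) dA as sum(h) * cell_area.
    `heights` is a 2D array of object height above the reference plane (m);
    negative values (noise below the plane) are clipped to zero."""
    return float(np.sum(np.clip(heights, 0.0, None)) * cell_area)

# A 4x4-cell region of constant 0.1 m height on 1 cm x 1 cm cells:
h = np.zeros((10, 10))
h[3:7, 3:7] = 0.1
V = volume_from_heightmap(h, cell_area=0.01 * 0.01)  # 16 * 0.1 * 1e-4 m^3
```

Summing this per-cluster quantity over all clusters then gives the total volume from the final step.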

7. And finally, calculate the total volume of all objects.

I’m always eager to share my best practices and open to learning something new, so if you have any questions or ideas, feel free to write to me or leave a comment in the comments section!
