Advanced Vision Algorithm Helps Robots Learn to See in 3D
Researchers from Brown University and Duke University have introduced a new object representation that allows robots to identify three-dimensional objects whose geometry is partially obscured. They claim that this method, called Bayesian Eigenobjects (BEOs), is the first technique that can perform joint classification, pose estimation, and 3D geometric completion. In addition, on a large-scale dataset of household objects, BEOs outperformed competing methods on the challenging combined task of classification, completion, and pose estimation, in both accuracy and query time.
Tracy Staedter, the author of the blog post, noted that robots generally become useless once they are placed in messy, unfamiliar environments. To address this problem, researchers from Duke University and Brown University have developed a new computer vision algorithm. In their paper, they state that the algorithm, BEOs, can recognize 3D objects and infer the shape of objects that are partially obscured or tipped over. Before testing, the system was trained on 4,000 3D object models from ten categories: bathtubs, beds, chairs, desks, dressers, monitors, night stands, sofas, tables, and toilets. Unlike other pattern recognition algorithms, BEOs does not need views from multiple angles to identify an object's category. In the experiments, the system was asked to identify 908 objects, each from a single vantage point. It correctly identified about 75% of them, a significant improvement over the roughly 50% accuracy of comparable previous algorithms. Ben Burchfiel and George Konidaris, the authors of the paper, explained that BEOs recognizes objects by focusing on the similarities and differences between them, whereas conventional algorithms need to see the entire object.
When BEOs finds structure that is consistent across the objects in a class, it sets that structure aside. This shrinks the problem to a more manageable size and concentrates the computation on the parts where objects differ. As a result, BEOs frees resources for the computationally expensive questions where they are needed most, such as which side of an object is up, or what the hidden part of its 3D geometry looks like. This is especially useful because, in real-world settings, objects frequently overlap and block one another.
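Setting aside what a class has in common can be pictured as subtracting a mean object and describing what remains with a few basis coefficients. The sketch below is a minimal illustration under assumed toy data, not the authors' code: the 4-voxel object, the mean, the two-vector orthonormal basis, and the helper names `project` and `reconstruct` are all hypothetical.

```python
import math


def project(x, mean, basis):
    """Coefficients c = W^T (x - mean); `basis` lists the orthonormal columns of W."""
    centered = [xi - mi for xi, mi in zip(x, mean)]  # remove the shared (mean) structure
    return [sum(wi * ci for wi, ci in zip(w, centered)) for w in basis]


def reconstruct(c, mean, basis):
    """Map coefficients back to the full space: mean + W c."""
    out = list(mean)
    for coeff, w in zip(c, basis):
        out = [o + coeff * wi for o, wi in zip(out, w)]
    return out


# Hypothetical toy data: a 4-voxel "object" described with a 2-vector basis.
r = 1 / math.sqrt(2)
mean = [0.5, 0.5, 0.5, 0.5]
basis = [[r, r, 0.0, 0.0], [0.0, 0.0, r, -r]]
x = [1.5, 1.5, 1.0, 0.0]

c = project(x, mean, basis)          # 2 numbers instead of 4
x_hat = reconstruct(c, mean, basis)  # recovers x up to rounding, since x lies in the subspace
```

Here four voxel values are summarized by two coefficients; real voxel grids are far larger, so working in a low-dimensional subspace would save correspondingly more computation.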
Tracy also pointed out that one component of BEOs is a deep-learning algorithm, which allows robots to analyze input data quickly. However, Burchfiel noted that this approach is not good at the inverse task, especially when the output is more complex than the input.
In testing, the trained system correctly identified about 75% of objects from a single viewpoint, and BEOs was both more accurate and faster than previous methods. It does not need a complete view of the scene to identify objects, nor does it require multiple views to gather more information. In addition, BEOs resembles the way humans recognize objects: people generalize what they see into meaningful objects by relating it to prior knowledge, rather than perceiving each object in isolation.
Finally, Burchfiel said they hope to build a more robust system that could serve as the basis for a general robot perception scheme.
The blog post summarizes only the main results of the paper, so the original paper is well worth a closer look.
In the paper, Burchfiel and Konidaris motivate BEOs by the inevitability of encountering previously unseen objects in the real world. One of its key features is 3D partial-object completion: a robot does not need to observe a complete object, because BEOs can infer the full shape from a partial observation. In addition, the algorithm performs classification, pose estimation, and geometric completion jointly.
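As a hedged illustration of the idea behind partial-object completion, the sketch below fits subspace coefficients using only the voxels that were actually observed, then fills in the hidden voxels from the reconstruction. It is a plain least-squares version on hypothetical toy data (`complete`, `solve`, and the 6-voxel example are made up for illustration); the estimator in the paper may differ, for example by using prior information from its Bayesian model.

```python
def solve(A, b):
    """Solve the square linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def complete(partial, observed, mean, basis):
    """Fill in hidden entries of `partial` from a least-squares fit on the observed entries.

    basis: list of k basis vectors (the columns of W), each of length d.
    observed: set of indices whose values were actually seen; other entries may be None.
    """
    k = len(basis)
    # Normal equations restricted to observed rows: (W_o^T W_o) c = W_o^T (x_o - mean_o)
    A = [[sum(basis[i][j] * basis[m][j] for j in observed) for m in range(k)]
         for i in range(k)]
    b = [sum(basis[i][j] * (partial[j] - mean[j]) for j in observed) for i in range(k)]
    c = solve(A, b)
    full = [mean[j] + sum(c[i] * basis[i][j] for i in range(k)) for j in range(len(mean))]
    # Keep observed values as measured; use the reconstruction only for hidden entries.
    return [partial[j] if j in observed else full[j] for j in range(len(mean))]


# Hypothetical toy object: 6 voxels made of two "parts" (voxels 0-2 and 3-5).
mean = [0.5] * 6
basis = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]
partial = [2.5, None, None, 3.5, None, None]  # None marks voxels hidden from view
completed = complete(partial, {0, 3}, mean, basis)
print(completed)  # [2.5, 2.5, 2.5, 3.5, 3.5, 3.5]
```

The observed voxels must constrain every basis vector (here, one voxel per part); otherwise the normal equations are singular and the hidden part cannot be recovered this way.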
Another feature of BEOs is its use of Variational Bayesian Principal Component Analysis (VBPCA) to learn a compact basis for a multi-class object representation. VBPCA is a Bayesian extension of probabilistic PCA (PPCA), in which each data point is modeled as
$$x = Wc + \mu + \epsilon$$
where:
- x is a data point
- X is the matrix containing all data points
- W is the basis matrix
- c is the projection of the data point onto the basis matrix
- mu is the mean of all data points
- epsilon is zero-mean Gaussian noise
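To make the PPCA model concrete, the sketch below draws one data point as x = Wc + mu + epsilon, assuming the standard PPCA choices from Tipping and Bishop: latent coefficients c ~ N(0, I) and isotropic noise epsilon ~ N(0, sigma^2 I). It illustrates the generative model only; it does not learn W, and VBPCA's priors and variational updates are not shown. The toy W and mu and the function name `sample_ppca` are hypothetical.

```python
import random


def sample_ppca(W, mu, sigma, rng):
    """Draw one data point from the PPCA model x = W c + mu + epsilon.

    W: d x k basis matrix, given as a list of d rows with k entries each.
    mu: mean vector of length d.
    sigma: standard deviation of the zero-mean isotropic Gaussian noise epsilon.
    Returns (x, c) so the latent coefficients can be inspected.
    """
    k = len(W[0])
    c = [rng.gauss(0.0, 1.0) for _ in range(k)]  # latent coefficients, c ~ N(0, I)
    x = []
    for row, m in zip(W, mu):
        wc = sum(w * ci for w, ci in zip(row, c))  # j-th entry of W c
        eps = rng.gauss(0.0, sigma)                # epsilon_j ~ N(0, sigma^2)
        x.append(wc + m + eps)
    return x, c


# Hypothetical toy model: 3-dimensional data generated from a 2-dimensional latent space.
rng = random.Random(0)
W = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
mu = [0.5, 0.5, 0.5]
x, c = sample_ppca(W, mu, sigma=0.01, rng=rng)
```

With `sigma=0.0`, every sample lies exactly in the affine subspace mu + span(W); the noise term is what turns ordinary PCA into a probabilistic model.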
Figure 1 summarizes the workflow and shows how partial-object completion works in BEOs. To evaluate BEOs on classification, the authors compare it with other popular computer vision algorithms. The classification scores are listed in Table 1, and the completion errors and query times are shown in Figure 2.
These results indicate that BEOs performs favorably against previous algorithms in both accuracy and query time.
Figure 3 shows a sample of object completions and illustrates the main difference between 3DShapeNets and BEOs: 3DShapeNets classifies an object and completes its 3D shape at the same time, whereas BEOs first breaks the object down into parts and then performs classification.
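One simple way to picture a two-stage pipeline, in which an object is first represented in a learned low-dimensional space and only then classified, is sketched below. The nearest-centroid classifier is an assumption chosen for illustration, not the classifier used in the paper, and the toy objects, class labels, and function names are hypothetical.

```python
import math


def project(x, mean, basis):
    """Coefficients of x in the subspace spanned by the orthonormal `basis` vectors."""
    centered = [xi - mi for xi, mi in zip(x, mean)]
    return [sum(wi * ci for wi, ci in zip(w, centered)) for w in basis]


def fit_centroids(examples, mean, basis):
    """Average the projected coefficients of the training examples in each class."""
    sums, counts = {}, {}
    for x, label in examples:
        c = project(x, mean, basis)
        prev = sums.get(label, [0.0] * len(c))
        sums[label] = [s + ci for s, ci in zip(prev, c)]
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(x, mean, basis, centroids):
    """Project x first, then pick the class whose centroid is nearest in coefficient space."""
    c = project(x, mean, basis)
    return min(centroids, key=lambda label: math.dist(c, centroids[label]))


# Hypothetical toy data: 4-voxel objects, a 2-vector basis, two classes.
mean = [0.0, 0.0, 0.0, 0.0]
basis = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
train = [([2.0, 0.0, 0.0, 0.0], "chair"), ([2.2, 0.2, 0.0, 0.0], "chair"),
         ([0.0, 2.0, 0.0, 0.0], "table"), ([0.2, 1.8, 0.0, 0.0], "table")]
centroids = fit_centroids(train, mean, basis)
print(classify([1.8, 0.3, 0.1, 0.0], mean, basis, centroids))  # chair
```

Because classification happens on a handful of coefficients rather than on every voxel, each comparison stays cheap even when the original objects are large.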
The next test compares the pose estimation performance of BEOs with that of baseline algorithms. The experiment is run in two settings: one degree of freedom (1DOF) with 1-degree precision, and three degrees of freedom (3DOF) with 20-degree precision. Figure 4 shows that BEOs achieves lower rotation error in these pose estimation tests.
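For intuition about 1DOF pose estimation at 1-degree precision, the sketch below tries every candidate rotation in 1-degree steps and keeps the one that best aligns a model with an observation. It is a brute-force grid search on a hypothetical 2-D shape with known point correspondences, assumed purely for illustration; it is not the pose estimator from the paper, which works with 3D objects.

```python
import math


def rotate(points, degrees):
    """Rotate 2-D points counterclockwise about the origin."""
    t = math.radians(degrees)
    cos_t, sin_t = math.cos(t), math.sin(t)
    return [(cos_t * x - sin_t * y, sin_t * x + cos_t * y) for x, y in points]


def estimate_yaw(model, observed, step=1):
    """Grid-search the rotation angle (in degrees) that best maps `model` onto `observed`.

    Assumes model[i] corresponds to observed[i]; the score is the sum of squared distances.
    """
    def error(angle):
        return sum((mx - ox) ** 2 + (my - oy) ** 2
                   for (mx, my), (ox, oy) in zip(rotate(model, angle), observed))
    return min(range(0, 360, step), key=error)


model = [(1.0, 0.0), (0.0, 2.0), (-1.0, -0.5)]  # hypothetical asymmetric 2-D shape
observed = rotate(model, 37)                     # simulated observation, rotated by 37 degrees
print(estimate_yaw(model, observed))             # 37
```

A 3DOF version would search a grid over three rotation angles, where a coarser step keeps the number of candidates manageable.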
Figure 5 summarizes the performance of BEOs on the joint pose, class, and 3D geometry estimation test, showing how closely the 3D shapes restored by BEOs match the ground truth.
BEOs gives robots a new way to identify objects and is especially useful in real-world settings.
B. Burchfiel and G. Konidaris. Bayesian Eigenobjects: A Unified Framework for 3D Robot Perception. In Robotics: Science and Systems, 2017. https://storage.googleapis.com/rss2017-papers/10.pdf
A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
M. Attene. A lightweight approach to repairing digitized polygon meshes. The Visual Computer, 26(11):1393–1406, 2010.
B. Drost, M. Ulrich, N. Navab, and S. Ilic. Model globally, match locally: Efficient and robust 3D object recognition. In Computer Vision and Pattern Recognition, pages 998–1005, 2010.
S. Bai, X. Bai, Z. Zhou, Z. Zhang, and L. Jan Latecki. GIFT: A real-time and scalable 3D shape search engine. In Computer Vision and Pattern Recognition, June 2016.
C. M. Bishop. Variational principal components. In International Conference on Artificial Neural Networks, pages 509–514, 1999.
C. M. Bishop. Bayesian PCA. In Advances in Neural Information Processing Systems, pages 382–388, 1999.
C. R. Maurer, Jr., R. Qi, and V. Raghavan. A linear time algorithm for computing exact Euclidean distance transforms of binary images in arbitrary dimensions. Pattern Analysis and Machine Intelligence, 25:265–270, 2003.
D. Huber, A. Kapuria, R. Donamukkala, and M. Hebert. Parts-based 3D object classification. In Computer Vision and Pattern Recognition, volume 2, pages 82–89, 2004.
M. Daniels and R. Kass. Shrinkage estimators for covariance matrices. Biometrics, pages 1173–1184, 2001.
E. G. Learned-Miller. Data driven image models through continuous joint alignment. Pattern Analysis and Machine Intelligence, 28(2):236–250, 2006.
M. Elhoseiny, T. El-Gaaly, A. Bakry, and A. Elgammal. Convolutional models for joint object categorization and pose estimation. arXiv:1511.05175, 2015.
V. Hegde and R. Zadeh. FusionNet: 3D object classification using multiple data representations. arXiv:1607.05695, 2016.
J. Glover, R. Rusu, and G. Bradski. Monte Carlo pose estimation with quaternion kernels and the Bingham distribution. In Robotics: Science and Systems, 2011.
A. E. Johnson and M. Hebert. Using spin images for efficient object recognition in cluttered 3D scenes. Pattern Analysis and Machine Intelligence, 21(5):433–449, 1999.
O. Ledoit and M. Wolf. Spectrum estimation: A unified framework for covariance matrix estimation and PCA in large dimensions. Journal of Multivariate Analysis, 139:360–384, 2015.
Y. Li, A. Dai, L. Guibas, and M. Nießner. Database-assisted object retrieval for real-time 3D reconstruction. In Computer Graphics Forum, volume 34, pages 435–446, 2015.
M. E. Tipping and C. M. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 61:611–622, 1999.
S. Marini, S. Biasotti, and B. Falcidieno. Partial matching by structural descriptors. In Content-Based Retrieval, 2006.
N. Payet and S. Todorovic. From contours to 3D object detection and pose estimation. In International Conference on Computer Vision, pages 983–990, 2011.
V. Narayanan and M. Likhachev. PERCH: Perception via search for multi-object recognition and localization. In International Conference on Robotics and Automation, 2016.
P. J. Besl and N. D. McKay. Method for registration of 3-D shapes. Pattern Analysis and Machine Intelligence, 14:239–256, 1992.
C. Qi, H. Su, M. Niessner, A. Dai, M. Yan, and L. Guibas. Volumetric and multi-view CNNs for object classification on 3D data. In Computer Vision and Pattern Recognition, 2016.
Z. Ren and E. B. Sudderth. Three-dimensional object detection and layout prediction using clouds of oriented gradients. In Computer Vision and Pattern Recognition, 2016.
J. Rock, T. Gupta, J. Thorsen, J. Gwak, D. Shin, and D. Hoiem. Completing 3D object shape from one depth image. In Computer Vision and Pattern Recognition, pages 2484–2493, 2015.
S. Rusinkiewicz and M. Levoy. Efficient variants of the ICP algorithm. In 3-D Digital Imaging and Modeling, pages 145–152, 2001.
J. Schäfer and K. Strimmer. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology, 4(1):32, 2005.
B. Shi, S. Bai, Z. Zhou, and X. Bai. DeepPano: Deep panoramic representation for 3-D shape recognition. Signal Processing Letters, 22(12):2339–2343, 2015.
S. Song and J. Xiao. Deep sliding shapes for amodal 3D object detection in RGB-D images. In Computer Vision and Pattern Recognition, 2016.
H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller. Multiview convolutional neural networks for 3D shape recognition. In International Conference on Computer Vision, pages 945–953, 2015.
R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), 58(1):267–288, 1996.
S. Tulsiani and J. Malik. Viewpoints and keypoints. In Computer Vision and Pattern Recognition, pages 1510–1519, 2015.
S. Tulsiani, A. Kar, Q. Huang, J. Carreira, and J. Malik. Shape and symmetry induction for 3D objects. CoRR, abs/1511.07845, 2015.
V. Nair and G. E. Hinton. 3D object recognition with deep belief nets. In Advances in Neural Information Processing Systems, pages 1339–1347, 2009.
J. Wu, C. Zhang, T. Xue, B. Freeman, and J. Tenenbaum. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In Advances in Neural Information Processing Systems, pages 82–90, 2016.
Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao. 3D ShapeNets: A deep representation for volumetric shapes. In Computer Vision and Pattern Recognition, pages 1912–1920, 2015.
Y. Kim, N. J. Mitra, D. M. Yan, and L. Guibas. Acquiring 3D indoor environments with variability and repetition. ACM Transactions on Graphics, 31:138:1–138:11, 2012.
Y. Kim, N. J. Mitra, Q. Huang, and L. Guibas. Guided realtime scanning of indoor objects. In Computer Graphics Forum, volume 32, pages 177–186, 2013.
Blog Author: Tracy Staedter
Technical Author: Bin