Dense tracking, mapping and scene labeling using a depth camera
DOI: https://doi.org/10.17533/udea.redin.n86a07

Keywords: dense reconstruction, camera tracking, depth sensor, volumetric representation, object detection, multiple instance labeling

Abstract
We present a system for dense tracking, 3D reconstruction, and object detection of desktop-like environments using a depth camera (the Kinect sensor). The camera is moved by hand while its pose is estimated, and a dense model of the scene, with evolving color information, is built. Alternatively, the user can enable the object detection module (YOLO: You Only Look Once [1]) to detect categories of objects commonly found on desktops, such as monitors, keyboards, books, cups, and laptops, and propagate this information to the model, yielding a model whose color encodes object categories. The camera pose is estimated with a model-to-frame technique based on a coarse-to-fine iterative closest point (ICP) algorithm, achieving a drift-free trajectory and robustness to fast camera motion and variable lighting conditions. Simultaneously, the depth maps are fused into the volumetric structure from the estimated camera poses. The marching cubes algorithm is used to extract an explicit representation of the scene for visualization. The tracking, fusion, marching cubes, and object detection processes were implemented on commodity graphics hardware to improve the performance of the system. We achieve accurate camera pose estimation, high quality in the model's color and geometry, stable colors from the detection module (robustness to wrong detections), and successful handling of multiple instances of the same category.
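For readers who want a concrete picture of the volumetric fusion step mentioned in the abstract, the following is a minimal NumPy sketch of truncated signed distance function (TSDF) integration in the spirit of Curless and Levoy's volumetric method and KinectFusion (both in the reference list): each depth map is projected into a voxel grid from the estimated camera pose and averaged into a running signed-distance value per voxel. The grid resolution, voxel size, truncation distance, camera intrinsics, and the uniform per-observation weight below are illustrative assumptions, not the parameters used in the paper, and the paper's GPU implementation is far more efficient than this dense NumPy version.

import numpy as np

VOXELS = 128                  # voxels per axis (assumed, for illustration)
VOXEL_SIZE = 0.01             # 1 cm voxels (assumed)
TRUNC = 0.03                  # truncation distance in meters (assumed)
K = np.array([[525.0, 0.0, 319.5],    # Kinect-like intrinsics (assumed)
              [0.0, 525.0, 239.5],
              [0.0, 0.0, 1.0]])

tsdf = np.ones((VOXELS, VOXELS, VOXELS), dtype=np.float32)    # signed distances
weight = np.zeros_like(tsdf)                                  # accumulated weights

def integrate(depth, T_world_cam):
    """Fuse one depth map (H x W, meters) given its camera-to-world pose (4 x 4)."""
    # World coordinates of every voxel center.
    idx = np.indices((VOXELS, VOXELS, VOXELS)).reshape(3, -1).T
    pts_world = (idx + 0.5) * VOXEL_SIZE
    # Transform voxel centers into the camera frame and project them.
    T_cam_world = np.linalg.inv(T_world_cam)
    pts_cam = pts_world @ T_cam_world[:3, :3].T + T_cam_world[:3, 3]
    z = pts_cam[:, 2]
    z_safe = np.where(z > 1e-6, z, np.inf)
    uvw = pts_cam @ K.T
    u = np.round(uvw[:, 0] / z_safe).astype(int)
    v = np.round(uvw[:, 1] / z_safe).astype(int)
    h, w = depth.shape
    valid = (z > 1e-6) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    d = np.zeros_like(z)
    d[valid] = depth[v[valid], u[valid]]
    valid &= d > 0
    # Truncated signed distance along the viewing ray, then a running
    # average per voxel (uniform weight of 1 per observation).
    sdf = np.clip((d - z) / TRUNC, -1.0, 1.0)
    upd = valid & (sdf > -1.0)          # skip voxels far behind the surface
    flat_t, flat_w = tsdf.reshape(-1), weight.reshape(-1)
    flat_t[upd] = (flat_t[upd] * flat_w[upd] + sdf[upd]) / (flat_w[upd] + 1.0)
    flat_w[upd] += 1.0

Here integrate(...) would be called once per tracked frame with the pose returned by the ICP stage, and marching cubes would then be run over the zero level set of tsdf to obtain the explicit mesh. Similarly, one plausible scheme (hypothetical, not taken from the paper) for making object labels stable against occasional wrong detections is per-voxel vote counting: every voxel near the surface inside a detected 2D bounding box accumulates a vote for that category, and the displayed label is the current argmax, so isolated misdetections are outvoted over time.

NUM_CLASSES = 6               # e.g. monitor, keyboard, book, cup, laptop, background (assumed)
votes = np.zeros((VOXELS, VOXELS, VOXELS, NUM_CLASSES), dtype=np.uint16)

def vote(voxel_indices, class_id):
    """Accumulate one detection; voxel_indices is an (N, 3) array of voxel
    coordinates that project inside the detected box and lie near the surface."""
    i, j, k = voxel_indices.T
    votes[i, j, k, class_id] += 1

def labels():
    """Current per-voxel category label (argmax over accumulated votes)."""
    return votes.argmax(axis=-1)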
References
J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016, pp. 779-788.
A. J. Davison, I. D. Reid, N. D. Molton, and O. Stasse, "MonoSLAM: Real-time single camera SLAM," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 1052-1067, 2007.
G. Klein and D. Murray, "Improving the agility of keyframe-based SLAM," in 10th European Conference on Computer Vision, Marseille, France, 2008, pp. 802-815.
G. Silveira, E. Malis, and P. Rives, "An Efficient Direct Method for Improving Visual SLAM," in IEEE International Conference on Robotics and Automation, Rome, Italy, 2007, pp. 10-14.
R. A. Newcombe et al., "KinectFusion: Real-time dense surface mapping and tracking," in 10th IEEE International Symposium on Mixed and Augmented Reality, Basel, Switzerland, 2011, pp. 127-136.
T. Whelan, H. Johannsson, M. Kaess, J. J. Leonard, and J. McDonald, "Robust real-time visual odometry for dense RGB-D mapping," in IEEE International Conference on Robotics and Automation, Karlsruhe, Germany, 2013, pp. 5724-5731.
R. Rusu and S. Cousins, "3D is here: Point Cloud Library (PCL)," in IEEE International Conference on Robotics and Automation, Shanghai, China, 2011, pp. 1-4.
B. Curless and M. Levoy, "A volumetric method for building complex models from range images," in 23rd Annual Conference on Computer Graphics and Interactive Techniques, New York, USA, 1996, pp. 303-312.
T. Whelan et al., "Kintinuous: Spatially extended KinectFusion," in RSS Workshop on RGB-D: Advanced Reasoning with Depth Cameras, Sydney, Australia, 2012, pp. 1-8.
F. Steinbrücker, J. Sturm, and D. Cremers, "Real-Time Visual Odometry from Dense RGB-D Images," in IEEE International Conference on Computer Vision Workshops, Barcelona, Spain, 2011, pp. 719-722.
A. S. Huang et al., "Visual odometry and mapping for autonomous flight using an RGB-D camera," in 15th International Symposium of Robotics Research, Flagstaff, USA, 2011, pp. 235-252.
T. Whelan et al., "Real-time Large-scale Dense RGB-D SLAM with Volumetric Fusion," International Journal of Robotics Research, vol. 34, no. 4, pp. 598-626, 2015.
T. Whelan, S. Leutenegger, R. Salas-Moreno, B. Glocker, and A. Davison, "ElasticFusion: Dense SLAM without a Pose Graph," in Robotics: Science and Systems Conference, Rome, Italy, 2015, pp. 1-9.
R. Mur-Artal and J. D. Tardós, "ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras," IEEE Transactions on Robotics, vol. 33, no. 5, pp. 1255-1262, 2017.
A. Concha and J. Civera, "RGBDTAM: A Cost-Effective and Accurate RGB-D Tracking and Mapping System," 2017. [Online]. Available: https://www.researchgate.net/publication/314182379_RGBDTAM_A_Cost-Effective_and_Accurate_RGB-D_Tracking_and_Mapping_System
K. Lai, L. Bo, X. Ren, and D. Fox, "Detection-based object labeling in 3D scenes," in IEEE International Conference on Robotics and Automation, St. Paul, USA, 2012, pp. 1330-1337.
K. Lai, L. Bo, and D. Fox, "Unsupervised feature learning for 3D scene labeling," in IEEE International Conference on Robotics and Automation, Hong Kong, China, 2014, pp. 3050-3057.
J. Bao, Y. Jia, Y. Cheng, and N. Xi, "Saliency-guided detection of unknown objects in RGB-D indoor scenes," Sensors, vol. 15, no. 9, pp. 21054-21074, 2015.
C. Ren, V. Prisacariu, D. Murray, and I. Reid, "STAR3D: Simultaneous tracking and reconstruction of 3D objects using RGB-D data," in International Conference on Computer Vision, Sydney, Australia, 2013, pp. 1561-1568.
L. Ma and G. Sibley, "Unsupervised dense object discovery, detection, tracking and reconstruction," in European Conference on Computer Vision, Zurich, Switzerland, 2014, pp. 80-95.
W. Lorensen and H. Cline, "Marching cubes: A high resolution 3D surface construction algorithm," in 14th Annual Conference on Computer Graphics and Interactive Techniques, New York, USA, 1987, pp. 163-169.
S. Parker, P. Shirley, Y. Livnat, C. Hansen, and P. Sloan, "Interactive ray tracing for isosurface rendering," in Conference on Visualization, Los Alamitos, USA, 1998, pp. 233-238.
J. Pineda, "A Parallel Algorithm for Polygon Rasterization," in 15th Annual Conference on Computer Graphics and Interactive Techniques, New York, USA, 1988, pp. 17-20.
C. Kerl, J. Sturm, and D. Cremers, "Robust odometry estimation for RGB-D cameras," in International Conference on Robotics and Automation, Karlsruhe, Germany, 2013, pp. 3748-3754.
E. Bylow, J. Sturm, C. Kerl, F. Kahl, and D. Cremers, "Real-time camera tracking and 3D reconstruction using signed distance functions," in Robotics: Science and Systems Conference, Berlin, Germany, 2013, pp. 8-16.
A. Díaz, L. Paz, E. Caicedo, and P. Piniés, "Dense Tracking with Range Cameras Using Key Frames," in Latin American Robotics Symposium and Brazilian Conference on Robotics, Uberlândia, Brazil, 2016, pp. 20-38.
J. Redmon, Darknet: Open Source Neural Networks in C, 2013. [Online]. Available: http://pjreddie.com/darknet/. Accessed on: February 26, 2018.
O. Russakovsky et al., "ImageNet Large Scale Visual Recognition Challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211-252, 2015.
J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, "A Benchmark for the Evaluation of RGB-D SLAM Systems," in International Conference on Intelligent Robots and Systems (IROS), Vilamoura, Portugal, 2012, pp. 573-580.
A. Handa, R. A. Newcombe, A. Angeli, and A. J. Davison, "Real-time camera tracking: When is high frame-rate best?" in 12th European Conference on Computer Vision, Florence, Italy, 2012, pp. 222-235.
License
Copyright (c) 2018 Revista Facultad de Ingeniería Universidad de Antioquia
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Revista Facultad de Ingeniería, Universidad de Antioquia is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) license: https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en
You are free to:
Share — copy and redistribute the material in any medium or format
Adapt — remix, transform, and build upon the material
Under the following terms:
Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
NonCommercial — You may not use the material for commercial purposes.
ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
The material published in the journal can be distributed, copied and exhibited by third parties if the respective credits are given to the journal. No commercial benefit can be obtained and derivative works must be under the same license terms as the original work.