Dense tracking, mapping and scene labeling using a depth camera
DOI: https://doi.org/10.17533/udea.redin.n86a07

Keywords: dense reconstruction, camera tracking, depth sensor, volumetric representation, object detection, multiple instance labeling

Abstract
We present a system for dense tracking, 3D reconstruction, and object detection of desktop-like environments using a depth camera (the Kinect sensor). The camera is moved by hand while its pose is estimated, and a dense model of the scene, with evolving color information, is built. Alternatively, the user can enable the object detection module (YOLO: You Only Look Once [1]) to detect categories of objects commonly found on desktops, such as monitors, keyboards, books, cups, and laptops, and propagate this information to the model, yielding a model whose color encodes object categories. The camera pose is estimated with a model-to-frame technique based on a coarse-to-fine iterative closest point (ICP) algorithm, achieving a drift-free trajectory and robustness to fast camera motion and variable lighting conditions. Simultaneously, the depth maps are fused into the volumetric structure from the estimated camera poses. The marching cubes algorithm is used to extract an explicit representation of the scene for visualization. The tracking, fusion, marching cubes, and object detection processes were implemented on commodity graphics hardware to improve the performance of the system. We achieve accurate camera pose estimation, high quality in the model's color and geometry, stable colors from the detection module (robustness to wrong detections), and successful handling of multiple instances of the same category.
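For readers who want a concrete picture of the volumetric fusion step mentioned in the abstract, the following is a minimal NumPy sketch of truncated signed distance function (TSDF) integration in the spirit of Curless and Levoy's volumetric method and KinectFusion (both in the reference list): each depth map is projected into a voxel grid from the estimated camera pose and averaged into a running signed-distance value per voxel. The grid resolution, voxel size, truncation distance, camera intrinsics, and the uniform per-observation weight below are illustrative assumptions, not the parameters used in the paper, and the paper's GPU implementation is far more efficient than this dense NumPy version.

import numpy as np

VOXELS = 128                  # voxels per axis (assumed, for illustration)
VOXEL_SIZE = 0.01             # 1 cm voxels (assumed)
TRUNC = 0.03                  # truncation distance in meters (assumed)
K = np.array([[525.0, 0.0, 319.5],    # Kinect-like intrinsics (assumed)
              [0.0, 525.0, 239.5],
              [0.0, 0.0, 1.0]])

tsdf = np.ones((VOXELS, VOXELS, VOXELS), dtype=np.float32)    # signed distances
weight = np.zeros_like(tsdf)                                  # accumulated weights

def integrate(depth, T_world_cam):
    """Fuse one depth map (H x W, meters) given its camera-to-world pose (4 x 4)."""
    # World coordinates of every voxel center.
    idx = np.indices((VOXELS, VOXELS, VOXELS)).reshape(3, -1).T
    pts_world = (idx + 0.5) * VOXEL_SIZE
    # Transform voxel centers into the camera frame and project them.
    T_cam_world = np.linalg.inv(T_world_cam)
    pts_cam = pts_world @ T_cam_world[:3, :3].T + T_cam_world[:3, 3]
    z = pts_cam[:, 2]
    z_safe = np.where(z > 1e-6, z, np.inf)
    uvw = pts_cam @ K.T
    u = np.round(uvw[:, 0] / z_safe).astype(int)
    v = np.round(uvw[:, 1] / z_safe).astype(int)
    h, w = depth.shape
    valid = (z > 1e-6) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    d = np.zeros_like(z)
    d[valid] = depth[v[valid], u[valid]]
    valid &= d > 0
    # Truncated signed distance along the viewing ray, then a running
    # average per voxel (uniform weight of 1 per observation).
    sdf = np.clip((d - z) / TRUNC, -1.0, 1.0)
    upd = valid & (sdf > -1.0)          # skip voxels far behind the surface
    flat_t, flat_w = tsdf.reshape(-1), weight.reshape(-1)
    flat_t[upd] = (flat_t[upd] * flat_w[upd] + sdf[upd]) / (flat_w[upd] + 1.0)
    flat_w[upd] += 1.0

Here integrate(...) would be called once per tracked frame with the pose returned by the ICP stage, and marching cubes would then be run over the zero level set of tsdf to obtain the explicit mesh. Similarly, one plausible scheme (hypothetical, not taken from the paper) for making object labels stable against occasional wrong detections is per-voxel vote counting: every voxel near the surface inside a detected 2D bounding box accumulates a vote for that category, and the displayed label is the current argmax, so isolated misdetections are outvoted over time.

NUM_CLASSES = 6               # e.g. monitor, keyboard, book, cup, laptop, background (assumed)
votes = np.zeros((VOXELS, VOXELS, VOXELS, NUM_CLASSES), dtype=np.uint16)

def vote(voxel_indices, class_id):
    """Accumulate one detection; voxel_indices is an (N, 3) array of voxel
    coordinates that project inside the detected box and lie near the surface."""
    i, j, k = voxel_indices.T
    votes[i, j, k, class_id] += 1

def labels():
    """Current per-voxel category label (argmax over accumulated votes)."""
    return votes.argmax(axis=-1)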
References
J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016, pp. 779-788.
A. J. Davison, I. D. Reid, N. D. Molton, and O. Stasse, "MonoSLAM: Real-time single camera SLAM," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 1052-1067, 2007.
G. Klein and D. Murray, "Improving the agility of keyframe-based SLAM," in 10th European Conference on Computer Vision, Marseille, France, 2008, pp. 802-815.
G. Silveira, E. Malis, and P. Rives, "An Efficient Direct Method for Improving Visual SLAM," in IEEE International Conference on Robotics and Automation, Rome, Italy, 2007, pp. 10-14.
R. A. Newcombe et al., "KinectFusion: Real-time dense surface mapping and tracking," in 10th IEEE International Symposium on Mixed and Augmented Reality, Basel, Switzerland, 2011, pp. 127-136.
T. Whelan, H. Johannsson, M. Kaess, J. J. Leonard, and J. McDonald, "Robust real-time visual odometry for dense RGB-D mapping," in IEEE International Conference on Robotics and Automation, Karlsruhe, Germany, 2013, pp. 5724-5731.
R. Rusu and S. Cousins, "3D is here: Point Cloud Library (PCL)," in IEEE International Conference on Robotics and Automation, Shanghai, China, 2011, pp. 1-4.
B. Curless and M. Levoy, "A volumetric method for building complex models from range images," in 23rd Annual Conference on Computer Graphics and Interactive Techniques, New York, USA, 1996, pp. 303-312.
T. Whelan et al., "Kintinuous: Spatially extended KinectFusion," in RSS Workshop on RGB-D: Advanced Reasoning with Depth Cameras, Sydney, Australia, 2012, pp. 1-8.
F. Steinbrücker, J. Sturm, and D. Cremers, "Real-Time Visual Odometry from Dense RGB-D Images," in IEEE International Conference on Computer Vision Workshops, Barcelona, Spain, 2011, pp. 719-722.
A. S. Huang et al., "Visual odometry and mapping for autonomous flight using an RGB-D camera," in 15th International Symposium of Robotics Research, Flagstaff, USA, 2011, pp. 235-252.
T. Whelan et al., "Real-time Large-scale Dense RGB-D SLAM with Volumetric Fusion," International Journal of Robotics Research, vol. 34, no. 4, pp. 598-626, 2015.
T. Whelan, S. Leutenegger, R. Salas-Moreno, B. Glocker, and A. Davison, "ElasticFusion: Dense SLAM without a Pose Graph," in Robotics: Science and Systems Conference, Rome, Italy, 2015, pp. 1-9.
R. Mur-Artal and J. D. Tardós, "ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras," IEEE Transactions on Robotics, vol. 33, no. 5, pp. 1255-1262, 2017.
A. Concha and J. Civera, "RGBDTAM: A Cost-Effective and Accurate RGB-D Tracking and Mapping System," 2017. [Online]. Available: https://www.researchgate.net/publication/314182379_RGBDTAM_A_Cost-Effective_and_Accurate_RGB-D_Tracking_and_Mapping_System
K. Lai, L. Bo, X. Ren, and D. Fox, "Detection-based object labeling in 3D scenes," in IEEE International Conference on Robotics and Automation, St. Paul, USA, 2012, pp. 1330-1337.
K. Lai, L. Bo, and D. Fox, "Unsupervised feature learning for 3D scene labeling," in IEEE International Conference on Robotics and Automation, Hong Kong, China, 2014, pp. 3050-3057.
J. Bao, Y. Jia, Y. Cheng, and N. Xi, "Saliency-guided detection of unknown objects in RGB-D indoor scenes," Sensors, vol. 15, no. 9, pp. 21054-21074, 2015.
C. Ren, V. Prisacariu, D. Murray, and I. Reid, "STAR3D: Simultaneous tracking and reconstruction of 3D objects using RGB-D data," in International Conference on Computer Vision, Sydney, Australia, 2013, pp. 1561-1568.
L. Ma and G. Sibley, "Unsupervised dense object discovery, detection, tracking and reconstruction," in European Conference on Computer Vision, Zurich, Switzerland, 2014, pp. 80-95.
W. Lorensen and H. Cline, "Marching cubes: A high resolution 3D surface construction algorithm," in 14th Annual Conference on Computer Graphics and Interactive Techniques, New York, USA, 1987, pp. 163-169.
S. Parker, P. Shirley, Y. Livnat, C. Hansen, and P. Sloan, "Interactive ray tracing for isosurface rendering," in Conference on Visualization, Los Alamitos, USA, 1998, pp. 233-238.
J. Pineda, "A Parallel Algorithm for Polygon Rasterization," in 15th Annual Conference on Computer Graphics and Interactive Techniques, New York, USA, 1988, pp. 17-20.
C. Kerl, J. Sturm, and D. Cremers, "Robust odometry estimation for RGB-D cameras," in International Conference on Robotics and Automation, Karlsruhe, Germany, 2013, pp. 3748-3754.
E. Bylow, J. Sturm, C. Kerl, F. Kahl, and D. Cremers, "Real-time camera tracking and 3D reconstruction using signed distance functions," in Robotics: Science and Systems Conference, Berlin, Germany, 2013, pp. 8-16.
A. Díaz, L. Paz, E. Caicedo, and P. Piniés, "Dense Tracking with Range Cameras Using Key Frames," in Latin American Robotics Symposium and Brazilian Conference on Robotics, Uberlândia, Brazil, 2016, pp. 20-38.
J. Redmon, Darknet: Open Source Neural Networks in C, 2013. [Online]. Available: http://pjreddie.com/darknet/. Accessed on: February 26, 2018.
O. Russakovsky et al., "ImageNet Large Scale Visual Recognition Challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211-252, 2015.
J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, "A Benchmark for the Evaluation of RGB-D SLAM Systems," in International Conference on Intelligent Robots and Systems (IROS), Vilamoura, Portugal, 2012, pp. 573-580.
A. Handa, R. A. Newcombe, A. Angeli, and A. J. Davison, "Real-time camera tracking: When is high frame-rate best?" in 12th European Conference on Computer Vision, Florence, Italy, 2012, pp. 222-235.
License
Copyright (c) 2018 Revista Facultad de Ingeniería Universidad de Antioquia
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Revista Facultad de Ingeniería, Universidad de Antioquia is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) license: https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en
You are free to:
Share — copy and redistribute the material in any medium or format
Adapt — remix, transform, and build upon the material
Under the following terms:
Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
NonCommercial — You may not use the material for commercial purposes.
ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
The material published in the journal can be distributed, copied and exhibited by third parties if the respective credits are given to the journal. No commercial benefit can be obtained and derivative works must be under the same license terms as the original work.