Dense tracking, mapping and scene labeling using a depth camera

Andrés Alejandro Díaz-Toro; Lina María Paz-Pérez; Pedro Antonio Piniés-Rodríguez; Eduardo Francisco Caicedo-Bravo

doi:10.17533/udea.redin.n86a07

Authors

Andrés Alejandro Díaz-Toro Universidad del Valle
Lina María Paz-Pérez Intel Corporation
Pedro Antonio Piniés-Rodríguez Intel Corporation
Eduardo Francisco Caicedo-Bravo Universidad del Valle https://orcid.org/0000-0003-0727-2917

DOI:

https://doi.org/10.17533/udea.redin.n86a07

Keywords:

dense reconstruction, camera tracking, depth sensor, volumetric representation, object detection, multiple instance labeling

Abstract

We present a system for dense tracking, 3D reconstruction, and object detection in desktop-like environments using a depth camera, the Kinect sensor. The camera is moved by hand while its pose is estimated, and a dense, colored model of the scene is built and continuously updated. The user can optionally attach an object detection module (YOLO: you only look once [1]) to detect object categories commonly found on desktops, such as monitors, keyboards, books, cups, and laptops, and propagate them to the model, obtaining a model whose color encodes the object category. The camera pose is estimated with a frame-to-model technique using a coarse-to-fine iterative closest point (ICP) algorithm, which yields a drift-free trajectory and robustness to fast camera motion and to variable lighting conditions. Simultaneously, the depth maps are fused into a volumetric structure from the estimated camera poses. The marching cubes algorithm is used to visualize an explicit representation of the scene. The tracking, fusion, marching cubes, and object detection algorithms were implemented on graphics processing hardware to improve system performance. We achieved outstanding results in camera pose, high quality in the geometry and color of the model, color stability when using the object detection module (robustness to erroneous detections), and successful handling of multiple instances of the same category.
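The fusion step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how the volumetric structure is updated, so this assumes a truncated signed distance function (TSDF) with a running weighted average per voxel, in the style of the volumetric method of Curless and Levoy listed in the references. The function names, the truncation distance, and the weight cap are hypothetical choices made for illustration, and the example works on a single camera ray in plain Python rather than on a GPU volume.

```python
def fuse_tsdf(tsdf, weight, sdf_obs, trunc=0.05, max_weight=64.0):
    """Fold one signed-distance observation into a voxel (assumed update rule).

    tsdf, weight: current voxel state.
    sdf_obs: measured depth minus voxel depth along the ray, in metres
             (positive in front of the surface, negative behind it).
    """
    if sdf_obs < -trunc:
        # Voxel is far behind the observed surface: occluded, leave unchanged.
        return tsdf, weight
    d = min(1.0, sdf_obs / trunc)                    # truncate, normalise to [-1, 1]
    new_tsdf = (tsdf * weight + d) / (weight + 1.0)  # running weighted mean
    new_weight = min(weight + 1.0, max_weight)       # cap so the model can still adapt
    return new_tsdf, new_weight


def fuse_ray(voxel_depths, state, measured_depth, trunc=0.05):
    """Update every voxel on one camera ray with one depth measurement."""
    for i, z in enumerate(voxel_depths):
        t, w = state[i]
        state[i] = fuse_tsdf(t, w, measured_depth - z, trunc)
    return state


def zero_crossing(voxel_depths, state):
    """Locate the surface by linear interpolation of the TSDF sign change."""
    for i in range(len(state) - 1):
        (t0, w0), (t1, w1) = state[i], state[i + 1]
        if w0 > 0 and w1 > 0 and t0 > 0 >= t1:
            z0, z1 = voxel_depths[i], voxel_depths[i + 1]
            return z0 + (z1 - z0) * t0 / (t0 - t1)
    return None


# Eleven voxels spaced 2 cm apart along one ray, all initially unobserved.
voxel_depths = [0.90 + 0.02 * i for i in range(11)]
state = [(0.0, 0.0)] * len(voxel_depths)

# Three noisy depth readings of a surface located about 1 m from the camera.
for depth in (0.99, 1.01, 1.00):
    fuse_ray(voxel_depths, state, depth)

surface = zero_crossing(voxel_depths, state)
```

Averaging the truncated distances is what makes the fused surface less noisy than any single depth map. A surface extraction step such as marching cubes would then look for the same sign change across the whole volume instead of along one ray.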

Author biographies

Andrés Alejandro Díaz-Toro, Universidad del Valle

Perception and Intelligent Systems (PSI) Research Group, School of Electrical and Electronic Engineering.

Lina María Paz-Pérez, Intel Corporation

Researcher and Software Developer.

Pedro Antonio Piniés-Rodríguez, Intel Corporation

Researcher and Software Developer.

Eduardo Francisco Caicedo-Bravo, Universidad del Valle

Perception and Intelligent Systems (PSI) Research Group, School of Electrical and Electronic Engineering.

References

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016, pp. 779-788.

A. J. Davison, I. D. Reid, N. D. Molton, and O. Stasse, “MonoSLAM: Real-time single camera SLAM,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 1052-1067, 2007.

G. Klein and D. Murray, “Improving the agility of keyframe-based SLAM,” in 10th European Conference on Computer Vision, Marseille, France, 2008, pp. 802-815.

G. Silveira, E. Malis, and P. Rives, “An Efficient Direct Method for Improving visual SLAM,” in IEEE International Conference on Robotics and Automation, Rome, Italy, 2007, pp. 10-14.

R. A. Newcombe et al., “KinectFusion: Real-time dense surface mapping and tracking,” in 10th IEEE International Symposium on Mixed and Augmented Reality, Basel, Switzerland, 2011, pp. 127-136.

T. Whelan, H. Johannsson, M. Kaess, J. J. Leonard, and J. McDonald, “Robust real-time visual odometry for dense RGB-D mapping,” in IEEE International Conference on Robotics and Automation, Karlsruhe, Germany, 2013, pp. 5724-5731.

R. Rusu and S. Cousins, “3D is here: Point cloud library (PCL),” in IEEE International Conference on Robotics and Automation, Shanghai, China, 2011, pp. 1-4.

B. Curless and M. Levoy, “A volumetric method for building complex models from range images,” in 23rd Annual Conference on Computer Graphics and Interactive Techniques, New York, USA, 1996, pp. 303-312.

T. Whelan et al., “Kintinuous: Spatially extended KinectFusion,” in RSS Workshop on RGB-D: Advanced Reasoning with Depth Cameras, Sydney, Australia, 2012, pp. 1-8.

F. Steinbrücker, J. Sturm, and D. Cremers, “Real-Time Visual Odometry from Dense RGB-D Images,” in IEEE International Conference on Computer Vision Workshops, Barcelona, Spain, 2011, pp. 719-722.

A. S. Huang et al., “Visual odometry and mapping for autonomous flight using an RGB-D camera,” in 15th International Symposium of Robotics Research, Flagstaff, USA, 2011, pp. 235-252.

T. Whelan et al., “Real-time Large-scale Dense RGB-D SLAM with Volumetric Fusion,” International Journal of Robotics Research, vol. 34, no. 4, pp. 598-626, 2015.

T. Whelan, S. Leutenegger, R. Salas-Moreno, B. Glocker, and A. Davison, “ElasticFusion: Dense SLAM without a Pose Graph,” in Robotics: Science and Systems Conference, Rome, Italy, 2015, pp. 1-9.

R. Mur-Artal and J. Tardós, “ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras,” IEEE Transactions on Robotics, vol. 33, no. 5, pp. 1255-1262, 2017.

A. Concha and J. Civera, RGBDTAM: A Cost-Effective and Accurate RGB-D Tracking and Mapping System, 2017. [Online]. Available: https://www.researchgate.net/publication/314182379_RGBDTAM_A_Cost-Effective_and_Accurate_RGB-D_Tracking_and_Mapping_System.

K. Lai, L. Bo, X. Ren, and D. Fox, “Detection-based object labeling in 3D scenes,” in IEEE International Conference on Robotics and Automation, St. Paul, USA, 2012, pp. 1330-1337.

K. Lai, L. Bo, and D. Fox, “Unsupervised feature learning for 3D scene labeling,” in IEEE International Conference on Robotics and Automation, Hong Kong, China, 2014, pp. 3050-3057.

J. Bao, Y. Jia, Y. Cheng, and N. Xi, “Saliency-guided detection of unknown objects in RGB-D indoor scenes,” Sensors, vol. 15, no. 9, pp. 21054-21074, 2015.

C. Ren, V. Prisacariu, D. Murray, and I. Reid, “STAR3D: Simultaneous tracking and reconstruction of 3D objects using RGB-D data,” in International Conference on Computer Vision, Sydney, Australia, 2013, pp. 1561-1568.

L. Ma and G. Sibley, “Unsupervised dense object discovery, detection, tracking and reconstruction,” in European Conference on Computer Vision, Zurich, Switzerland, 2014, pp. 80-95.

W. Lorensen and H. Cline, “Marching cubes: A high resolution 3D surface construction algorithm,” in 14th Annual Conference on Computer Graphics and Interactive Techniques, New York, USA, 1987, pp. 163-169.

S. Parker, P. Shirley, Y. Livnat, C. Hansen, and P. Sloan, “Interactive ray tracing for isosurface rendering,” in Conference on Visualization, Los Alamitos, USA, 1998, pp. 233-238.

J. Pineda, “A Parallel Algorithm for Polygon Rasterization,” in 15th Annual Conference on Computer Graphics and Interactive Techniques, New York, USA, 1988, pp. 17-20.

C. Kerl, J. Sturm, and D. Cremers, “Robust odometry estimation for RGB-D cameras,” in International Conference on Robotics and Automation, Karlsruhe, Germany, 2013, pp. 3748-3754.

E. Bylow, J. Sturm, C. Kerl, F. Kahl, and D. Cremers, “Real-time camera tracking and 3D reconstruction using signed distance functions,” in Robotics: Science and Systems Conference, Berlin, Germany, 2013, pp. 8-16.

A. Díaz, L. Paz, E. Caicedo, and P. Piniés, “Dense Tracking with Range Cameras Using Key Frames,” in Latin American Robotics Symposium and Brazilian Conference on Robotics, Uberlândia, Brazil, 2016, pp. 20-38.

J. Redmon, Darknet: Open source neural networks in C, 2013. [Online]. Available: http://pjreddie.com/darknet/, Accessed on: February 26, 2018.

O. Russakovsky et al., “ImageNet Large Scale Visual Recognition Challenge,” International Journal of Computer Vision, vol. 115, no. 3, pp. 211-252, 2015.

J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, “A Benchmark for the Evaluation of RGB-D SLAM Systems,” in International Conference on Intelligent Robots and Systems (IROS), Vilamoura, Portugal, 2012, pp. 573-580.

A. Handa, R. A. Newcombe, A. Angeli, and A. J. Davison, “Real-time camera tracking: When is high frame-rate best?” in 12th European Conference on Computer Vision, Florence, Italy, 2012, pp. 222-235.