• Mohamed Ali SEDRINE Tunisia Polytechnic School
  • Wided SOUIDENE MSEDDI SERCOM Lab, Tunisia Polytechnic School, Carthage University, La Marsa, Tunisia & L2TI, Paris 13 University, Paris, France
  • Rabah ATTIA
Keywords: visual odometry; localization; loop-closure; deep learning; convolutional neural network


This paper presents a vision-based localization framework built on visual odometry. Visual odometry is a classic approach that incrementally estimates robot motion, even in GPS-denied environments, by tracking features across successive images. Since it is subject to drift, this paper proposes to combine a convolutional neural network with a visual memory to improve localization accuracy.
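To illustrate the incremental nature of visual odometry, the sketch below chains per-frame relative motions into a global 2D pose. It is a minimal, hypothetical illustration of pose composition, not the paper's pipeline; the relative motions would in practice come from feature tracking between successive images, and any error in them accumulates as drift.

```python
import math

def compose(pose, delta):
    """Compose a global 2D pose (x, y, theta) with a relative motion
    (dx, dy, dtheta) expressed in the robot's current frame."""
    x, y, th = pose
    dx, dy, dth = delta
    return (x + dx * math.cos(th) - dy * math.sin(th),
            y + dx * math.sin(th) + dy * math.cos(th),
            th + dth)

# Chain per-frame relative motions into a trajectory estimate.
pose = (0.0, 0.0, 0.0)
for delta in [(1.0, 0.0, 0.0), (1.0, 0.0, math.pi / 2), (1.0, 0.0, 0.0)]:
    pose = compose(pose, delta)
```

Because each new pose is built on the previous one, a small error in any single `delta` propagates to every subsequent pose, which is why a correction mechanism such as loop closure is needed.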

Our framework consists of two main steps. First, the robot builds its visual memory by annotating places with their ground-truth positions; dedicated data structures store the reference images and their positions. Then, during the navigation step, we use loop-closure-corrected visual odometry. A siamese convolutional neural network allows us to detect already visited positions: it takes as input the current image and an already stored one. If the place is recognized, the drift is quantified using the stored position. Drift correction is then conducted by an original two-level correction process. The first level is applied directly to the current estimate by subtracting the error. The second level is applied to the trajectory graph itself, using the iterative closest point method to match the estimated trajectory graph to the ground-truth one.
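The two correction levels can be sketched as follows. This is a simplified, hypothetical illustration: level one subtracts the drift measured at a recognized place, and level two performs a single rigid-alignment step (the core computation inside each ICP iteration, here with correspondences assumed known) to match the estimated trajectory to the reference one. The example trajectories and variable names are invented for illustration.

```python
import numpy as np

def align_trajectory(est, ref):
    """One rigid alignment step, as used inside ICP: find the rotation R
    and translation t minimizing ||est @ R.T + t - ref|| (Kabsch method),
    assuming point correspondences are already known."""
    mu_e, mu_r = est.mean(axis=0), ref.mean(axis=0)
    H = (est - mu_e).T @ (ref - mu_r)          # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                   # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_r - R @ mu_e
    return R, t

# Level 1: subtract the drift measured at a recognized place.
est = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, 0.2]])   # drifting estimate
stored = np.array([2.0, 0.0])                           # ground-truth position
drift = est[-1] - stored
est_l1 = est - drift

# Level 2: rigidly match the corrected graph to the reference trajectory.
ref = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
R, t = align_trajectory(est_l1, ref)
aligned = est_l1 @ R.T + t
```

A full ICP would additionally re-estimate point correspondences and iterate until convergence; the single Kabsch step above is the alignment computed at each of those iterations.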

Experiments showed that the proposed localization method achieves centimetric accuracy.


