Transformer guided geometry model for flow-based unsupervised visual odometry

Journal Article


Abstract


  • Existing unsupervised visual odometry (VO) methods either match pairwise images or integrate temporal information over a long image sequence using recurrent neural networks. These approaches tend to be inaccurate, slow to train, or prone to error accumulation. In this paper, we propose a method consisting of two camera pose estimators that handle pairwise images and a short image sequence, respectively. For image sequences, a transformer-like structure is adopted to build a geometry model over a local temporal window; this is referred to as the transformer-based auxiliary pose estimator (TAPE). Meanwhile, a flow-to-flow pose estimator (F2FPE) is proposed to exploit the relationship between pairwise images. The two estimators are constrained by a simple yet effective consistency loss during training. Empirical evaluation shows that the proposed method outperforms state-of-the-art unsupervised learning-based methods by a large margin and performs comparably to supervised and traditional methods on the KITTI and Malaga datasets.
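
The abstract does not give the form of the consistency loss, so the following is only a minimal sketch of one plausible reading: penalise the mean absolute difference between the relative poses predicted by the pairwise estimator and by the windowed estimator for the same frame pairs. The function name `consistency_loss`, the 6-value pose layout (three translation values followed by three rotation values), and the `rot_weight` parameter are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Sequence

# Hypothetical pose layout (an assumption, not from the paper):
# (tx, ty, tz, rx, ry, rz) for the relative motion between two frames.
Pose = Sequence[float]


def consistency_loss(
    pairwise_poses: Sequence[Pose],
    window_poses: Sequence[Pose],
    rot_weight: float = 1.0,
) -> float:
    """Mean L1 disagreement between two sets of relative pose estimates.

    pairwise_poses: one pose per frame pair from a pairwise estimator.
    window_poses:   poses for the same frame pairs from a windowed estimator.
    rot_weight:     assumed scalar balancing rotation against translation.
    """
    if len(pairwise_poses) != len(window_poses):
        raise ValueError("both estimators must cover the same frame pairs")
    if not pairwise_poses:
        return 0.0

    total = 0.0
    for p, q in zip(pairwise_poses, window_poses):
        # Sum of absolute differences over the translation components.
        trans = sum(abs(a - b) for a, b in zip(p[:3], q[:3]))
        # Sum of absolute differences over the rotation components.
        rot = sum(abs(a - b) for a, b in zip(p[3:6], q[3:6]))
        total += trans + rot_weight * rot

    # Average over frame pairs so the value does not grow with window size.
    return total / len(pairwise_poses)
```

In a training loop, a term like this would be added to the other unsupervised losses so that the two estimators are pushed toward agreeing on the same motion; the actual weighting and pose parameterisation used by the authors may differ.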

Publication Date


  • 2021

Citation


  • Li, X., Hou, Y., Wang, P., Gao, Z., Xu, M., & Li, W. (2021). Transformer guided geometry model for flow-based unsupervised visual odometry. Neural Computing and Applications, 33(13), 8031-8042. doi:10.1007/s00521-020-05545-8

Scopus Eid


  • 2-s2.0-85098705201

Start Page


  • 8031

End Page


  • 8042

Volume


  • 33

Issue


  • 13
