MAC-VO: Metrics-Aware Covariance for Learning-based Stereo Visual Odometry

Yuheng Qiu*, Yutian Chen*, Zihao Zhang, Wenshan Wang, Sebastian Scherer
Carnegie Mellon University

* Equal Contribution.

Abstract

We propose MAC-VO, a novel learning-based stereo VO that leverages learned metrics-aware matching uncertainty for two purposes: selecting keypoints and weighting the residuals in pose graph optimization. Compared to traditional geometric methods that prioritize texture-rich features such as edges, our keypoint selector uses the learned uncertainty to filter out low-quality features based on global inconsistency. In contrast to learning-based algorithms that rely on a scale-agnostic weight matrix, we design a metrics-aware spatial covariance model to capture spatial information during keypoint registration. Integrating this covariance model into pose graph optimization enhances the robustness and reliability of pose estimation, particularly in challenging environments with varying illumination, feature density, and motion patterns. On public benchmark datasets, MAC-VO outperforms existing VO algorithms, and even some SLAM algorithms, in challenging environments. The covariance-aware framework also provides valuable information about the reliability of the estimated poses, which can benefit decision-making for autonomous systems.

MAC-VO Dense Mapping Mode

MAC-VO supports a "dense mapping" mode. Using the uncertainty prediction network, we can select reliable depth estimates for dense mapping without bundle adjustment or multi-frame optimization. The following videos show dense mapping results on EuRoC, VBR, TartanAir, and TartanAir v2. No post-processing is applied.
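The idea of keeping only reliable depth estimates can be illustrated with a minimal sketch. It is not the MAC-VO implementation: the function name `backproject_reliable_depth`, the relative-standard-deviation criterion, and the threshold `max_rel_std` are all assumptions made for illustration, and the camera is assumed to be an ideal pinhole with intrinsics `fx`, `fy`, `cx`, `cy`.

```python
import numpy as np

def backproject_reliable_depth(depth, depth_var, fx, fy, cx, cy, max_rel_std=0.05):
    """Back-project pixels with low relative depth uncertainty to 3D.

    depth, depth_var: (H, W) arrays of depth and predicted depth variance.
    max_rel_std: hypothetical threshold on sqrt(variance) / depth.
    Returns an (N, 3) array of points in the camera frame.
    """
    depth = np.asarray(depth, dtype=float)
    depth_var = np.asarray(depth_var, dtype=float)
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]  # pixel row (v) and column (u) indices

    # Pixels with non-positive depth are treated as invalid (infinite uncertainty).
    valid = depth > 0
    rel_std = np.full(depth.shape, np.inf)
    rel_std[valid] = np.sqrt(depth_var[valid]) / depth[valid]
    mask = rel_std < max_rel_std

    # Standard pinhole back-projection of the surviving pixels.
    z = depth[mask]
    x = (u[mask] - cx) / fx * z
    y = (v[mask] - cy) / fy * z
    return np.stack([x, y, z], axis=1)
```

Because each frame is filtered independently, a sketch like this needs no multi-frame optimization; the resulting per-frame clouds would simply be transformed by the estimated poses and concatenated.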

Zed X Fire Academy 2

VBR Diag Train 0

VBR Spagna Test 0 (Dynamic Scene)

VBR Spagna Test 0 (2) (Dynamic Scene)

VBR Colosseo Train 0 (Extreme Exposure)

TartanAir v2 - Abandon School 1

TartanAir - Abandon Factory 0

EuRoC V102

TartanAir v2 Test (Easy Subset) 3

Zed X Fire Academy 1

Intro Video

Methods

System Pipeline

Figure 1. MAC-VO system pipeline. First, we use a shared matching network to estimate depth, flow, and their corresponding uncertainties. Second, we use the learned uncertainty to filter out unreliable features. Finally, we optimize the pose with the metrics-aware covariance model.
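The second stage of the pipeline, uncertainty-based keypoint filtering, can be sketched as follows. This is an illustrative stand-in rather than the selector used in MAC-VO: the function `select_keypoints`, the median-based threshold `ratio`, and the random subsampling step are assumptions. It only shows the general pattern of rejecting pixels whose predicted uncertainty is high relative to the rest of the image.

```python
import numpy as np

def select_keypoints(flow_var, depth_var, num_points, ratio=1.5, seed=0):
    """Pick up to num_points pixels with comparatively low uncertainty.

    flow_var, depth_var: (H, W) arrays of predicted flow and depth variance.
    ratio: hypothetical multiplier on the image-wide median variance.
    Returns an (N, 2) integer array of (u, v) pixel coordinates.
    """
    flow_var = np.asarray(flow_var, dtype=float)
    depth_var = np.asarray(depth_var, dtype=float)

    # Reject pixels whose uncertainty is large compared with the image as a whole.
    flow_ok = flow_var < ratio * np.median(flow_var)
    depth_ok = depth_var < ratio * np.median(depth_var)
    rows, cols = np.nonzero(flow_ok & depth_ok)

    # Randomly subsample the survivors if there are more than requested.
    if rows.size > num_points:
        rng = np.random.default_rng(seed)
        pick = rng.choice(rows.size, size=num_points, replace=False)
        rows, cols = rows[pick], cols[pick]
    return np.stack([cols, rows], axis=1)
```

Note that the criterion depends only on predicted uncertainty, not on local texture, so a confidently matched pixel in a smooth region can be kept while an unreliable edge pixel is dropped.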

Metrics-Aware Spatial Covariance

Figure 2. a) Depth uncertainty estimated in the presence of matching uncertainty. b) Projection of the depth and matching uncertainty from the sensor plane into 3D space. c) Residual $\mathcal{L}_i$ for pose graph optimization.
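To make the projection in panel b) and the residual in panel c) more concrete, the sketch below uses generic first-order (Jacobian-based) uncertainty propagation through a pinhole camera, followed by a Mahalanobis-weighted point-to-point residual. This is a textbook approximation offered as an assumption, not the exact covariance model or residual derived for MAC-VO: it treats the pixel and depth errors as independent, and the names `point_covariance` and `mahalanobis_residual`, as well as the convention that `R`, `t` map previous-frame points into the current frame, are hypothetical.

```python
import numpy as np

def point_covariance(u, v, d, var_u, var_v, var_d, fx, fy, cx, cy):
    """First-order 3D covariance of a back-projected pixel (u, v) at depth d.

    The point is p = [(u - cx) * d / fx, (v - cy) * d / fy, d]; J is the
    Jacobian of p with respect to (u, v, d). Errors are assumed independent.
    """
    J = np.array([
        [d / fx, 0.0, (u - cx) / fx],
        [0.0, d / fy, (v - cy) / fy],
        [0.0, 0.0, 1.0],
    ])
    S = np.diag([var_u, var_v, var_d])
    return J @ S @ J.T

def mahalanobis_residual(p_prev, cov_prev, p_curr, cov_curr, R, t):
    """Squared Mahalanobis distance between a registered pair of 3D points.

    R, t are assumed to map previous-frame coordinates into the current frame.
    The combined covariance adds the rotated previous covariance to the current one.
    """
    r = p_curr - (R @ p_prev + t)
    cov = cov_curr + R @ cov_prev @ R.T
    return float(r @ np.linalg.solve(cov, r))
```

In this formulation the covariance carries metric units, so a distant point with large depth variance contributes less along its viewing ray than a nearby, well-matched point, which is the qualitative behavior a metrics-aware weighting is meant to provide.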