MAC-VO: Metrics-Aware Covariance for Learning-based Stereo Visual Odometry

ICRA 2025 Best Conference Paper Award
ICRA 2025 Best Paper Award on Robot Perception
Yuheng Qiu*1, Yutian Chen*1, Zihao Zhang2, Wenshan Wang1, Sebastian Scherer1

*Equal Contribution

1Carnegie Mellon University

2Shanghai Jiao Tong University

Abstract

We propose MAC-VO, a novel learning-based stereo VO that leverages learned metrics-aware matching uncertainty for two purposes: selecting keypoints and weighting the residuals in pose graph optimization. Whereas traditional geometric methods prioritize texture-rich features such as edges, our keypoint selector uses the learned uncertainty to filter out low-quality features based on global inconsistency. In contrast to learning-based algorithms that rely on a scale-agnostic weight matrix, we design a metrics-aware spatial covariance model to capture the spatial information during keypoint registration. Integrating this covariance model into pose graph optimization enhances the robustness and reliability of pose estimation, particularly in challenging environments with varying illumination, feature density, and motion patterns. On public benchmark datasets, MAC-VO outperforms existing VO algorithms and even some SLAM algorithms in challenging environments. The covariance-aware framework also provides valuable information about the reliability of the estimated poses, which can benefit decision-making for autonomous systems.

MAC-VO at ICRA 2025

ICRA Registration Lobby

Main Floor Dynamic Scene

Presentation Room

MAC-VO Dense Mapping

By incorporating our uncertainty estimates, we can reliably select feature points for dense mapping without bundle adjustment or multi-frame optimization. The video below shows the dense mapping results on EuRoC, VBR, TartanAir, and TartanAir v2. No post-processing is applied.

Zed X Fire Academy 2

Zed X Fire Academy 1

AirLab Office

AirLab Workbench

Methods

System Pipeline

Figure 1. MAC-VO system pipeline. First, a shared matching network estimates depth, flow, and their corresponding uncertainties. Second, the learned uncertainty is used to filter out unreliable features. Finally, the pose is optimized with the metrics-aware covariance model.
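
The filtering stage of the pipeline can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the MAC-VO implementation: the function name `select_keypoints`, the `(u, v, flow_var, depth_var)` tuple layout, and the rule that rejects a candidate whose uncertainty exceeds a multiple of the frame-wide median. The actual selector may use a different criterion.

```python
from statistics import median


def select_keypoints(candidates, ratio=1.5, max_points=None):
    """Keep candidates whose learned uncertainty is low relative to the frame.

    candidates: list of (u, v, flow_var, depth_var) tuples (hypothetical layout).
    ratio:      hypothetical cutoff, as a multiple of the frame-wide median.
    max_points: optional cap on the number of keypoints returned.
    """
    if not candidates:
        return []

    # Frame-wide statistics make the cutoff relative to the current image
    # rather than a fixed absolute threshold.
    flow_cut = ratio * median(c[2] for c in candidates)
    depth_cut = ratio * median(c[3] for c in candidates)

    # A candidate survives only if both uncertainties are below their cutoffs.
    kept = [c for c in candidates if c[2] <= flow_cut and c[3] <= depth_cut]

    # Prefer the most confident matches when a cap is requested.
    kept.sort(key=lambda c: c[2])
    return kept if max_points is None else kept[:max_points]
```

In this sketch a feature is rejected if either its matching or its depth uncertainty is an outlier for the frame, so a well-textured pixel with unreliable depth is still discarded.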

Metrics-Aware Spatial Covariance

Figure 2. a) Depth uncertainty estimated in the presence of matching uncertainty. b) Projection of the depth and matching uncertainty from the sensor plane into 3D space. c) Residual $\mathcal{L}_i$ for pose graph optimization.
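
To make the idea of projecting sensor-plane uncertainty into 3D concrete, here is a minimal sketch under stated assumptions. It uses standard first-order (Jacobian) propagation through a pinhole back-projection and a squared Mahalanobis distance as the covariance-weighted residual. This is a textbook approximation chosen for illustration and is not necessarily the exact covariance model or residual used in MAC-VO. All function names and the parameters `fx`, `fy`, `cx`, `cy` (pinhole intrinsics) are hypothetical.

```python
def propagate_covariance(u, v, depth, var_u, var_v, var_d, fx, fy, cx, cy):
    """First-order propagation of pixel and depth variance to a 3x3 covariance.

    Back-projection model (assumed pinhole camera):
        x = (u - cx) * depth / fx,  y = (v - cy) * depth / fy,  z = depth
    """
    # Jacobian of (x, y, z) with respect to (u, v, depth).
    jac = [
        [depth / fx, 0.0, (u - cx) / fx],
        [0.0, depth / fy, (v - cy) / fy],
        [0.0, 0.0, 1.0],
    ]
    variances = [var_u, var_v, var_d]
    # cov = J * diag(variances) * J^T. The inputs are treated as independent
    # here, which is a simplifying assumption of this sketch.
    return [
        [sum(jac[i][k] * jac[j][k] * variances[k] for k in range(3))
         for j in range(3)]
        for i in range(3)
    ]


def invert_3x3(m):
    """Invert a 3x3 matrix via its adjugate (adequate for a small sketch)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0.0:
        raise ValueError("covariance matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def mahalanobis_sq(residual, cov):
    """Squared Mahalanobis distance r^T * cov^{-1} * r for a 3D residual."""
    inv = invert_3x3(cov)
    return sum(residual[i] * inv[i][j] * residual[j]
               for i in range(3) for j in range(3))
```

Two properties of this sketch are worth noting. The lateral variance scales with the square of the depth, so the covariance carries metric scale instead of acting as a scale-agnostic weight. For pixels away from the principal point, depth uncertainty also produces off-diagonal terms, which couple the lateral and depth errors.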