
IStereoDepth

The IStereoDepth interface is responsible for estimating dense depth maps from stereo image pairs, with optional uncertainty estimation and occlusion mask prediction.

Interface

class IStereoDepth(ABC, Generic[T_Context], ConfigTestableSubclass):
    @property
    @abstractmethod
    def provide_cov(self) -> bool: ...

    @abstractmethod
    def init_context(self) -> T_Context: ...

    @abstractmethod
    def estimate(self, frame: StereoData) -> IStereoDepth.Output: ...

Output Structure

The interface defines an Output dataclass with the following fields:

  • depth: torch.Tensor (B×1×H×W) - Estimated depth map
  • disparity: Optional[torch.Tensor] (B×1×H×W) - Estimated disparity map (if available)
  • disparity_uncertainty: Optional[torch.Tensor] (B×1×H×W) - Estimated disparity uncertainty (if available)
  • cov: Optional[torch.Tensor] (B×1×H×W) - Estimated depth covariance (if provided)
  • mask: Optional[torch.Tensor] (B×1×H×W) - Boolean mask indicating valid prediction regions

Methods to Implement

  • provide_cov -> bool

    • Property indicating whether the implementation provides depth uncertainty estimation
    • Must return True if the implementation outputs depth covariance
  • init_context() -> T_Context

    • Initializes model-specific context (e.g., neural networks, parameters)
    • Called during initialization
    • Access configuration via self.config
  • estimate(frame: StereoData) -> IStereoDepth.Output

    • Core method for depth estimation
    • Input frame contains stereo image pair (imageL, imageR) of shape B×3×H×W
    • Returns IStereoDepth.Output with depth and optional disparity/covariance/mask
    • May pad outputs with NaN values if the prediction shape differs from the input shape
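
The contract above can be illustrated with a minimal toy implementation. The StereoData and Output stand-ins and the ConstantDepth class below are hypothetical simplifications; a real implementation would subclass IStereoDepth and return IStereoDepth.Output:

```python
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass
class StereoData:
    """Stand-in for MAC-VO's StereoData: a B x 3 x H x W stereo pair."""
    imageL: torch.Tensor
    imageR: torch.Tensor


@dataclass
class Output:
    """Stand-in for IStereoDepth.Output (trimmed to three fields for brevity)."""
    depth: torch.Tensor
    cov: Optional[torch.Tensor] = None
    mask: Optional[torch.Tensor] = None


class ConstantDepth:
    """Toy estimator that predicts a constant depth everywhere, no covariance.
    In the real codebase this would subclass IStereoDepth[T_Context]."""

    @property
    def provide_cov(self) -> bool:
        return False  # this implementation produces no depth covariance

    def init_context(self) -> None:
        return None  # nothing to set up for this toy model

    def estimate(self, frame: StereoData) -> Output:
        B, _, H, W = frame.imageL.shape
        # Predict 10 m everywhere; shape matches the documented B x 1 x H x W.
        return Output(depth=torch.full((B, 1, H, W), 10.0))
```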

Implementations

Base Models

  • GTDepth

    • Returns ground truth depth from dataset
    • Raises AssertionError if ground truth depth is not available
    • Does not provide covariance estimation
  • FlowFormerDepth

    • Uses vanilla FlowFormer for disparity estimation
    • Converts disparity to depth using depth = (baseline * fx) / disparity
    • Does not provide covariance estimation
    • Configuration:
      • weight: Path to model weights
      • device: Target device ("cuda" or "cpu")
  • FlowFormerCovDepth

    • Modified FlowFormer with joint disparity and uncertainty estimation
    • Converts disparity and uncertainty to depth and depth uncertainty
    • Provides covariance estimation
    • Same configuration as FlowFormerDepth
  • TartanVODepth

    • Uses TartanVO's StereoNet for depth estimation
    • Optional covariance estimation based on config
    • Configuration:
      • weight: Path to model weights
      • device: Target device ("cuda" or "cpu")
      • cov_mode: Covariance estimation mode ("Est" or "None")
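
For reference, a configuration for one of these models might look as follows (key names are taken from the lists above; the file layout and the weight path are assumptions):

```yaml
depth:
  type: TartanVODepth
  args:
    weight: ./Model/stereo_net.pth  # hypothetical path to model weights
    device: cuda                    # "cuda" or "cpu"
    cov_mode: Est                   # "Est" to estimate covariance, "None" to disable
```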

Modifiers

  • ApplyGTDepthCov
    • Higher-order module that wraps another IStereoDepth
    • Compares estimated depth with ground truth to generate covariance
    • Requires ground truth depth data
    • Configuration:
      • module: Configuration for wrapped depth estimator
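
One plausible way to derive covariance from a ground-truth comparison, as ApplyGTDepthCov describes, is to use the per-pixel squared error as a depth variance. The helper below is a hypothetical sketch of that idea, not the actual MAC-VO implementation:

```python
import torch


def gt_depth_cov(est_depth: torch.Tensor, gt_depth: torch.Tensor) -> torch.Tensor:
    """Per-pixel squared error against ground truth, interpreted as a depth
    variance. Hypothetical sketch; both inputs are B x 1 x H x W tensors."""
    return (est_depth - gt_depth) ** 2
```

A wrapper module would run its inner estimator first, then attach such a tensor as the cov field of the returned Output.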

Usage in MAC-VO

The IStereoDepth interface is primarily used in:

  1. Frontend processing for visual odometry
  2. Depth evaluation and benchmarking
  3. Visualization and debugging

Example usage:

depth_estimator = IStereoDepth.instantiate(config.depth.type, config.depth.args)
depth_output = depth_estimator.estimate(frame.stereo)

# Access results
depth = depth_output.depth # B×1×H×W tensor
cov = depth_output.cov # B×1×H×W tensor or None
mask = depth_output.mask # B×1×H×W tensor or None
disparity = depth_output.disparity # B×1×H×W tensor or None

Utility Functions

The module provides utility functions for disparity-depth conversions:

  • disparity_to_depth(disp, bl, fx) -> depth

    • Converts disparity to depth using depth = (bl * fx) / disparity
  • disparity_to_depth_cov(disp, disp_cov, bl, fx) -> depth_cov

    • Propagates disparity covariance to depth covariance
    • Uses first-order Taylor approximation (see MAC-VO paper Appendix A.1)
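
The two conversions can be sketched as follows. Under a first-order Taylor expansion of depth = (bl * fx) / disp, the derivative is -(bl * fx) / disp^2, so the variance scales by the square of that factor. The function bodies below are illustrative reconstructions of the formulas above, not the library's exact code:

```python
def disparity_to_depth(disp, bl, fx):
    """Convert disparity to depth: depth = (baseline * fx) / disparity.
    Works elementwise on torch tensors or on plain floats."""
    return (bl * fx) / disp


def disparity_to_depth_cov(disp, disp_cov, bl, fx):
    """Propagate disparity covariance to depth covariance via a first-order
    Taylor approximation: d(depth)/d(disp) = -(bl * fx) / disp**2, hence
    depth_cov ~= ((bl * fx) / disp**2) ** 2 * disp_cov."""
    return ((bl * fx) / disp ** 2) ** 2 * disp_cov


# Example: baseline 0.5 m, fx = 400 px, disparity 2 px
print(disparity_to_depth(2.0, 0.5, 400.0))  # → 100.0
```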