
IStereoDepth

The IStereoDepth interface is responsible for estimating dense depth maps from stereo image pairs, with optional uncertainty estimation and occlusion mask prediction.

Interface

class IStereoDepth(ABC, Generic[T_Context], ConfigTestableSubclass):
    @property
    @abstractmethod
    def provide_cov(self) -> bool: ...

    @abstractmethod
    def init_context(self) -> T_Context: ...

    @abstractmethod
    def estimate(self, frame: StereoData) -> IStereoDepth.Output: ...

Output Structure

The interface defines an Output dataclass with the following fields:

  • depth: torch.Tensor (B×1×H×W) - Estimated depth map
  • disparity: Optional[torch.Tensor] (B×1×H×W) - Estimated disparity map (if available)
  • disparity_uncertainty: Optional[torch.Tensor] (B×1×H×W) - Estimated disparity uncertainty (if available)
  • cov: Optional[torch.Tensor] (B×1×H×W) - Estimated depth covariance (if provided)
  • mask: Optional[torch.Tensor] (B×1×H×W) - Boolean mask indicating valid prediction regions

Methods to Implement

  • provide_cov -> bool

    • Property indicating whether the implementation provides depth uncertainty estimation
    • Must return True if the implementation outputs depth covariance
  • init_context() -> T_Context

    • Initializes model-specific context (e.g., neural networks, parameters)
    • Called during initialization
    • Access configuration via self.config
  • estimate(frame: StereoData) -> IStereoDepth.Output

    • Core method for depth estimation
    • Input frame contains stereo image pair (imageL, imageR) of shape B×3×H×W
    • Returns IStereoDepth.Output with depth and optional disparity/covariance/mask
    • May pad outputs with NaN values if the prediction shape differs from the input shape
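
The contract above can be illustrated with a minimal toy implementation. The StereoData and Output stand-ins and the ConstantDepth class below are hypothetical simplifications; a real implementation would subclass IStereoDepth and return IStereoDepth.Output:

```python
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass
class StereoData:
    """Stand-in for MAC-VO's StereoData: a B x 3 x H x W stereo pair."""
    imageL: torch.Tensor
    imageR: torch.Tensor


@dataclass
class Output:
    """Stand-in for IStereoDepth.Output (trimmed to three fields for brevity)."""
    depth: torch.Tensor
    cov: Optional[torch.Tensor] = None
    mask: Optional[torch.Tensor] = None


class ConstantDepth:
    """Toy estimator that predicts a constant depth everywhere, no covariance.
    In the real codebase this would subclass IStereoDepth[T_Context]."""

    @property
    def provide_cov(self) -> bool:
        return False  # this implementation produces no depth covariance

    def init_context(self) -> None:
        return None  # nothing to set up for this toy model

    def estimate(self, frame: StereoData) -> Output:
        B, _, H, W = frame.imageL.shape
        # Predict 10 m everywhere; shape matches the documented B x 1 x H x W.
        return Output(depth=torch.full((B, 1, H, W), 10.0))
```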

Implementations

Base Models

  • GTDepth

    • Returns ground truth depth from dataset
    • Raises AssertionError if ground truth depth is not available
    • Does not provide covariance estimation
  • FlowFormerDepth

    • Uses vanilla FlowFormer for disparity estimation
    • Converts disparity to depth using depth = (baseline * fx) / disparity
    • Does not provide covariance estimation
    • Configuration:
      • weight: Path to model weights
      • device: Target device ("cuda" or "cpu")
  • FlowFormerCovDepth

    • Modified FlowFormer with joint disparity and uncertainty estimation
    • Converts disparity and uncertainty to depth and depth uncertainty
    • Provides covariance estimation
    • Same configuration as FlowFormerDepth
  • TartanVODepth

    • Uses TartanVO's StereoNet for depth estimation
    • Optional covariance estimation based on config
    • Configuration:
      • weight: Path to model weights
      • device: Target device ("cuda" or "cpu")
      • cov_mode: Covariance estimation mode ("Est" or "None")
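
For reference, a configuration for one of these models might look as follows (key names are taken from the lists above; the file layout and the weight path are assumptions):

```yaml
depth:
  type: TartanVODepth
  args:
    weight: ./Model/stereo_net.pth  # hypothetical path to model weights
    device: cuda                    # "cuda" or "cpu"
    cov_mode: Est                   # "Est" to estimate covariance, "None" to disable
```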

Modifiers

  • ApplyGTDepthCov
    • Higher-order module that wraps another IStereoDepth
    • Compares estimated depth with ground truth to generate covariance
    • Requires ground truth depth data
    • Configuration:
      • module: Configuration for wrapped depth estimator
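
One plausible way to derive covariance from a ground-truth comparison, as ApplyGTDepthCov describes, is to use the per-pixel squared error as a depth variance. The helper below is a hypothetical sketch of that idea, not the actual MAC-VO implementation:

```python
import torch


def gt_depth_cov(est_depth: torch.Tensor, gt_depth: torch.Tensor) -> torch.Tensor:
    """Per-pixel squared error against ground truth, interpreted as a depth
    variance. Hypothetical sketch; both inputs are B x 1 x H x W tensors."""
    return (est_depth - gt_depth) ** 2
```

A wrapper module would run its inner estimator first, then attach such a tensor as the cov field of the returned Output.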

Usage in MAC-VO

The IStereoDepth interface is primarily used in:

  1. Frontend processing for visual odometry
  2. Depth evaluation and benchmarking
  3. Visualization and debugging

Example usage:

depth_estimator = IStereoDepth.instantiate(config.depth.type, config.depth.args)
depth_output = depth_estimator.estimate(frame.stereo)

# Access results
depth = depth_output.depth # B×1×H×W tensor
cov = depth_output.cov # B×1×H×W tensor or None
mask = depth_output.mask # B×1×H×W tensor or None
disparity = depth_output.disparity # B×1×H×W tensor or None

Utility Functions

The module provides utility functions for disparity-depth conversions:

  • disparity_to_depth(disp, bl, fx) -> depth

    • Converts disparity to depth using depth = (bl * fx) / disparity
  • disparity_to_depth_cov(disp, disp_cov, bl, fx) -> depth_cov

    • Propagates disparity covariance to depth covariance
    • Uses first-order Taylor approximation (see MAC-VO paper Appendix A.1)
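
The two conversions can be sketched as follows. Under a first-order Taylor expansion of depth = (bl * fx) / disp, the derivative is -(bl * fx) / disp^2, so the variance scales by the square of that factor. The function bodies below are illustrative reconstructions of the formulas above, not the library's exact code:

```python
def disparity_to_depth(disp, bl, fx):
    """Convert disparity to depth: depth = (baseline * fx) / disparity.
    Works elementwise on torch tensors or on plain floats."""
    return (bl * fx) / disp


def disparity_to_depth_cov(disp, disp_cov, bl, fx):
    """Propagate disparity covariance to depth covariance via a first-order
    Taylor approximation: d(depth)/d(disp) = -(bl * fx) / disp**2, hence
    depth_cov ~= ((bl * fx) / disp**2) ** 2 * disp_cov."""
    return ((bl * fx) / disp ** 2) ** 2 * disp_cov


# Example: baseline 0.5 m, fx = 400 px, disparity 2 px
print(disparity_to_depth(2.0, 0.5, 400.0))  # → 100.0
```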