# IStereoDepth

The `IStereoDepth` interface estimates dense depth maps from stereo image pairs, with optional uncertainty estimation and occlusion mask prediction.
## Interface

```python
class IStereoDepth(ABC, Generic[T_Context], ConfigTestableSubclass):
    @property
    @abstractmethod
    def provide_cov(self) -> bool: ...

    @abstractmethod
    def init_context(self) -> T_Context: ...

    @abstractmethod
    def estimate(self, frame: StereoData) -> IStereoDepth.Output: ...
```
## Output Structure

The interface defines an `Output` dataclass with the following fields:

- `depth`: `torch.Tensor` (B×1×H×W) - Estimated depth map
- `disparity`: `Optional[torch.Tensor]` (B×1×H×W) - Estimated disparity map (if available)
- `disparity_uncertainty`: `Optional[torch.Tensor]` (B×1×H×W) - Estimated disparity uncertainty (if available)
- `cov`: `Optional[torch.Tensor]` (B×1×H×W) - Estimated depth covariance (if provided)
- `mask`: `Optional[torch.Tensor]` (B×1×H×W) - Boolean mask indicating valid prediction regions
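A minimal sketch of a dataclass with these fields (string annotations keep the sketch importable without `torch`; the actual class in the codebase may differ in details):

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Output:
    # Estimated depth map, B×1×H×W (a torch.Tensor in practice; the
    # annotations are strings so this sketch does not require torch)
    depth: "torch.Tensor"
    # Optional outputs; None when an implementation does not provide them
    disparity: Optional["torch.Tensor"] = None
    disparity_uncertainty: Optional["torch.Tensor"] = None
    cov: Optional["torch.Tensor"] = None
    mask: Optional["torch.Tensor"] = None
```

Consumers should therefore check the optional fields for `None` before using them.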
## Methods to Implement

- `provide_cov -> bool`
  - Property indicating whether the implementation provides depth uncertainty estimation
  - Must return `True` if the implementation outputs depth covariance
- `init_context() -> T_Context`
  - Initializes model-specific context (e.g., neural networks, parameters)
  - Called during initialization
  - Access configuration via `self.config`
- `estimate(frame: StereoData) -> IStereoDepth.Output`
  - Core method for depth estimation
  - The input frame contains a stereo image pair (`imageL`, `imageR`) of shape B×3×H×W
  - Returns an `IStereoDepth.Output` with depth and optional disparity/covariance/mask
  - May pad outputs with `nan` if the prediction shape differs from the input
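To illustrate the contract, here is a toy implementation written against a stripped-down stand-in for the base class. The stand-in omits `Generic[T_Context]`, `ConfigTestableSubclass`, and torch tensors (nested lists stand in for B×1×H×W tensors), so it is a sketch of the shape of an implementation, not the real API:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class DepthOutput:
    depth: list                  # stands in for a B×1×H×W torch.Tensor
    cov: Optional[list] = None
    mask: Optional[list] = None


class IStereoDepthSketch(ABC):
    """Stripped-down stand-in for IStereoDepth (no config machinery)."""

    @property
    @abstractmethod
    def provide_cov(self) -> bool: ...

    @abstractmethod
    def estimate(self, frame) -> DepthOutput: ...


class ConstantDepth(IStereoDepthSketch):
    """Toy estimator that predicts the same depth at every pixel."""

    def __init__(self, value: float):
        self.value = value

    @property
    def provide_cov(self) -> bool:
        # This estimator produces no covariance, so it must report False
        return False

    def estimate(self, frame) -> DepthOutput:
        h, w = frame["height"], frame["width"]
        depth = [[self.value] * w for _ in range(h)]
        return DepthOutput(depth=depth)
```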
## Implementations

### Base Models

- `GTDepth`
  - Returns ground-truth depth from the dataset
  - Raises `AssertionError` if ground truth is not available
  - Does not provide covariance estimation
- `FlowFormerDepth`
  - Uses vanilla FlowFormer for disparity estimation
  - Converts disparity to depth using `depth = (baseline * fx) / disparity`
  - Does not provide covariance estimation
  - Configuration:
    - `weight`: Path to model weights
    - `device`: Target device (`"cuda"` or `"cpu"`)
- `FlowFormerCovDepth`
  - Modified FlowFormer with joint disparity and uncertainty estimation
  - Converts disparity and its uncertainty to depth and depth uncertainty
  - Provides covariance estimation
  - Same configuration as `FlowFormerDepth`
- `TartanVODepth`
  - Uses TartanVO's StereoNet for depth estimation
  - Optional covariance estimation based on config
  - Configuration:
    - `weight`: Path to model weights
    - `device`: Target device (`"cuda"` or `"cpu"`)
    - `cov_mode`: Covariance estimation mode (`"Est"` or `"None"`)
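The disparity-to-depth conversion used above follows directly from pinhole stereo geometry. A scalar sketch (the real code operates on whole tensors at once):

```python
def disparity_to_depth(disparity: float, baseline: float, fx: float) -> float:
    """Pinhole stereo geometry: depth = (baseline * fx) / disparity."""
    return (baseline * fx) / disparity


# Example: baseline 0.25 m, focal length 320 px, disparity 16 px
# → depth of (0.25 * 320) / 16 = 5.0 m
print(disparity_to_depth(16.0, baseline=0.25, fx=320.0))
```

Note that depth is inversely proportional to disparity, so small disparities (distant points) amplify any disparity error.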
### Modifiers

- `ApplyGTDepthCov`
  - Higher-order module that wraps another `IStereoDepth`
  - Compares estimated depth with ground truth to generate covariance
  - Requires ground-truth depth data
  - Configuration:
    - `module`: Configuration for the wrapped depth estimator
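One plausible way to turn a comparison against ground truth into a per-pixel covariance is to use the squared error; this is an illustrative assumption, not necessarily the exact formulation `ApplyGTDepthCov` uses internally:

```python
def gt_depth_cov(depth_est: list, depth_gt: list) -> list:
    """Per-pixel covariance from the squared error against ground truth.

    NOTE: the squared-error formulation here is an illustrative
    assumption, not a description of ApplyGTDepthCov's internals.
    Nested lists stand in for H×W tensors.
    """
    return [
        [(e - g) ** 2 for e, g in zip(row_e, row_g)]
        for row_e, row_g in zip(depth_est, depth_gt)
    ]
```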
## Usage in MAC-VO

The `IStereoDepth` interface is primarily used in:

- Frontend processing for visual odometry
- Depth evaluation and benchmarking
- Visualization and debugging

Example usage:

```python
depth_estimator = IStereoDepth.instantiate(config.depth.type, config.depth.args)
depth_output = depth_estimator.estimate(frame.stereo)

# Access results
depth = depth_output.depth          # B×1×H×W tensor
cov = depth_output.cov              # B×1×H×W tensor or None
mask = depth_output.mask            # B×1×H×W tensor or None
disparity = depth_output.disparity  # B×1×H×W tensor or None
```
## Utility Functions

The module provides utility functions for disparity-depth conversions:

- `disparity_to_depth(disp, bl, fx) -> depth`
  - Converts disparity to depth using `depth = (bl * fx) / disparity`
- `disparity_to_depth_cov(disp, disp_cov, bl, fx) -> depth_cov`
  - Propagates disparity covariance to depth covariance
  - Uses a first-order Taylor approximation (see MAC-VO paper, Appendix A.1)
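The first-order propagation follows from differentiating `depth = (bl * fx) / disp`: the Jacobian is `-(bl * fx) / disp**2`, so `Var[depth] ≈ Jacobian**2 * Var[disp]`. A scalar sketch of this standard variance-propagation step (the utility in the module works on tensors, and the exact form used by MAC-VO is given in its Appendix A.1):

```python
def disparity_to_depth_cov(disp: float, disp_cov: float,
                           bl: float, fx: float) -> float:
    """First-order (Taylor) propagation of disparity variance to depth.

    depth = (bl * fx) / disp, so d(depth)/d(disp) = -(bl * fx) / disp**2
    and Var[depth] ≈ (d(depth)/d(disp))**2 * Var[disp].
    """
    jacobian = -(bl * fx) / disp**2
    return jacobian**2 * disp_cov


# Example: bl * fx = 80, disparity 16 → Jacobian = -80 / 256 = -0.3125,
# so unit disparity variance maps to 0.3125**2 depth variance
print(disparity_to_depth_cov(16.0, disp_cov=1.0, bl=0.25, fx=320.0))
```

Because the Jacobian scales with `1 / disp**2`, the same disparity uncertainty produces much larger depth uncertainty for distant (small-disparity) points.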