# IFrontend

The `IFrontend` interface provides a unified module for joint estimation of depth and optical flow from stereo image pairs, with optional uncertainty estimation for both tasks.

## Why an additional layer of abstraction?
Sometimes depth estimation and matching are tightly coupled, so we need a way to combine them. For instance, if depth (via disparity) and matching use the same network with the same weights, then instead of running inference twice sequentially, we can compose a batch of size 2 and run inference once.
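To make this concrete, here is a minimal sketch of that batching trick, assuming a shared matching network `match_net` and `StereoData`-like frames exposing `imageL`/`imageR` (the function and the splitting logic are illustrative, not the actual MAC-VO code):

```python
import torch

# Disparity is a left-right match within frame t2; optical flow is a
# temporal match between the left images of frames t1 and t2. With one
# shared matching network, both pairs fit into a single batch of size 2.
def joint_inference(match_net, frame_t1, frame_t2):
    src = torch.cat([frame_t2.imageL, frame_t1.imageL], dim=0)  # 2x3xHxW
    tgt = torch.cat([frame_t2.imageR, frame_t2.imageL], dim=0)  # 2x3xHxW
    match = match_net(src, tgt)     # one forward pass over the batch of 2
    # First batch entry: left-right match, whose horizontal component is
    # the disparity. Second entry: temporal match, i.e. the optical flow.
    disparity = match[0:1, 0:1]
    flow = match[1:2]
    return disparity, flow
```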
## How to use this?

- If there's no specific need (e.g. the performance optimization mentioned above), just use `FrontendCompose` to combine an `IStereoDepth` and an `IMatcher`. This should work just fine. (See the configuration sketch under Implementations below.)
- Otherwise, implement a new `IFrontend` and plug it into the pipeline.
## Interface

```python
class IFrontend(ABC, Generic[T_Context], ConfigTestableSubclass):
    @property
    @abstractmethod
    def provide_cov(self) -> tuple[bool, bool]: ...

    @abstractmethod
    def init_context(self) -> T_Context: ...

    @overload
    @abstractmethod
    def estimate(self, frame_t1: None, frame_t2: StereoData) -> tuple[IStereoDepth.Output, None]: ...

    @overload
    @abstractmethod
    def estimate(self, frame_t1: StereoData, frame_t2: StereoData) -> tuple[IStereoDepth.Output, IMatcher.Output]: ...
```
## Output Structure

The interface returns a tuple of outputs from both depth and flow estimation:

- `IStereoDepth.Output`: Depth estimation results. See the IStereoDepth documentation for details.
- `IMatcher.Output` or `None`: Flow estimation results (present only if `frame_t1` is provided). See the IMatcher documentation for details.
## Methods to Implement

- `provide_cov -> tuple[bool, bool]`
  - Property indicating whether the implementation provides uncertainty estimation
  - Returns `(depth_cov_enabled, flow_cov_enabled)`
  - Must return `True` for each component if the implementation outputs its uncertainty
- `init_context() -> T_Context`
  - Initializes model-specific context (e.g., neural networks, parameters)
  - Called during initialization
  - Access configuration via `self.config`
- `estimate(frame_t1: Optional[StereoData], frame_t2: StereoData) -> tuple[IStereoDepth.Output, Optional[IMatcher.Output]]`
  - Core method for joint depth and flow estimation
  - Input frames contain stereo image pairs (`imageL`, `imageR`) of shape B×3×H×W
  - If `frame_t1` is `None`, only performs depth estimation
  - Returns a tuple of the depth output and the optional flow output
  - May pad outputs with `nan` if the prediction shape differs from the input
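To make the contract concrete, here is a minimal skeleton of a custom implementation. It is a sketch only: the predictions are constant placeholders, it assumes `IStereoDepth.Output` and `IMatcher.Output` accept their fields as keyword arguments, and a real subclass must also satisfy the `ConfigTestableSubclass` configuration checks.

```python
import torch

class MyFrontend(IFrontend[dict]):
    @property
    def provide_cov(self) -> tuple[bool, bool]:
        # This sketch estimates uncertainty for neither depth nor flow.
        return False, False

    def init_context(self) -> dict:
        # Build networks / load weights here from self.config; a plain
        # dict stands in for a real model-specific context.
        return dict()

    def estimate(self, frame_t1, frame_t2):
        B, _, H, W = frame_t2.imageL.shape
        # Placeholder predictions; a real frontend runs its networks here.
        depth_out = IStereoDepth.Output(depth=torch.ones(B, 1, H, W))
        if frame_t1 is None:
            return depth_out, None
        flow_out = IMatcher.Output(flow=torch.zeros(B, 2, H, W))
        return depth_out, flow_out
```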
## Implementations

### Base Models

- `FrontendCompose`
  - Combines separate depth and flow estimators
  - Uses individual `IStereoDepth` and `IMatcher` implementations
  - Provides covariance if the underlying implementations do
  - Configuration:
    - `depth`: Configuration for the depth estimator
      - `type`: IStereoDepth implementation class name
      - `args`: Arguments for the depth estimator
    - `match`: Configuration for the flow estimator
      - `type`: IMatcher implementation class name
      - `args`: Arguments for the flow estimator
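Putting those keys together, a `FrontendCompose` configuration might look like the sketch below, written here as a Python dict (the estimator class names are placeholders, and whether `instantiate` takes a plain dict or a namespace-like config object depends on the config loader):

```python
# Hypothetical FrontendCompose configuration; "MyStereoDepth" and
# "MyMatcher" are placeholder class names, not actual MAC-VO estimators.
frontend_config = {
    "type": "FrontendCompose",
    "args": {
        "depth": {
            "type": "MyStereoDepth",  # an IStereoDepth implementation
            "args": {},               # its estimator-specific arguments
        },
        "match": {
            "type": "MyMatcher",      # an IMatcher implementation
            "args": {},
        },
    },
}
```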
- `FlowFormerCovFrontend`
  - The main frontend used in MAC-VO for joint depth and flow estimation
  - Uses the FlowFormer network with covariance estimation
  - Provides covariance for both depth and flow
  - Configuration:
    - `weight`: Path to model weights
    - `device`: Target device (`"cuda"` or `"cpu"`)
    - `dtype`: Model precision (`"fp32"`, `"bf16"`, or `"fp16"`)
    - `enforce_positive_disparity`: Whether to enforce positive disparity values
    - `max_flow`: Maximum allowed flow value (`-1` for no limit)
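Using the documented keys, a configuration for this frontend might look like the following sketch (the weight path and the specific values are illustrative, not mandated defaults):

```python
# A sketch of a FlowFormerCovFrontend configuration using the keys above.
frontend_config = {
    "type": "FlowFormerCovFrontend",
    "args": {
        "weight": "path/to/flowformer_cov.pth",  # placeholder path
        "device": "cuda",
        "dtype": "fp32",
        "enforce_positive_disparity": False,
        "max_flow": -1,                          # -1 disables the limit
    },
}
```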
### Accelerated Models

- `CUDAGraph_FlowFormerCovFrontend`
  - Accelerated version of FlowFormerCovFrontend using CUDA graphs
  - Improves inference speed by minimizing kernel launch overhead
  - Only available on CUDA devices
  - Same configuration as FlowFormerCovFrontend
  - Additional optimizations:
    - Uses tensor cores (TF32)
    - Reduced-precision matrix multiplication
    - CUDA solver optimization
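For context, a CUDA graph records the kernels of one forward pass and replays them as a single unit, so the per-kernel launch cost is paid only once at capture time. The sketch below shows the generic capture/replay pattern using plain PyTorch APIs; the model and input shape are hypothetical, not the actual MAC-VO frontend:

```python
import torch

# Tensor-core (TF32) matmul, as listed in the optimizations above.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# Hypothetical stand-in model and fixed-shape input buffer.
model = torch.nn.Conv2d(3, 8, 3, padding=1).cuda().eval()
static_input = torch.zeros(1, 3, 480, 640, device="cuda")

with torch.no_grad():
    # Warm up on a side stream (required before capture).
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        for _ in range(3):
            model(static_input)
    torch.cuda.current_stream().wait_stream(s)

    # Record one forward pass into the graph.
    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        static_output = model(static_input)

    # Replay: copy each new frame into the captured input buffer and
    # relaunch all recorded kernels with a single call.
    static_input.copy_(torch.rand(1, 3, 480, 640, device="cuda"))
    graph.replay()
    result = static_output.clone()
```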
## Usage in MAC-VO

The IFrontend interface is primarily used in:

- The visual odometry pipeline, for joint depth and flow estimation
- Evaluation and benchmarking
- Visualization and debugging

Example usage:
```python
frontend = IFrontend.instantiate(config.frontend.type, config.frontend.args)

# Depth estimation only
depth_output, _ = frontend.estimate(None, frame_t2)

# Joint depth and flow estimation
depth_output, flow_output = frontend.estimate(frame_t1, frame_t2)

# Access depth results
depth = depth_output.depth      # B×1×H×W tensor
depth_cov = depth_output.cov    # B×1×H×W tensor or None
depth_mask = depth_output.mask  # B×1×H×W tensor or None

# Access flow results (if available)
if flow_output is not None:
    flow = flow_output.flow       # B×2×H×W tensor
    flow_cov = flow_output.cov    # B×3×H×W tensor or None
    flow_mask = flow_output.mask  # B×1×H×W tensor or None
```