IKeypointSelector

The IKeypointSelector interface selects keypoints from stereo frames for tracking and mapping. Implementations can use depth, flow, and uncertainty estimates to guide the selection.

Interface

class IKeypointSelector(ABC, ConfigTestableSubclass):
    @abstractmethod
    def select_point(
        self,
        frame: StereoData,
        numPoint: int,
        depth0_est: IStereoDepth.Output,
        depth1_est: IStereoDepth.Output,
        match_est: IMatcher.Output | None,
    ) -> torch.Tensor: ...

Output Format

Returns a torch.Tensor of shape (N, 2) containing the selected keypoint coordinates, where N is the number of keypoints selected.
Coordinates are in (u, v) format, where:
- u: horizontal coordinate (x)
- v: vertical coordinate (y)
To access image values at keypoint locations: image[kp[..., 1], kp[..., 0]]
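The indexing rule above is easy to get backwards, so here is a small sketch on a toy image whose pixel values encode their own position. The image size and keypoint values are made up for illustration. Note that PyTorch requires integer index tensors, so keypoints stored as floats would need a `.long()` cast first.

```python
import torch

# Toy image with H=4 rows and W=6 columns: image[row, col] = 10 * row + col.
H, W = 4, 6
rows = torch.arange(H).unsqueeze(1)  # shape (H, 1)
cols = torch.arange(W).unsqueeze(0)  # shape (1, W)
image = 10 * rows + cols             # shape (H, W)

# Keypoints in (u, v) = (x, y) order, shape (N, 2).
kp = torch.tensor([[5, 0], [2, 3], [0, 1]])

# PyTorch indexes as [row, col] = [v, u], so the columns are swapped on read.
values = image[kp[..., 1], kp[..., 0]]
print(values)  # tensor([ 5, 32, 10])
```

Reading `image[kp[..., 0], kp[..., 1]]` instead would silently return the wrong pixels, or raise an index error whenever u exceeds the image height.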

Methods to Implement

select_point(...) -> torch.Tensor
- Core method for keypoint selection
- Parameters:
  - frame: Current stereo frame
  - numPoint: Target number of keypoints (may not be strictly followed)
  - depth0_est: Depth estimation for frame 0
  - depth1_est: Depth estimation for frame 1
  - match_est: Optional flow estimation between frames
- Returns tensor of selected keypoint coordinates
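As a rough illustration of what an implementation of this method might look like, the sketch below defines a stand-in base class with the same `select_point` signature and a uniform random selector on top of it. `KeypointSelectorSketch` and `UniformRandomSketch` are hypothetical names, not classes from the codebase, and the image size is passed to the constructor only because the attributes of `StereoData` are not described here.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional

import torch


class KeypointSelectorSketch(ABC):
    """Hypothetical stand-in that mirrors the select_point signature."""

    @abstractmethod
    def select_point(self, frame: Any, numPoint: int, depth0_est: Any,
                     depth1_est: Any, match_est: Optional[Any]) -> torch.Tensor: ...


class UniformRandomSketch(KeypointSelectorSketch):
    """Samples pixels uniformly, staying mask_width pixels away from each border."""

    def __init__(self, height: int, width: int, mask_width: int, seed: int = 0):
        # Assumption: the image size is given explicitly. A real selector
        # would presumably read it from `frame`.
        self.height, self.width, self.mask_width = height, width, mask_width
        self.generator = torch.Generator().manual_seed(seed)

    def select_point(self, frame, numPoint, depth0_est, depth1_est, match_est):
        m = self.mask_width
        u = torch.randint(m, self.width - m, (numPoint,), generator=self.generator)
        v = torch.randint(m, self.height - m, (numPoint,), generator=self.generator)
        return torch.stack([u, v], dim=-1)  # (N, 2) in (u, v) order


selector = UniformRandomSketch(height=480, width=640, mask_width=16)
kp = selector.select_point(None, 100, None, None, None)
print(kp.shape)  # torch.Size([100, 2])
```

This selector ignores the depth and flow inputs entirely. A selector that uses them would read the relevant fields from `depth0_est`, `depth1_est`, and `match_est`, after checking `match_est` for `None`.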

Implementations

Base Selectors

RandomSelector
- Uniformly random selection within valid image region
- Configuration:
  - mask_width: Border width to exclude
  - device: Target device ("cuda" or "cpu")
GradientSelector
- Selects points with high image gradient
- Uses Sobel filter for gradient computation
- Configuration:
  - mask_width: Border width to exclude
  - grad_std: Gradient threshold multiplier
SparseGradientSelector
- Similar to GradientSelector but ensures spatial distribution
- Applies non-maximum suppression (NMS) to enforce sparsity
- Configuration:
  - mask_width: Border width to exclude
  - grad_std: Gradient threshold multiplier
  - nms_size: Size of NMS kernel (must be odd)
GridSelector
- Deterministic uniform grid-based selection
- Used for benchmarking and reproducible results
- Configuration:
  - mask_width: Border width to exclude
  - device: Target device ("cuda" or "cpu")
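To make the grid strategy concrete, the following sketch places keypoints on a regular grid inside the border mask. It is an assumption about how such a selector could work, not the GridSelector source: `grid_keypoints` is a hypothetical helper, and the rule for choosing the number of grid rows and columns is one plausible choice among several.

```python
import math

import torch


def grid_keypoints(height: int, width: int, num_point: int, mask_width: int) -> torch.Tensor:
    """Return roughly num_point keypoints on a regular grid, in (u, v) order."""
    valid_h = height - 2 * mask_width
    valid_w = width - 2 * mask_width
    # Pick row and column counts so grid cells follow the valid region's aspect ratio.
    n_rows = max(1, round(math.sqrt(num_point * valid_h / valid_w)))
    n_cols = max(1, num_point // n_rows)
    v = torch.linspace(mask_width, height - mask_width - 1, n_rows).round().long()
    u = torch.linspace(mask_width, width - mask_width - 1, n_cols).round().long()
    vv, uu = torch.meshgrid(v, u, indexing="ij")
    return torch.stack([uu.reshape(-1), vv.reshape(-1)], dim=-1)


kp = grid_keypoints(height=480, width=640, num_point=200, mask_width=16)
print(kp.shape)  # torch.Size([192, 2])
```

The call asks for 200 points but gets 192 (12 rows by 16 columns), which shows one reason why numPoint is treated as a target rather than a guarantee. The output is identical on every call, which is what makes a grid useful for reproducible benchmarks.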

Advanced Selectors

CovAwareSelector
- Main keypoint selector used in MAC-VO
- Selects points based on depth, depth uncertainty, and flow uncertainty
- Implements selection strategy from MAC-VO paper Section III.B
- Configuration:
  - device: Target device ("cuda" or "cpu")
  - mask_width: Border width to exclude
  - max_depth: Maximum valid depth ("auto" or positive float)
  - kernel_size: NMS kernel size (must be odd)
  - max_depth_cov: Maximum depth uncertainty
  - max_match_cov: Maximum flow uncertainty
CovAwareSelector_NoDepth
- Modified version of CovAwareSelector without depth constraints
- Uses only flow uncertainty for selection
- Falls back to GridSelector if no flow uncertainty available
- Configuration:
  - device: Target device ("cuda" or "cpu")
  - mask_width: Border width to exclude
  - kernel_size: NMS kernel size (must be odd)
  - max_match_cov: Maximum flow uncertainty

Meta Selectors

SelectorCompose
- Combines multiple selectors with weighted distribution
- Configuration:
  - selector_args: List of selector configurations
  - weight: List of weights for each selector
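How the weights translate into per-selector keypoint counts is not spelled out above. One plausible reading is a proportional split of numPoint, sketched below. `split_budget` is a hypothetical helper, and handing the rounding remainder to the first selector is an arbitrary choice made for this example.

```python
from typing import List, Sequence


def split_budget(num_point: int, weights: Sequence[float]) -> List[int]:
    """Split num_point across selectors in proportion to their weights."""
    total = sum(weights)
    counts = [int(num_point * w / total) for w in weights]
    # Rounding down can leave a few points unassigned; give them to the first selector.
    counts[0] += num_point - sum(counts)
    return counts


print(split_budget(200, [0.75, 0.25]))  # [150, 50]
print(split_budget(100, [1, 1, 1]))     # [34, 33, 33]
```

Each sub-selector would then be called with its own count, and the resulting (N_i, 2) tensors concatenated along the first dimension to form the combined output.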

Usage in MAC-VO

The IKeypointSelector interface is used in two main contexts:

Tracking: Selecting points for frame-to-frame tracking

kp0_uv = keypoint_selector.select_point(frame0.stereo, num_points, depth0, depth1, match01)

Mapping: Selecting points for map building

map_points = map_selector.select_point(frame.stereo, num_points, depth_est, depth_est, None)

Selection Process

Base selectors use simple strategies (random, gradient, grid)
Advanced selectors consider:
- Image borders (mask_width)
- Depth constraints (max_depth)
- Uncertainty thresholds (max_depth_cov, max_match_cov)
- Spatial distribution (NMS)
Meta selectors combine multiple strategies
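The four considerations listed for advanced selectors can be chained as a sequence of masks followed by NMS. The sketch below is a simplified illustration under several assumptions: `select_by_quality` is a hypothetical function, every input is a single-channel (H, W) map, and the score (negative sum of the two uncertainties) is invented for the example rather than taken from the MAC-VO paper.

```python
import torch
import torch.nn.functional as F


def select_by_quality(depth, depth_cov, match_cov, num_point, mask_width,
                      max_depth, max_depth_cov, max_match_cov, kernel_size):
    """All maps are (H, W) float tensors. Returns (N, 2) keypoints in (u, v) order."""
    H, W = depth.shape
    valid = torch.zeros(H, W, dtype=torch.bool)
    valid[mask_width:H - mask_width, mask_width:W - mask_width] = True  # image borders
    valid &= (depth > 0) & (depth < max_depth)                          # depth constraint
    valid &= (depth_cov < max_depth_cov) & (match_cov < max_match_cov)  # uncertainty thresholds

    # Illustrative score: lower uncertainty is better, invalid pixels are -inf.
    score = (-(depth_cov + match_cov)).masked_fill(~valid, float("-inf"))

    # NMS: keep a pixel only if it is the best one in its kernel_size window.
    # An odd kernel_size with padding kernel_size // 2 preserves the (H, W) shape.
    pooled = F.max_pool2d(score[None, None], kernel_size, stride=1,
                          padding=kernel_size // 2)[0, 0]
    keep = valid & (score == pooled)

    v, u = torch.nonzero(keep, as_tuple=True)
    order = torch.argsort(score[v, u], descending=True)[:num_point]
    return torch.stack([u[order], v[order]], dim=-1)


depth = torch.full((8, 8), 5.0)
depth_cov = torch.ones(8, 8)
match_cov = torch.ones(8, 8)
depth_cov[3, 4] = 0.1  # confident pixel
depth_cov[3, 5] = 0.2  # confident neighbour, suppressed by NMS
depth_cov[0, 0] = 0.1  # confident, but inside the border mask
kp = select_by_quality(depth, depth_cov, match_cov, num_point=10, mask_width=1,
                       max_depth=10.0, max_depth_cov=0.5, max_match_cov=2.0,
                       kernel_size=3)
print(kp)  # tensor([[4, 3]])
```

Only one keypoint survives even though ten were requested, because every filter can only remove candidates. This is the same reason the real selectors treat numPoint as an upper-bound hint.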

info

The numPoint hint may not be followed strictly by the selector. The number of keypoints returned varies with the selection strategy and the input conditions.

warning

Keypoints in this codebase are always arranged in (u, v) format, which is the reverse of PyTorch's (row, column) indexing order. Use image[kp[..., 1], kp[..., 0]] to read the image values at the (u, v) coordinates of all keypoints.
