IKeypointSelector
The IKeypointSelector interface is responsible for selecting keypoints from stereo frames for tracking and mapping. It can utilize depth, flow, and uncertainty information to make intelligent selections.
Interface
class IKeypointSelector(ABC, ConfigTestableSubclass):
    @abstractmethod
    def select_point(
        self,
        frame: StereoData,
        numPoint: int,
        depth0_est: IStereoDepth.Output,
        depth1_est: IStereoDepth.Output,
        match_est: IMatcher.Output | None,
    ) -> torch.Tensor: ...
Output Format
- Returns a torch.Tensor of shape (N, 2) containing selected keypoint coordinates
- Coordinates are in (u, v) format where:
  - u: horizontal coordinate (x)
  - v: vertical coordinate (y)
- To access image values at keypoint locations:
  image[kp[..., 1], kp[..., 0]]
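The indexing rule above can be checked on a toy image. This is a minimal sketch; the image contents and the keypoint values are made up for illustration and do not come from MAC-VO.

```python
import torch

# A toy single-channel image with H=4 rows (v axis) and W=5 columns (u axis).
# Each pixel stores 10 * row + column so values are easy to read back.
H, W = 4, 5
rows = torch.arange(H).unsqueeze(1)   # shape (4, 1)
cols = torch.arange(W).unsqueeze(0)   # shape (1, 5)
image = 10 * rows + cols              # shape (4, 5)

# Keypoints in (u, v) order, shape (N, 2).
kp = torch.tensor([[0, 0], [4, 0], [2, 3]])

# PyTorch indexes as image[row, column], i.e. image[v, u],
# so the v column of kp comes first and the u column second.
values = image[kp[..., 1], kp[..., 0]]
print(values)  # tensor([ 0,  4, 32])
```

The keypoint (u=2, v=3) reads the pixel in row 3, column 2, which is why the two columns of kp are swapped when indexing.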
Methods to Implement
select_point(...) -> torch.Tensor
- Core method for keypoint selection
- Parameters:
  - frame: Current stereo frame
  - numPoint: Target number of keypoints (may not be strictly followed)
  - depth0_est: Depth estimation for frame 0
  - depth1_est: Depth estimation for frame 1
  - match_est: Optional flow estimation between frames
- Returns tensor of selected keypoint coordinates
Implementations
Base Selectors
- RandomSelector
  - Uniformly random selection within valid image region
  - Configuration:
    - mask_width: Border width to exclude
    - device: Target device ("cuda" or "cpu")
- GradientSelector
  - Selects points with high image gradient
  - Uses Sobel filter for gradient computation
  - Configuration:
    - mask_width: Border width to exclude
    - grad_std: Gradient threshold multiplier
- SparseGradientSelector
  - Similar to GradientSelector but ensures spatial distribution
  - Applies non-maximum suppression (NMS) to enforce sparsity
  - Configuration:
    - mask_width: Border width to exclude
    - grad_std: Gradient threshold multiplier
    - nms_size: Size of NMS kernel (must be odd)
- GridSelector
  - Deterministic uniform grid-based selection
  - Used for benchmarking and reproducible results
  - Configuration:
    - mask_width: Border width to exclude
    - device: Target device ("cuda" or "cpu")
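To make the gradient-based strategy concrete, the sketch below combines a Sobel filter, a threshold, NMS, and a border mask in one standalone function. It is not the MAC-VO implementation. The function name, the default values, and in particular the thresholding rule (mean plus grad_std times the standard deviation of the gradient magnitude) are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def sparse_gradient_keypoints(gray, num_point, mask_width=8, grad_std=1.0, nms_size=5):
    """Hypothetical helper: pick up to num_point high-gradient pixels as (u, v)."""
    assert nms_size % 2 == 1, "NMS kernel size must be odd"
    H, W = gray.shape
    img = gray[None, None].float()                       # (1, 1, H, W)

    # Sobel kernels for the horizontal and vertical derivatives.
    kx = torch.tensor([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]])
    ky = kx.t()
    weight = torch.stack([kx, ky])[:, None]              # (2, 1, 3, 3)
    grad = F.conv2d(img, weight, padding=1)              # (1, 2, H, W)
    mag = grad.pow(2).sum(dim=1, keepdim=True).sqrt()    # (1, 1, H, W)

    # Assumed thresholding rule: keep pixels above mean + grad_std * std.
    keep = mag > (mag.mean() + grad_std * mag.std())

    # NMS: a pixel survives only if it equals the maximum of its neighbourhood.
    local_max = F.max_pool2d(mag, nms_size, stride=1, padding=nms_size // 2)
    keep &= (mag == local_max)

    # Exclude a border of mask_width pixels on every side.
    border = torch.zeros_like(keep)
    border[..., mask_width:H - mask_width, mask_width:W - mask_width] = True
    keep &= border

    # nonzero() yields (row, col) = (v, u); flip the last axis to get (u, v).
    vu = keep[0, 0].nonzero()
    uv = vu.flip(-1)

    # Keep the num_point candidates with the largest gradient magnitude.
    scores = mag[0, 0][vu[:, 0], vu[:, 1]]
    order = scores.argsort(descending=True)[:num_point]
    return uv[order]


# Usage: a bright square on a dark background has strong gradients on its edges.
gray = torch.zeros(32, 32)
gray[12:20, 12:20] = 1.0
kp = sparse_gradient_keypoints(gray, num_point=4, mask_width=2)
```

Because ties survive the equality test in the NMS step, flat edges can still yield adjacent keypoints, and the function may return fewer than num_point rows when few pixels pass the threshold.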
Advanced Selectors
- CovAwareSelector
  - Main keypoint selector used in MAC-VO
  - Selects points based on depth, depth uncertainty, and flow uncertainty
  - Implements selection strategy from MAC-VO paper Section III.B
  - Configuration:
    - device: Target device ("cuda" or "cpu")
    - mask_width: Border width to exclude
    - max_depth: Maximum valid depth ("auto" or positive float)
    - kernel_size: NMS kernel size (must be odd)
    - max_depth_cov: Maximum depth uncertainty
    - max_match_cov: Maximum flow uncertainty
- CovAwareSelector_NoDepth
  - Modified version of CovAwareSelector without depth constraints
  - Uses only flow uncertainty for selection
  - Falls back to GridSelector if no flow uncertainty available
  - Configuration:
    - device: Target device ("cuda" or "cpu")
    - mask_width: Border width to exclude
    - kernel_size: NMS kernel size (must be odd)
    - max_match_cov: Maximum flow uncertainty
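The configuration options above suggest a threshold-based filter. The sketch below shows one way such thresholds could be combined into a validity mask and turned into keypoints. It is an illustration under stated assumptions, not the strategy from the MAC-VO paper: the helper names are hypothetical, each uncertainty is assumed to be a single (H, W) map, max_depth is assumed to be a float (the "auto" mode is not handled), and NMS is omitted.

```python
import torch


def uncertainty_mask(depth, depth_cov, flow_cov, mask_width,
                     max_depth, max_depth_cov, max_match_cov):
    """Hypothetical helper: boolean (H, W) mask of pixels passing every threshold."""
    H, W = depth.shape
    valid = torch.zeros(H, W, dtype=torch.bool)
    # Image border: only the interior is eligible.
    valid[mask_width:H - mask_width, mask_width:W - mask_width] = True
    valid &= depth < max_depth              # depth constraint
    valid &= depth_cov < max_depth_cov      # depth uncertainty threshold
    if flow_cov is not None:
        valid &= flow_cov < max_match_cov   # flow uncertainty threshold
    return valid


def mask_to_keypoints(valid, num_point):
    """Hypothetical helper: sample up to num_point valid pixels, returned as (u, v)."""
    vu = valid.nonzero()                              # (K, 2) in (v, u) order
    pick = torch.randperm(vu.shape[0])[:num_point]    # random subset of candidates
    return vu[pick].flip(-1)                          # flip to (u, v)


# Usage on a 6x6 toy frame with one bad pixel per criterion.
depth = torch.full((6, 6), 5.0)
depth[2, 2] = 100.0
depth_cov = torch.full((6, 6), 0.1)
depth_cov[3, 3] = 9.0
flow_cov = torch.full((6, 6), 0.1)
flow_cov[4, 4] = 9.0

valid = uncertainty_mask(depth, depth_cov, flow_cov, mask_width=1,
                         max_depth=50.0, max_depth_cov=1.0, max_match_cov=1.0)
kp = mask_to_keypoints(valid, num_point=5)
```

Passing flow_cov=None skips the flow test, mirroring the mapping call in which no match estimate is available.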
Meta Selectors
- SelectorCompose
  - Combines multiple selectors with weighted distribution
  - Configuration:
    - selector_args: List of selector configurations
    - weight: List of weights for each selector
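One plausible reading of "weighted distribution" is that the numPoint budget is shared among the selectors in proportion to their weights. The sketch below illustrates only that arithmetic. The function split_budget is hypothetical, and the actual SelectorCompose may distribute points differently.

```python
def split_budget(num_point, weights):
    """Hypothetical helper: share num_point across selectors in proportion to weights."""
    total = sum(weights)
    # Round each share down to a whole number of keypoints.
    counts = [int(num_point * w / total) for w in weights]
    # Give the rounding remainder to the first selector so the shares sum to num_point.
    counts[0] += num_point - sum(counts)
    return counts


print(split_budget(100, [3, 1]))     # [75, 25]
print(split_budget(10, [1, 1, 1]))   # [4, 3, 3]
```

Under this assumption, each selector would be called with its own share, and the resulting (N_i, 2) tensors would be concatenated into a single output.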
Usage in MAC-VO
The IKeypointSelector interface is used in two main contexts:
- Tracking: Selecting points for frame-to-frame tracking
  kp0_uv = keypoint_selector.select_point(frame0.stereo, num_points, depth0, depth1, match01)
- Mapping: Selecting points for map building
  map_points = map_selector.select_point(frame.stereo, num_points, depth_est, depth_est, None)
Selection Process
- Base selectors use simple strategies (random, gradient, grid)
- Advanced selectors consider:
  - Image borders (mask_width)
  - Depth constraints (max_depth)
  - Uncertainty thresholds (max_depth_cov, max_match_cov)
  - Spatial distribution (NMS)
- Meta selectors combine multiple strategies
The numPoint argument is a hint, not a strict target. The number of keypoints returned varies with the selection strategy and the input conditions.
Keypoints in this codebase are always stored in (u, v) order, which is the reverse of the (row, column) order PyTorch uses for indexing. Use image[kp[..., 1], kp[..., 0]] to read image values at all keypoint locations.