vismatch.utils

vismatch.utils.disable_xformers()

Disable xformers in all loaded modules, so that models fall back to standard PyTorch attention.

This is needed on CPU because xformers only supports CUDA. Without this, models using DINOv2 (e.g. RoMa, DeDoDe-Kornia) crash on CPU when xformers happens to be installed.

vismatch.utils.get_image_pairs_paths(inputs)

Process the input to produce a list of image-pair paths.

Parameters:

inputs (list[Path] | Path) – input path(s), one of: (1) two image paths, (2) a directory containing two images, (3) a directory of subdirectories that each contain an image pair, (4) a txt file with two image paths per line

Returns:

list of pairs of image paths

Return type:

list[tuple[Path, Path]]
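
The four accepted input forms can be illustrated with a small resolver. This is a minimal sketch, not the library's implementation: the name resolve_pairs, the IMAGE_EXTS set, the sorted ordering, and the whitespace-separated txt format are all assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image extensions; the real function may accept others.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def _images_in(folder: Path) -> list[Path]:
    """Return the image files directly inside folder, sorted by name."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )


def resolve_pairs(inputs) -> list[tuple[Path, Path]]:
    """Hypothetical resolver covering the four documented input forms."""
    if isinstance(inputs, (list, tuple)):
        if len(inputs) == 2:  # (1) two image paths
            return [(Path(inputs[0]), Path(inputs[1]))]
        if len(inputs) != 1:
            raise ValueError("expected one path or exactly two image paths")
        inputs = inputs[0]

    path = Path(inputs)
    if path.is_file() and path.suffix.lower() == ".txt":  # (4) pair list file
        pairs = []
        for line in path.read_text().splitlines():
            parts = line.split()  # assumes paths contain no whitespace
            if len(parts) == 2:
                pairs.append((Path(parts[0]), Path(parts[1])))
        return pairs

    if path.is_dir():
        images = _images_in(path)
        if len(images) == 2:  # (2) directory with two images
            return [(images[0], images[1])]
        pairs = []  # (3) directory of pair directories
        for sub in sorted(p for p in path.iterdir() if p.is_dir()):
            sub_images = _images_in(sub)
            if len(sub_images) == 2:
                pairs.append((sub_images[0], sub_images[1]))
        return pairs

    raise ValueError(f"cannot interpret input: {path}")
```

In this sketch, subdirectories that do not contain exactly two images are skipped silently; the real function may handle that case differently.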

vismatch.utils.to_numpy(x)

Convert an item or a container of items to NumPy.

Parameters:

x (torch.Tensor | np.ndarray | dict | list) – input

Returns:

numpy array of input

Return type:

np.ndarray
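
A recursive conversion of this kind might look like the sketch below. It assumes that dicts and lists are walked element by element with the container kept, and it duck-types tensors through their detach method so the example runs without torch. The name to_numpy_sketch is hypothetical, and the real function's handling of containers may differ.

```python
import numpy as np


def to_numpy_sketch(x):
    """Recursively convert tensors/arrays inside dicts and lists to NumPy."""
    if isinstance(x, dict):
        return {key: to_numpy_sketch(value) for key, value in x.items()}
    if isinstance(x, (list, tuple)):
        return [to_numpy_sketch(value) for value in x]
    if hasattr(x, "detach"):
        # Duck-typed torch.Tensor: drop the graph, move to CPU, then convert.
        return x.detach().cpu().numpy()
    return np.asarray(x)


result = to_numpy_sketch({"kpts": np.ones((2, 2)), "scores": [np.zeros(3), 5.0]})
```

Here result["kpts"] is an unchanged array, and the Python float in result["scores"] becomes a zero-dimensional array.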

vismatch.utils.to_tensor(x, device=None)

Convert to tensor and place on device

Parameters:
  • x (np.ndarray | torch.Tensor) – item to convert to tensor

  • device (str, optional) – device to place tensor on. Defaults to None.

Returns:

tensor with the data from x, placed on device

Return type:

torch.Tensor

vismatch.utils.to_device(data, device='cuda')

Recursively move tensors in nested data structures to device.

Parameters:
  • data (Tensor | dict | list)

  • device (str)
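
The recursion over nested structures can be sketched as follows. This is an illustration under assumptions, not the library's code: move_to_device is a hypothetical name, tensors are recognised by duck typing (anything with a to method), and non-tensor leaves are returned unchanged.

```python
def move_to_device(data, device="cuda"):
    """Recursively call .to(device) on tensor-like leaves of nested containers."""
    if isinstance(data, dict):
        return {key: move_to_device(value, device) for key, value in data.items()}
    if isinstance(data, (list, tuple)):
        # Rebuild the same container type (plain list or tuple).
        return type(data)(move_to_device(value, device) for value in data)
    if hasattr(data, "to"):
        # Assumed tensor-like object, e.g. torch.Tensor.
        return data.to(device)
    return data  # strings, numbers, None, ... pass through untouched
```

With torch installed, move_to_device({"img": torch.zeros(3, 8, 8), "name": "pair0"}, "cpu") would move the tensor and leave the string as is.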

vismatch.utils.to_normalized_coords(pts, height, width)

Normalize kpt coords from px space to [0, 1]. Assumes pts are in (x, y) order, in an array/tensor of shape (N, 2).

Parameters:
  • pts (np.ndarray | torch.Tensor) – array of kpts, must be shape (N, 2)

  • height (int) – height of img

  • width (int) – width of img

Returns:

kpts in normalized [0,1] coords

Return type:

np.array

vismatch.utils.to_px_coords(pts, height, width)

Unnormalize kpt coords from [0, 1] to px space. Assumes pts are in (x, y) order.

Parameters:
  • pts (np.ndarray | torch.Tensor) – array of kpts, must be shape (N, 2)

  • height (int) – height of img

  • width (int) – width of img

Returns:

kpts in px coords

Return type:

np.array
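
The relationship between to_normalized_coords and to_px_coords can be shown as a round trip. The sketch below assumes that x is divided by width and y by height; the library may use a different convention, such as width - 1. The helper names are hypothetical.

```python
import numpy as np


def normalize_pts(pts, height, width):
    """Map (x, y) pixel coordinates of shape (N, 2) to [0, 1]."""
    scale = np.array([width, height], dtype=np.float64)  # x uses width, y uses height
    return np.asarray(pts, dtype=np.float64) / scale


def unnormalize_pts(pts, height, width):
    """Map (x, y) coordinates in [0, 1] back to pixel space."""
    scale = np.array([width, height], dtype=np.float64)
    return np.asarray(pts, dtype=np.float64) * scale


pts_px = np.array([[320.0, 120.0], [0.0, 480.0]])
pts_norm = normalize_pts(pts_px, height=480, width=640)
# pts_norm is [[0.5, 0.25], [0.0, 1.0]]
pts_back = unnormalize_pts(pts_norm, height=480, width=640)
```

Because pts are in (x, y) order, the scale vector is (width, height), not (height, width).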

vismatch.utils.pad_images_to_same_shape(img0, img1)

Pad two image tensors to the same spatial dimensions (right/bottom zero-padding).

Parameters:
  • img0 (Tensor)

  • img1 (Tensor)

Return type:

tuple[Tensor, Tensor]
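
Right/bottom zero-padding can be illustrated with NumPy arrays standing in for tensors. This is a sketch under the assumption that images are laid out as (*, H, W); pad_to_same_shape is a hypothetical name, and the library itself operates on torch tensors.

```python
import numpy as np


def pad_to_same_shape(img0, img1):
    """Zero-pad two (*, H, W) arrays on the right and bottom to a common H and W."""
    target_h = max(img0.shape[-2], img1.shape[-2])
    target_w = max(img0.shape[-1], img1.shape[-1])

    def pad(img):
        # No padding on leading dims; pad only after the existing rows/columns.
        pad_width = [(0, 0)] * (img.ndim - 2) + [
            (0, target_h - img.shape[-2]),
            (0, target_w - img.shape[-1]),
        ]
        return np.pad(img, pad_width, mode="constant", constant_values=0)

    return pad(img0), pad(img1)


a = np.ones((3, 4, 6))
b = np.ones((3, 5, 2))
a_pad, b_pad = pad_to_same_shape(a, b)  # both become (3, 5, 6)
```

Padding only on the right and bottom keeps the origin fixed, so keypoint coordinates in the original images stay valid in the padded ones.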

vismatch.utils.resize_to_divisible(img, divisible_by=14)

Resize an image so that its height and width are divisible by a factor. Useful for ViT-based models.

Parameters:
  • img (torch.Tensor) – img as tensor, in (*, H, W) order

  • divisible_by (int, optional) – factor that the height and width of img must be divisible by. Defaults to 14.

Returns:

img tensor with divisible shape

Return type:

torch.Tensor
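
The size arithmetic behind such a resize can be sketched as below. The example only computes the target shape and assumes rounding down to the nearest multiple; whether the library rounds down, up, or to nearest is not stated here, and divisible_size is a hypothetical helper.

```python
def divisible_size(height, width, divisible_by=14):
    """Return the largest (H, W) not exceeding the input that divides evenly.

    Assumption: round down, but never below one full patch.
    """
    new_h = max(divisible_by, (height // divisible_by) * divisible_by)
    new_w = max(divisible_by, (width // divisible_by) * divisible_by)
    return new_h, new_w


target = divisible_size(480, 640)  # (476, 630) for the default factor of 14
```

The default of 14 matches the patch size of ViT backbones such as DINOv2, which need inputs that tile into whole patches.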

vismatch.utils.lower_config(yacs_cfg)

Convert yacs config to lower-case dict recursively.

Parameters:

yacs_cfg (CfgNode)

Return type:

dict
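
The recursive key lowering might be sketched as follows. A yacs CfgNode is a dict subclass, so plain dicts stand in for it here and the example runs without yacs installed. The name lower_keys is hypothetical.

```python
def lower_keys(cfg):
    """Return a plain dict with every key lower-cased, descending into nested dicts."""
    if not isinstance(cfg, dict):
        return cfg  # leaf value: leave unchanged
    return {str(key).lower(): lower_keys(value) for key, value in cfg.items()}


lowered = lower_keys({"MODEL": {"NAME": "loftr", "DIMS": [128, 256]}, "THR": 0.2})
# {"model": {"name": "loftr", "dims": [128, 256]}, "thr": 0.2}
```

Only keys are changed; string values such as "loftr" are passed through untouched.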

vismatch.utils.load_module(module_name, module_path)

Load module from module_path into the interpreter with the namespace given by module_name.

Note that module_path is usually the path to an __init__.py file.

Parameters:
  • module_name (str) – module name (will be used to import from later, as in from module_name import my_function)

  • module_path (Path | str) – path to module (usually an __init__.py file)

Return type:

None
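
Loading a module from a file path under a chosen name is commonly done with importlib. The sketch below shows that standard pattern; it is an assumption that load_module works this way, and load_module_sketch is a hypothetical name.

```python
import importlib.util
import sys
from pathlib import Path


def load_module_sketch(module_name, module_path):
    """Execute the file at module_path and register it as module_name."""
    spec = importlib.util.spec_from_file_location(module_name, str(Path(module_path)))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {module_name!r} from {module_path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so imports inside the module can find it.
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
```

After a call such as load_module_sketch("my_module", "path/to/__init__.py"), a later "from my_module import my_function" resolves through sys.modules.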

vismatch.utils.add_to_path(path, **_kwargs)

Add path to the front of sys.path, allowing imports from it.

Always inserts at position 0 so the most recently added directory wins. Auto-detects every package and module in path and, if any of them are already cached in sys.modules from a different vismatch third-party directory, flushes the stale entries so the next import resolves correctly. User code, stdlib, and pip packages are never touched.

Parameters:

path (str | Path)

Return type:

None

vismatch.utils.get_default_device()

Get the best available device for torch: cuda, then mps (Mac), else cpu.

Returns:

best available device as str

Return type:

str
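
The priority order can be sketched with the public torch availability checks. The helper name pick_device is hypothetical, and the fallback to "cpu" when torch is not installed is added only so the example runs anywhere; it is not taken from the library.

```python
def pick_device() -> str:
    """Return "cuda" if available, else "mps" if available, else "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # assumption for this sketch only
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is missing on older torch builds, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
```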

vismatch.utils.flow_to_matches(flow, covisibility, num_samples=1000, min_confidence=0.0, method='probabilistic', rng=None)

Convert a dense optical flow + covisibility map to sparse keypoint matches.

Parameters:
  • flow (np.ndarray) – shape (2, H, W) or (H, W, 2). Interpreted as (dx, dy) per pixel.

  • covisibility (np.ndarray) – shape (H, W) with confidence in [0, 1] (or any non-negative scores).

  • num_samples (int, optional) – max number of matches to return. Defaults to 1000.

  • min_confidence (float, optional) – ignore pixels with covisibility <= min_confidence. Defaults to 0.0.

  • method (str, optional) – sampling method, one of “probabilistic”, “topk”, or “grid”. Defaults to “probabilistic”.

  • rng (np.random.RandomState | np.random.Generator, optional) – for reproducibility. Defaults to None.

Returns:

(matches0, matches1, confidences) where:
  • matches0 (np.ndarray): (N, 2) source keypoints as (x, y) (float32)

  • matches1 (np.ndarray): (N, 2) target keypoints as (x, y) = source + flow (float32)

  • confidences (np.ndarray): (N,) covisibility/confidence values (float32)

Return type:

tuple
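
To make the conversion concrete, the sketch below implements only a "topk"-style selection: keep the pixels with the highest covisibility and add the flow to their coordinates. It is an illustration under assumptions, not the library's code. The function name is hypothetical, and the "probabilistic" and "grid" methods and the rng argument are left out.

```python
import numpy as np


def flow_to_matches_topk(flow, covisibility, num_samples=1000, min_confidence=0.0):
    """Pick the most confident pixels and pair each with its flow target."""
    cov = np.asarray(covisibility, dtype=np.float32)
    flow = np.asarray(flow, dtype=np.float32)
    if flow.shape[:2] != cov.shape:
        # Assume (2, H, W) input and convert to (H, W, 2).
        flow = np.transpose(flow, (1, 2, 0))

    ys, xs = np.nonzero(cov > min_confidence)  # drop pixels at or below threshold
    conf = cov[ys, xs]
    order = np.argsort(-conf, kind="stable")[:num_samples]  # highest confidence first
    ys, xs, conf = ys[order], xs[order], conf[order]

    matches0 = np.stack([xs, ys], axis=1).astype(np.float32)  # (N, 2) as (x, y)
    matches1 = matches0 + flow[ys, xs]  # flow holds (dx, dy) per pixel
    return matches0, matches1, conf


flow = np.zeros((2, 3, 4), dtype=np.float32)
flow[0], flow[1] = 1.0, -2.0  # every pixel moves by (dx, dy) = (1, -2)
cov = np.zeros((3, 4), dtype=np.float32)
cov[1, 2], cov[0, 3] = 0.9, 0.5
m0, m1, c = flow_to_matches_topk(flow, cov, num_samples=1)
# m0 is [[2, 1]], m1 is [[3, -1]], c is [0.9]
```

Note that array indexing is (row, column) = (y, x), while the returned keypoints are in (x, y) order, so the two axes are swapped when the matches are built.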

vismatch.utils.to_tensor_image(img)