ggml_ot.from_numpy


ggml_ot.from_numpy(supports, distribution_labels, n_triplets=3, weights=None, covariances=None, identical_supports=False, **kwargs)

Dataset to train GGML based on array data.

The returned dataset stores a collection of distributions (“supports”) and produces triplets (i, j, k) that encode relative relationships: i and j belong to the same class, and k belongs to a different class. GGML is trained on these triplets so that distributions i and j are closer to each other than j and k by a margin alpha.
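The margin condition described above can be illustrated with a hinge-style penalty. This is only a sketch under stated assumptions: the helper name triplet_hinge, the use of precomputed distances d_ij and d_jk, and the hinge form itself are hypothetical and not taken from ggml_ot, whose actual loss may differ.

```python
def triplet_hinge(d_ij: float, d_jk: float, alpha: float) -> float:
    """Hypothetical helper: zero once d_ij + alpha <= d_jk, positive otherwise."""
    return max(0.0, d_ij - d_jk + alpha)


# Same-class pair is closer than the cross-class pair by at least alpha.
satisfied = triplet_hinge(d_ij=1.0, d_jk=3.0, alpha=1.0)

# Same-class pair is closer, but by less than the required margin.
violated = triplet_hinge(d_ij=2.0, d_jk=2.5, alpha=1.0)
```

In this reading, a triplet contributes nothing once the margin is met and is penalized in proportion to how far it falls short.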

The dataset follows the standardized interface used by ggml_ot.train(), ggml_ot.tune(), ggml_ot.test() and ggml_ot.train_test().

Parameters:
supports Sequence[np.ndarray]

Sequence of per-distribution supports. Each element is an array of points (for empirical distributions) or component means (for GMM-style representations).

distribution_labels Sequence[int] | np.ndarray

Integer labels identifying the class/group of each distribution.

n_triplets int, optional

Number of triplets to generate per “anchor” distribution (default: 3).

weights Sequence[np.ndarray] | None, optional

Per-distribution probability weights (e.g., cluster proportions) or None for uniform weights (default: None).

covariances Sequence[np.ndarray] | None, optional

Optional per-distribution covariance arrays when supports represent Gaussian mixture components (default: None).

identical_supports bool, optional

If True, indicates that all distributions share the same supports (e.g., identical component locations across distributions). This changes the __getitem__ return format and allows faster OT evaluation (default: False).
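As a rough illustration of the parameters above, the following sketch builds toy inputs with NumPy and passes them to ggml_ot.from_numpy using only the documented arguments. The helper names make_toy_inputs and build_dataset are hypothetical. The array shapes are assumptions as well: each support is taken to be an (n_points, dim) array and each weight vector a 1-D array of length n_points that sums to 1.

```python
import numpy as np


def make_toy_inputs(n_per_class=4, n_points=50, dim=2, seed=0):
    """Hypothetical helper: two classes of empirical distributions."""
    rng = np.random.default_rng(seed)
    supports, labels, weights = [], [], []
    for label, center in enumerate((0.0, 3.0)):
        for _ in range(n_per_class):
            # One distribution = one point cloud (assumed shape: n_points x dim).
            supports.append(rng.normal(loc=center, scale=1.0, size=(n_points, dim)))
            labels.append(label)
            # Uniform weights, written out explicitly instead of passing None.
            weights.append(np.full(n_points, 1.0 / n_points))
    return supports, np.asarray(labels), weights


def build_dataset(supports, labels, weights):
    """Hypothetical wrapper around the documented call."""
    import ggml_ot  # imported lazily so the input construction runs without the package

    return ggml_ot.from_numpy(supports, labels, n_triplets=3, weights=weights)


supports, labels, weights = make_toy_inputs()
```

Calling build_dataset(supports, labels, weights) would then return the TripletDataset, provided ggml_ot is installed and accepts inputs in this layout.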

Return type:

TripletDataset

Notes

  • The class generates triplets by sampling t “positive” neighbors from the same class and t “negative” neighbors from each different class for every distribution.
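The sampling scheme in the note above can be sketched in plain NumPy. This is not the library's implementation: the function name sample_triplets is hypothetical, t is a parameter of the sketch (the source does not state how it relates to n_triplets), and pairing the sampled positives one-to-one with the sampled negatives is an assumption.

```python
import numpy as np


def sample_triplets(labels, t=3, seed=0):
    """Hypothetical sketch: (i, j, k) with j from i's class and k from another class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    indices = np.arange(len(labels))
    triplets = []
    for i, label_i in enumerate(labels):
        # Candidates for j: same class as i, excluding i itself.
        same = np.flatnonzero((labels == label_i) & (indices != i))
        if same.size == 0:
            continue  # no positive neighbor available for this distribution
        # Sample with replacement only when fewer than t candidates exist.
        positives = rng.choice(same, size=t, replace=same.size < t)
        for other in np.unique(labels):
            if other == label_i:
                continue
            diff = np.flatnonzero(labels == other)
            negatives = rng.choice(diff, size=t, replace=diff.size < t)
            # Assumption: positives and negatives are paired one-to-one.
            triplets.extend((i, int(j), int(k)) for j, k in zip(positives, negatives))
    return triplets


demo_labels = np.array([0, 0, 0, 1, 1, 1])
demo_triplets = sample_triplets(demo_labels, t=2)
```

With two classes, each distribution contributes t triplets, so the count grows linearly with the number of distributions and with the number of other classes.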