ggml_ot.from_numpy
- ggml_ot.from_numpy(supports, distribution_labels, n_triplets=3, weights=None, covariances=None, identical_supports=False, **kwargs)
Dataset to train GGML based on array data.
This class stores a collection of distributions (“supports”) and produces triplets (i, j, k) of relative relationships, where i and j belong to the same class and k belongs to a different class. These triplets are used to train GGML so that distributions i and j are closer to each other than j and k by some margin alpha.
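The margin condition on a triplet can be illustrated with a hinge-style penalty. The sketch below is not taken from the library: the function name `triplet_hinge`, the hinge form, and the default `alpha` are assumptions made for illustration, and the distances `d_ij` and `d_jk` stand in for whatever distance GGML computes between the two pairs of distributions.

```python
def triplet_hinge(d_ij: float, d_jk: float, alpha: float = 1.0) -> float:
    """Hypothetical hinge penalty for one triplet (i, j, k).

    Returns zero once the same-class distance d_ij is smaller than the
    cross-class distance d_jk by at least the margin alpha, and grows
    linearly with the violation otherwise.
    """
    return max(0.0, d_ij - d_jk + alpha)


# Margin satisfied: 1.0 + 1.0 <= 3.0, so there is no penalty.
satisfied = triplet_hinge(d_ij=1.0, d_jk=3.0, alpha=1.0)

# Margin violated: 2.0 + 1.0 > 2.5, so the penalty equals the shortfall.
violated = triplet_hinge(d_ij=2.0, d_jk=2.5, alpha=1.0)
```

A penalty of this shape only acts on triplets that break the margin, which is why the choice of triplets matters for training.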
This class exposes the dataset to the standardized interfaces used by ggml_ot.train(), ggml_ot.tune(), ggml_ot.test() and ggml_ot.train_test().
- Parameters:
- supports Sequence[np.ndarray]
Sequence of per-distribution supports. Each element is an array of points (for empirical distributions) or component means (for GMM-style representations).
- distribution_labels Sequence[int] | np.ndarray
Integer labels identifying the class/group of each distribution.
- n_triplets int, optional
Number of triplets to generate per “anchor” distribution (default: 3).
- weights Sequence[np.ndarray] | None, optional
Per-distribution probability weights (e.g., cluster proportions) or None for uniform weights (default: None).
- covariances Sequence[np.ndarray] | None, optional
Optional per-distribution covariance arrays when supports represent Gaussian mixture components (default: None).
- identical_supports bool, optional
If True, indicates that all distributions share the same supports (e.g., identical component locations across distributions). This changes the __getitem__ return format and allows faster OT evaluation (default: False).
- Return type:
Notes
The class generates triplets by sampling t “positive” neighbors from the same class and t “negative” neighbors from each different class for every distribution.
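The sampling scheme described above can be sketched with NumPy alone. This is an illustrative reading of the note, not the library's code: the helper `sample_triplets`, its `seed` argument, the exclusion of the anchor from its own positives, sampling with replacement when a class is too small, and the one-to-one pairing of positives with negatives are all assumptions. The toy `supports` and `distribution_labels` only show the input shapes documented in the parameter list.

```python
import numpy as np


def sample_triplets(labels, t=3, seed=0):
    """Hypothetical helper: build (i, j, k) index triplets from class labels.

    For every anchor i, draw t same-class partners j and, for each other
    class, t partners k; positives and negatives are paired one-to-one.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    triplets = []
    for i, label in enumerate(labels):
        same = np.flatnonzero(labels == label)
        same = same[same != i]  # assumption: the anchor is not its own positive
        if same.size == 0:
            continue  # no same-class partner available for this anchor
        # Sample with replacement only when the pool is smaller than t.
        positives = rng.choice(same, size=t, replace=same.size < t)
        for other in np.unique(labels):
            if other == label:
                continue
            pool = np.flatnonzero(labels == other)
            negatives = rng.choice(pool, size=t, replace=pool.size < t)
            triplets.extend(
                (i, int(j), int(k)) for j, k in zip(positives, negatives)
            )
    return triplets


# Toy inputs in the documented format: one (n_points, n_features) array per
# distribution, plus one integer class label per distribution.
data_rng = np.random.default_rng(0)
supports = [data_rng.normal(size=(n, 2)) for n in (5, 7, 6, 4)]
distribution_labels = np.array([0, 0, 1, 1])

triplets = sample_triplets(distribution_labels, t=3)
```

With two classes, each anchor contributes t triplets in this sketch; with more classes the count grows by t per additional class. Arrays shaped like `supports` and `distribution_labels` above are what the `supports` and `distribution_labels` parameters expect.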