ggml_ot.data.from_synth


ggml_ot.data.from_synth(distribution_size=100, class_means=[5, 10, 15], offsets=[1.5, 4.5, 7.5, 10.5, 13.5, 16.5, 19.5, 22.5, 25.5, 28.5], shared_means_x=[0, 40], shared_means_y=[0, 50], varying_size=False, noise_scale=10, noise_dims=1, show=None, save=None, return_generating_mode=False, t=4)

Generates synthetic distributions together with their class labels, distribution IDs, and weights.

Parameters:
distribution_size int

Number of points per generating mode in each distribution.

class_means list

Mean values for each class-specific Gaussian.

offsets list

Offset values used to create multiple distributions per class.

shared_means_x list

X-coordinates of shared noise modes.

shared_means_y list

Y-coordinates of shared noise modes.

varying_size bool

If True, randomize distribution sizes.

noise_scale float

Scale factor for noise dimensions.

noise_dims int

Number of noise dimensions.

show bool or None

Whether to display the plot. If None (default), the plot is shown automatically in interactive environments (notebooks, IPython) and suppressed in scripts. Passing True or False overrides this behavior explicitly.

save str, bool, or None

Whether to save the figure to disk. None or False skips saving; True saves the figure under the default name in settings.figdir; a str is used as the filename.

return_generating_mode bool

If True, also return a fifth element containing per-point generating-mode indices: mode 0 is the class-specific Gaussian, and modes 1 and above are the shared modes.
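The show and save rules above can be sketched as two small resolver functions. This is a minimal illustration under stated assumptions, not ggml_ot's implementation. The names resolve_show and resolve_save_path, the default filename "synth.png" and the "figures" directory (standing in for settings.figdir) are all hypothetical. Treating a session as "interactive" when IPython's get_ipython() returns a shell or sys.ps1 exists is also an assumption.

```python
import os
import sys


def resolve_show(show=None):
    """Decide whether to display a plot (hypothetical helper).

    An explicit True/False always wins; None falls back to detecting an
    interactive session (assumption: an IPython shell or a Python prompt).
    """
    if show is not None:
        return bool(show)
    try:
        from IPython import get_ipython  # optional dependency
        if get_ipython() is not None:
            return True  # running inside IPython or a Jupyter kernel
    except ImportError:
        pass
    # sys.ps1 is only defined when the interpreter shows an interactive prompt.
    return hasattr(sys, "ps1")


def resolve_save_path(save=None, default_name="synth.png", figdir="figures"):
    """Map the `save` argument to a file path, or None to skip saving.

    `default_name` and `figdir` are hypothetical stand-ins for the library's
    default filename and settings.figdir.
    """
    if save is None or save is False:
        return None
    if save is True:
        return os.path.join(figdir, default_name)
    if isinstance(save, str):
        return save  # assumption: a str is used verbatim as the filename
    raise TypeError(f"save must be str, bool, or None, not {type(save).__name__}")
```

Explicit booleans are checked before any environment detection, so an override always takes precedence over the automatic behavior.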

Return type:

TripletDataset

Returns:

distributions list[np.ndarray]

List of point arrays, each of shape (n_points, 1 + noise_dims).

distributions_labels list[int]

Class label for each distribution.

distributions_nr list[int]

Globally unique ID of each distribution.

weights None

Placeholder for distribution weights (always None).

distributions_generating_mode list[np.ndarray], optional

Per-point generating mode indices (only if return_generating_mode=True).
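To make the returned layout concrete, the sketch below builds toy outputs with the same structure as the elements listed under Returns. It then separates each distribution's class-specific points (mode 0) from its shared-mode points. The function toy_return_layout is hypothetical and fills in random placeholder coordinates. It assumes one distribution per (class mean, offset) pair and distribution_size points per generating mode, which the parameter descriptions suggest but do not state outright. It also treats the result as a plain tuple in the listed order.

```python
import numpy as np


def toy_return_layout(distribution_size=5, class_means=(5, 10, 15),
                      offsets=(1.5, 4.5), n_shared_modes=2, noise_dims=1,
                      return_generating_mode=False, seed=0):
    """Hypothetical stand-in that mimics only the documented return layout.

    Assumptions (not taken from ggml_ot): one distribution per
    (class mean, offset) pair, and `distribution_size` points for each
    generating mode (mode 0 plus `n_shared_modes` shared modes).
    Coordinates are random placeholders.
    """
    rng = np.random.default_rng(seed)
    n_modes = 1 + n_shared_modes
    distributions, labels, ids, modes = [], [], [], []
    for label, _mean in enumerate(class_means):
        for _offset in offsets:
            n_points = n_modes * distribution_size
            distributions.append(rng.normal(size=(n_points, 1 + noise_dims)))
            labels.append(label)
            ids.append(len(ids))  # running counter, so IDs are globally unique
            # Per-point mode index: 0 for the class-specific Gaussian, 1+ for shared modes.
            modes.append(np.repeat(np.arange(n_modes), distribution_size))
    weights = None  # placeholder, as documented
    out = (distributions, labels, ids, weights)
    return out + (modes,) if return_generating_mode else out


dists, labels, ids, weights, modes = toy_return_layout(return_generating_mode=True)
# Boolean row masks split each distribution by generating mode.
class_specific = [d[m == 0] for d, m in zip(dists, modes)]
shared = [d[m >= 1] for d, m in zip(dists, modes)]
print(len(dists), dists[0].shape, class_specific[0].shape, shared[0].shape)
# → 6 (15, 2) (5, 2) (10, 2)
```

Because the mode indices are aligned row by row with each point array, a single boolean mask is enough to isolate the class-specific signal from the shared noise modes.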