ggml_ot.data.from_synth_gmm
ggml_ot.data.from_synth_gmm(*, representation='cells', adata=False, gmm_key=None, t=4, n_dim=10, n_patients=6, n_samples=250, signal_mass_ratio=0.2, n_modes=10, signal_means_offset=12.0, signal_means_jitter=0.75, noise_means_offset=3.0, noise_means_jitter=0.75, noise_subspace_rank=2, signal_weight_concentration=None, noise_weight_concentration=None, signal_mean_shift=1.0, signal_cov_scale=1.2, signal_anisotropy=12.0, cov_rotation_jitter=10.0, cov_scale_jitter=0.15, global_rotation=30.0, random_seed=42)
Create a GGML dataset from the synthetic GMM generator.
Wraps synth_gmm() and returns a dataset that can be passed directly to the training and evaluation functions.
- Parameters:
- representation
Literal['cells', 'gmm'] (default: 'cells') How patient distributions are represented in the dataset.
"cells" (default) samples n_samples cells per patient and stores them as empirical point clouds. "gmm" stores the analytical per-patient GMM component parameters (means, covariances, weights) directly.
- adata
bool (default: False) If True, wrap the dataset in an AnnData_TripletDataset backed by an AnnData object. Must be True for gmm_key to be used.
- gmm_key
Optional[str] (default: None) When adata=True, store the analytical raw-space ground-truth GMM under dataset.adata.uns[gmm_key]. Requires adata=True.
- t
int (default: 4) Number of triplets sampled per anchor distribution.
- Remaining keyword arguments
All other keyword-only arguments in the signature (n_dim, n_patients, n_samples, etc.) are forwarded to synth_gmm(). See its documentation for details.
- Return type:
TripletDataset | AnnData_TripletDataset
- Returns:
A dataset ready for use with ggml_ot.train() or ggml_ot.train_gmm(). An AnnData_TripletDataset is returned when adata=True, otherwise a TripletDataset.
- Raises:
ValueError – If representation is not "cells" or "gmm", or if gmm_key is set without adata=True.
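The two documented ValueError conditions can be summarized as a small check. The sketch below uses a hypothetical helper, check_from_synth_gmm_args, that only mirrors the rules stated above; it is not the library's implementation and does not call ggml_ot.

```python
from typing import Optional

# Hypothetical helper: mirrors the documented argument rules of
# from_synth_gmm(). It is NOT the actual ggml_ot implementation.
VALID_REPRESENTATIONS = ("cells", "gmm")


def check_from_synth_gmm_args(
    representation: str = "cells",
    adata: bool = False,
    gmm_key: Optional[str] = None,
) -> None:
    """Raise ValueError for the two documented invalid argument combinations."""
    if representation not in VALID_REPRESENTATIONS:
        raise ValueError(
            f"representation must be one of {VALID_REPRESENTATIONS}, "
            f"got {representation!r}"
        )
    if gmm_key is not None and not adata:
        raise ValueError("gmm_key requires adata=True")


# Valid combinations pass silently.
check_from_synth_gmm_args()
check_from_synth_gmm_args(representation="gmm", adata=True, gmm_key="gt_gmm")

# Invalid combinations raise, as listed under Raises.
for bad in ({"representation": "points"}, {"gmm_key": "gt_gmm"}):
    try:
        check_from_synth_gmm_args(**bad)
    except ValueError as err:
        print("ValueError:", err)
```

The key "gt_gmm" and the value "points" are arbitrary illustrations.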
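The two values of representation are related: a "cells" point cloud can be viewed as n_samples draws from the per-patient mixture that "gmm" stores analytically. The NumPy sketch below illustrates that relationship, assuming the mixture is given as weights of shape (K,), means of shape (K, d) and covariances of shape (K, d, d). The helper sample_cells_from_gmm and all parameter values are hypothetical; reusing n_modes as the component count is for illustration only and is not necessarily how synth_gmm() builds each patient's mixture.

```python
import numpy as np


def sample_cells_from_gmm(weights, means, covariances, n_samples, rng):
    """Draw n_samples points from a Gaussian mixture (hypothetical helper).

    weights: (K,) mixture weights summing to 1
    means: (K, d) component means
    covariances: (K, d, d) component covariance matrices
    Returns an (n_samples, d) empirical point cloud.
    """
    weights = np.asarray(weights, dtype=float)
    # Assign each sample to a component according to the mixture weights.
    components = rng.choice(len(weights), size=n_samples, p=weights)
    cells = np.empty((n_samples, means.shape[1]))
    for k in range(len(weights)):
        idx = np.flatnonzero(components == k)
        if idx.size:
            # Draw all samples assigned to component k at once.
            cells[idx] = rng.multivariate_normal(
                means[k], covariances[k], size=idx.size
            )
    return cells


rng = np.random.default_rng(42)
n_modes, n_dim, n_samples = 10, 10, 250  # reuse the documented defaults
weights = rng.dirichlet(np.ones(n_modes))
means = rng.normal(scale=3.0, size=(n_modes, n_dim))
covariances = np.stack([np.eye(n_dim) for _ in range(n_modes)])

cells = sample_cells_from_gmm(weights, means, covariances, n_samples, rng)
print(cells.shape)  # (250, 10)
```

Storing the analytical parameters avoids this sampling noise, which is presumably why both options are offered.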
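To make the meaning of t concrete, the sketch below samples t triplets per anchor distribution. It assumes each patient distribution carries a class label and that a triplet is (anchor, positive with the same label, negative with a different label), the usual triplet-loss convention; the exact sampling scheme in ggml_ot may differ. The function sample_triplets and the labels are hypothetical.

```python
import random


def sample_triplets(labels, t, seed=42):
    """Sample t (anchor, positive, negative) index triplets per anchor.

    Assumed convention: the positive shares the anchor's label,
    the negative does not. Anchors without both are skipped.
    """
    rng = random.Random(seed)
    triplets = []
    for anchor, label in enumerate(labels):
        positives = [i for i, other in enumerate(labels) if other == label and i != anchor]
        negatives = [i for i, other in enumerate(labels) if other != label]
        if not positives or not negatives:
            continue  # this anchor cannot form a valid triplet
        for _ in range(t):
            triplets.append((anchor, rng.choice(positives), rng.choice(negatives)))
    return triplets


# Hypothetical labels for 6 patients (the n_patients default), two classes.
labels = [0, 0, 0, 1, 1, 1]
triplets = sample_triplets(labels, t=4)
print(len(triplets))  # 24 = 6 anchors * 4 triplets each
```

Under this assumption, the dataset size grows linearly in t: with every patient usable as an anchor, there are n_patients * t triplets.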