Dataset classes: .data

The ggml_ot.data module provides classes and functions for creating, handling, and processing datasets for ggml-ot. The dataset classes can be built from AnnData objects or from synthetic data and are compatible with PyTorch. The module also provides methods to split datasets, train and test models, and download datasets.

TripletDataset

data.AnnData_TripletDataset

Dataset to train GGML based on AnnData.

data.TripletDataset

Dataset to train GGML based on array data.
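
To illustrate what a triplet dataset contains, the sketch below enumerates (anchor, positive, negative) index triplets from a list of class labels, where the anchor and positive share a label and the negative does not. This is a minimal standard-library illustration under that assumption about triplet structure, not the library's implementation; `make_triplets` is a hypothetical helper and is not part of ggml_ot.

```python
from itertools import permutations


def make_triplets(labels):
    """Enumerate (anchor, positive, negative) index triplets.

    Hypothetical helper for illustration only, not a ggml_ot function.
    Anchor and positive share a label; the negative has a different one.
    """
    triplets = []
    for anchor, positive in permutations(range(len(labels)), 2):
        if labels[anchor] != labels[positive]:
            continue  # anchor and positive must belong to the same class
        for negative, label in enumerate(labels):
            if label != labels[anchor]:
                triplets.append((anchor, positive, negative))
    return triplets


# Three distributions: two from class "A", one from class "B".
labels = ["A", "A", "B"]
triplets = make_triplets(labels)
print(triplets)  # [(0, 1, 2), (1, 0, 2)]
```

Each index would refer to one distribution (for example, one patient or one sample) in the underlying data; the number of triplets grows quickly with the number of distributions, so subsampling is common in practice.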

CELLxGENE interface

data.load_cellxgene

Loads and caches an AnnData object from CELLxGENE.

Generate synthetic data

data.from_synth

Generates distributions, labels and weights from synthetic data.

data.from_synth_gmm

Create a GGML dataset from the synthetic GMM generator.
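
As a rough picture of what synthetic GMM data with distributions, labels, and weights can look like, the sketch below draws several 2D point clouds per class from a class-specific Gaussian mixture. It uses only the Python standard library and does not call or reproduce `data.from_synth` or `data.from_synth_gmm`; the function name, parameters, component means, and uniform point weights are all assumptions made for illustration.

```python
import random


def sample_gmm_distributions(class_means, n_per_class=3, n_points=50, std=0.5, seed=0):
    """Sample point clouds from one Gaussian mixture per class.

    Hypothetical helper for illustration only, not a ggml_ot function.
    class_means: one list of 2D component means per class.
    Returns (distributions, labels, weights).
    """
    rng = random.Random(seed)
    distributions, labels, weights = [], [], []
    for label, means in enumerate(class_means):
        for _ in range(n_per_class):
            points = []
            for _ in range(n_points):
                mx, my = rng.choice(means)  # pick a mixture component uniformly
                points.append((rng.gauss(mx, std), rng.gauss(my, std)))
            distributions.append(points)
            labels.append(label)
            # Assumption: every point carries the same mass within its distribution.
            weights.append([1.0 / n_points] * n_points)
    return distributions, labels, weights


# Two classes, each a mixture of two Gaussian components (values chosen arbitrarily).
class_means = [
    [(0.0, 0.0), (2.0, 0.0)],
    [(0.0, 3.0), (2.0, 3.0)],
]
distributions, labels, weights = sample_gmm_distributions(class_means)
print(len(distributions), labels)  # 6 [0, 0, 0, 1, 1, 1]
```

Fixing the seed keeps the sampled point clouds reproducible, which is convenient when comparing learned metrics across runs.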