dirac.dataprep

class sodirac.dataprep.GraphDS(*args: Any, **kwargs: Any)

Bases: Dataset

PyTorch Dataset for single-cell/spatial profiles with optional labels and domains.

Parameters:

counts (np.ndarray or sparse.csr_matrix) – Shape [cells, genes]. Expression/count matrix.
labels (np.ndarray or sparse.csr_matrix, optional) – Shape [cells,]. Integer cell-type labels.
domains (np.ndarray or sparse.csr_matrix, optional) – Shape [cells,]. Integer domain labels.
transform (Callable, optional) – Callable applied to each sample dict.
num_domains (int, optional) – Total number of domains for one-hot encoding of domains. Default: -1.

Return type:

None

Notes

Dense copies are created for input arrays when needed.
One-hot encodings are produced for labels/domains when provided.
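The per-cell sample contract described above can be illustrated with a small NumPy-only stand-in. This is a hypothetical sketch, not the sodirac implementation: the class name ToyCellDataset, the helper _dense, and the sample-dict keys "input" and "output" are assumptions, and NumPy arrays stand in for torch tensors so the example runs without PyTorch. It shows sparse inputs being densified once at construction, one dict per cell returned by __getitem__, and the optional transform applied to each sample dict.

```python
import numpy as np
from typing import Callable, Optional


def _dense(x):
    # Sparse matrices (e.g. scipy CSR) expose .toarray(); dense inputs are copied.
    return x.toarray() if hasattr(x, "toarray") else np.array(x)


class ToyCellDataset:
    """Hypothetical stand-in mimicking a map-style Dataset over cells."""

    def __init__(self, counts, labels=None, transform: Optional[Callable] = None):
        self.counts = _dense(counts)
        self.labels = None if labels is None else _dense(labels).ravel()
        self.transform = transform

    def __len__(self) -> int:
        # One sample per cell (row of the counts matrix).
        return self.counts.shape[0]

    def __getitem__(self, idx: int) -> dict:
        # Each sample is a dict; these key names are assumptions.
        sample = {"input": self.counts[idx]}
        if self.labels is not None:
            sample["output"] = int(self.labels[idx])
        if self.transform is not None:
            sample = self.transform(sample)
        return sample


# Usage: 3 cells x 4 genes, with a log1p transform applied to each sample dict.
counts = np.array([[0, 1, 2, 0], [3, 0, 0, 1], [0, 0, 5, 2]], dtype=np.float32)
ds = ToyCellDataset(counts, labels=[0, 1, 0],
                    transform=lambda s: {**s, "input": np.log1p(s["input"])})
print(len(ds), ds[1]["output"])  # 3 1
```

Densifying once in the constructor trades memory for fast per-item access; for very large matrices, slicing sparse rows inside __getitem__ would be the lower-memory alternative.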

__init__(counts: scipy.sparse.csr.csr_matrix | numpy.ndarray, labels: scipy.sparse.csr.csr_matrix | numpy.ndarray | None = None, domains: scipy.sparse.csr.csr_matrix | numpy.ndarray | None = None, transform: Callable | None = None, num_domains: int = -1) → None

_process_labels(labels: numpy.ndarray | scipy.sparse.csr_matrix | None) → tuple

Convert labels to torch tensors and one-hot encodings.

Parameters:

labels (np.ndarray or sparse.csr_matrix, optional) – Shape [cells,]. Integer labels.

Returns:

(labels_tensor, one_hot) – Dense label tensor and one-hot tensor, or (None, None) if labels is None.

Return type:

Tuple[torch.LongTensor or None, torch.FloatTensor or None]

Notes

The one-hot dimension equals the number of unique values in labels.
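The width rule in the note above can be sketched in NumPy. The helper one_hot_unique is hypothetical, NumPy arrays stand in for torch tensors, and the sketch assumes labels are already encoded as consecutive integers 0..K-1; whether the library remaps arbitrary label values is not stated here.

```python
import numpy as np


def one_hot_unique(labels) -> np.ndarray:
    """Hypothetical sketch: one-hot with width = number of unique labels."""
    labels = np.asarray(labels, dtype=np.int64).ravel()
    n_classes = len(np.unique(labels))
    # Assumes labels are already encoded as 0..n_classes-1.
    return np.eye(n_classes, dtype=np.float32)[labels]


print(one_hot_unique([0, 1, 1]).shape)     # (3, 2)
print(one_hot_unique([0, 1, 2, 1]).shape)  # (4, 3)
```

Under this rule, a subset that lacks some class yields a narrower encoding than the full dataset, as the two calls show.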

_process_domains(domains: numpy.ndarray | scipy.sparse.csr_matrix | None, num_domains: int) → tuple

Convert domain labels to torch tensors and one-hot encodings.

Parameters:

domains (np.ndarray or sparse.csr_matrix, optional) – Shape [cells,]. Integer domain labels.
num_domains (int) – Number of domain categories for one-hot encoding.

Returns:

(domains_tensor, one_hot) – Dense domain tensor and one-hot tensor, or (None, None) if domains is None.

Return type:

Tuple[torch.LongTensor or None, torch.FloatTensor or None]
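In contrast to labels, domain one-hot vectors take their width from num_domains. The sketch below (hypothetical helper one_hot_fixed, NumPy arrays in place of torch tensors) fixes the width to the caller's value, so encodings stay aligned even when some domain is absent from the input. How the library handles the default num_domains=-1 is not documented here, so the sketch simply rejects out-of-range ids, which includes any call with a non-positive num_domains.

```python
import numpy as np


def one_hot_fixed(domains, num_domains: int) -> np.ndarray:
    """Hypothetical sketch: one-hot with a caller-fixed width."""
    domains = np.asarray(domains, dtype=np.int64).ravel()
    if domains.min() < 0 or domains.max() >= num_domains:
        raise ValueError("domain ids must lie in [0, num_domains)")
    return np.eye(num_domains, dtype=np.float32)[domains]


# Only domains 0 and 1 appear, but the encoding still has 3 columns.
print(one_hot_fixed([0, 1, 1], num_domains=3).shape)  # (3, 3)
```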

sodirac.dataprep.balance_classes(y: numpy.ndarray, class_min: int = 256, random_state: int | None = None) → numpy.ndarray

Balance class indices by undersampling majorities and oversampling minorities.

Parameters:

y (np.ndarray) – Shape [N,]. Class labels.
class_min (int, default 256) – Minimum examples per class after balancing.
random_state (int, optional) – Random seed for reproducibility.

Returns:

balanced_idx – Balanced indices (with replacement for minority classes).

Return type:

np.ndarray

Notes

Each class is resampled to max(min_count, class_min) examples, where min_count is the size of the smallest class.
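The balancing rule can be sketched as follows. This is a hypothetical re-implementation, not the sodirac source: it assumes min_count is the size of the smallest class, draws indices with numpy.random.Generator.choice, and samples with replacement only for classes smaller than the target, matching the "with replacement for minority classes" description above.

```python
import numpy as np
from typing import Optional


def balance_classes_sketch(y, class_min: int = 256,
                           random_state: Optional[int] = None) -> np.ndarray:
    """Hypothetical re-implementation of the documented balancing rule."""
    y = np.asarray(y)
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    # Target size per class: the smallest class count, but at least class_min.
    target = max(int(counts.min()), class_min)
    picked = []
    for cls in classes:
        cls_idx = np.flatnonzero(y == cls)
        # Sample with replacement only when the class is smaller than the target.
        replace = len(cls_idx) < target
        picked.append(rng.choice(cls_idx, size=target, replace=replace))
    return np.concatenate(picked)


y = np.array([0] * 1000 + [1] * 50 + [2] * 300)
idx = balance_classes_sketch(y, class_min=256, random_state=0)
print(np.bincount(y[idx]))  # [256 256 256]
```

Every class ends up with exactly max(min_count, class_min) indices, so the returned array has that length times the number of classes.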

class sodirac.dataprep.GraphDataset(*args: Any, **kwargs: Any)

Bases: InMemoryDataset

In-memory PyG dataset for a paired graph with features, batches, domains, and labels.

Parameters:

data (np.ndarray) – Shape [num_nodes, num_features]. Node features.
batch (np.ndarray) – Shape [num_nodes]. Batch assignment per node.
domain (np.ndarray) – Shape [num_nodes]. Domain labels per node.
edge_index (torch.Tensor) – Shape [2, num_edges]. Edge index.
label (np.ndarray, optional) – Shape [num_nodes]. Node labels. Default: None.
transform (callable, optional) – A callable that takes and returns a torch_geometric.data.Data object.

graph_data

Graph data object with fields:

data_0 (FloatTensor)
batch_0 (LongTensor)
domain_0 (LongTensor)
edge_index (Tensor)
idx (LongTensor)
label (LongTensor or None)
num_nodes (int)

Type:

torch_geometric.data.Data

Notes

This dataset contains a single graph (length = 1).
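GraphDataset expects edge_index in the coordinate (COO) layout used by PyTorch Geometric: row 0 holds source node indices and row 1 holds target node indices. As one hypothetical way to produce it, the sketch below builds a directed k-nearest-neighbour graph from spatial coordinates with NumPy; knn_edge_index is not a sodirac function, and the library may construct its graphs differently.

```python
import numpy as np


def knn_edge_index(coords: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: directed k-nearest-neighbour graph in COO form."""
    coords = np.asarray(coords, dtype=np.float64)
    # Pairwise squared Euclidean distances, shape [num_nodes, num_nodes].
    diff = coords[:, None, :] - coords[None, :, :]
    dist = (diff ** 2).sum(-1)
    np.fill_diagonal(dist, np.inf)  # exclude self-loops
    # Indices of the k closest nodes for every node.
    nbrs = np.argsort(dist, axis=1)[:, :k]
    src = np.repeat(np.arange(coords.shape[0]), k)
    dst = nbrs.ravel()
    return np.stack([src, dst]).astype(np.int64)  # shape [2, num_nodes * k]


coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
edge_index = knn_edge_index(coords, k=1)
print(edge_index.tolist())  # [[0, 1, 2, 3], [1, 0, 0, 2]]
```

The result can be converted with torch.as_tensor(edge_index) before being passed as edge_index. The dense distance matrix costs O(num_nodes²) memory, so for large datasets a spatial index such as scikit-learn's NearestNeighbors is the usual replacement.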

__init__(data: numpy.ndarray, batch: numpy.ndarray, domain: numpy.ndarray, edge_index: torch.Tensor, label: numpy.ndarray | None = None, transform: Callable | None = None)

class sodirac.dataprep.GraphDataset_unpaired(*args: Any, **kwargs: Any)

Bases: InMemoryDataset

In-memory PyG dataset for an unpaired graph with features, domains, and labels.

Parameters:

data (np.ndarray) – Shape [num_nodes, num_features]. Node features.
domain (np.ndarray) – Shape [num_nodes]. Domain labels per node.
edge_index (torch.Tensor) – Shape [2, num_edges]. Edge index.
label (np.ndarray, optional) – Shape [num_nodes]. Node labels. Default: None.
transform (callable, optional) – A callable that takes and returns a torch_geometric.data.Data object.

graph_data

Graph data object with fields:

data (FloatTensor)
domain (LongTensor)
edge_index (Tensor)
idx (LongTensor)
label (LongTensor or None)
num_nodes (int)

Type:

torch_geometric.data.Data

Notes

This dataset contains a single graph (length = 1).
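GraphDataset_unpaired takes one node set with a per-node domain label. When each domain (for example, each tissue section) comes with its own graph, one way to assemble data, domain, and edge_index is a disjoint union: stack the feature matrices, record each node's source domain, and shift every graph's edge indices by the number of nodes stacked before it. merge_graphs is a hypothetical helper, not part of sodirac, and numbering domains in input order is an assumption.

```python
import numpy as np


def merge_graphs(features, edge_indices):
    """Hypothetical helper: disjoint union of per-domain graphs.

    features: list of [n_i, num_features] arrays, one per domain.
    edge_indices: list of [2, e_i] arrays indexing into each domain's nodes.
    """
    offsets = np.cumsum([0] + [f.shape[0] for f in features[:-1]])
    data = np.concatenate(features, axis=0)
    # Domain id i for every node that came from the i-th input graph.
    domain = np.concatenate(
        [np.full(f.shape[0], i, dtype=np.int64) for i, f in enumerate(features)]
    )
    # Shift each graph's node indices so they point into the stacked matrix.
    edge_index = np.concatenate(
        [e + off for e, off in zip(edge_indices, offsets)], axis=1
    ).astype(np.int64)
    return data, domain, edge_index


x_a = np.ones((3, 5))
e_a = np.array([[0, 1], [1, 2]])
x_b = np.zeros((2, 5))
e_b = np.array([[0], [1]])
data, domain, edge_index = merge_graphs([x_a, x_b], [e_a, e_b])
print(data.shape, domain.tolist(), edge_index.tolist())
# (5, 5) [0, 0, 0, 1, 1] [[0, 1, 3], [1, 2, 4]]
```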

__init__(data: numpy.ndarray, domain: numpy.ndarray, edge_index: torch.Tensor, label: numpy.ndarray | None = None, transform: Callable | None = None)