dirac.main
- class sodirac.main.integrate_app(save_path: str = './Results/', subgraph: bool = True, use_gpu: bool = True, **kwargs)
Bases: object
High-level API for multi-omics graph integration.
This class prepares data (optionally with subgraph sampling), builds an integration model, trains it in an unsupervised manner, and returns embeddings/reconstructions.
- __init__(save_path: str = './Results/', subgraph: bool = True, use_gpu: bool = True, **kwargs) -> None
Initialize the integration app.
- Parameters:
save_path (str, default './Results/') – Directory to write outputs (figures, checkpoints, etc.). Must be writable.
subgraph (bool, default True) – If True, use ClusterData/ClusterLoader for sampling. If False, use a full-batch DataLoader for small graphs.
use_gpu (bool, default True) – If True, selects cuda when available; otherwise CPU.
**kwargs (Any) – Ignored; forwarded to super.
Side Effects
Sets self.device, self.subgraph, and self.save_path.
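The device rule described for use_gpu can be sketched as follows. This is a minimal illustration rather than the library's own code: pick_device is a hypothetical helper, and the guarded torch import is an assumption made so the snippet also runs where PyTorch is not installed.

```python
def pick_device(use_gpu: bool = True) -> str:
    """Return "cuda" only when it is requested and available; otherwise "cpu"."""
    try:
        import torch  # optional in this sketch only
        has_cuda = torch.cuda.is_available()
    except ImportError:
        has_cuda = False
    return "cuda" if (use_gpu and has_cuda) else "cpu"


# With use_gpu=False the result is "cpu" regardless of the hardware.
print(pick_device(use_gpu=False))
```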
- _get_data(dataset_list: list, edge_index, domain_list=None, batch=None, num_parts: int = 10, num_workers: int = 1, batch_size: int = 1)
Process multi-omics node features and construct a graph dataset.
- Parameters:
dataset_list (list of (ndarray | torch.Tensor)) – List of feature matrices, one per modality/layer. Each element must be shaped (n_nodes, n_features_i) (rows = nodes, cols = features).
edge_index (torch.LongTensor) – Graph connectivity in COO format with shape (2, E). Will be made undirected via to_undirected.
domain_list (list[np.ndarray] | None, optional) – Optional per-modality integer domain labels of length n_nodes. If None, each dataset is treated as its own domain (0..n-1).
batch (None | pandas.Series | np.ndarray | list, optional) – Optional per-node batch labels of length n_nodes. Non-numeric labels are categorical-encoded. If None, a zero vector is used for each modality.
num_parts (int, default 10) – Number of partitions for ClusterData when self.subgraph=True.
num_workers (int, default 1) – Number of workers for the loaders.
batch_size (int, default 1) – Batch size for ClusterLoader when self.subgraph=True.
- Returns:
A dictionary with the following keys:
graph_ds (dict) – Underlying graph data object/dict from GraphDataset with additional modality tensors (e.g., data_1, domain_1, batch_1, …).
graph_dl (ClusterLoader | DataLoader) – A ClusterLoader if self.subgraph=True; otherwise a full-batch DataLoader with a single item.
n_samples (int) – Number of input datasets/modalities.
n_inputs_list (list[int]) – Feature dimensions for each dataset, [n_features_0, n_features_1, ...].
n_domains (int) – Number of unique domains inferred from domain_list.
- Return type:
dict
- Raises:
ValueError – If node counts differ across dataset_list; if the batch length does not match the data; or if an unsupported batch type is provided.
Notes
Sets self.n_samples, self.n_inputs_list, and self.num_domains. Prints the number of unique domains detected.
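The shape check and the summary fields described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: summarize_inputs is a hypothetical helper, and nested lists stand in for the ndarray or tensor inputs.

```python
from typing import Optional, Sequence


def summarize_inputs(
    dataset_list: Sequence[Sequence[Sequence[float]]],
    domain_list: Optional[Sequence[Sequence[int]]] = None,
) -> dict:
    """Validate node counts and derive n_samples, n_inputs_list, n_domains."""
    if not dataset_list or len(dataset_list[0]) == 0:
        raise ValueError("dataset_list must contain non-empty matrices")
    n_nodes = len(dataset_list[0])
    for i, mat in enumerate(dataset_list):
        if len(mat) != n_nodes:
            raise ValueError(f"dataset {i} has {len(mat)} nodes, expected {n_nodes}")
    if domain_list is None:
        # Documented default: dataset i is treated as domain i (0..n-1).
        domain_list = [[i] * n_nodes for i in range(len(dataset_list))]
    unique_domains = {d for labels in domain_list for d in labels}
    return {
        "n_samples": len(dataset_list),
        "n_inputs_list": [len(mat[0]) for mat in dataset_list],
        "n_domains": len(unique_domains),
    }


modality_a = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # 2 nodes x 3 features
modality_b = [[1.0, 0.0], [0.0, 1.0]]            # 2 nodes x 2 features
summary = summarize_inputs([modality_a, modality_b])
print(summary)
```

Passing matrices with different row counts raises ValueError, which mirrors the first condition listed under Raises.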
- _get_model(samples, n_hiddens: int = 128, n_outputs: int = 64, opt_GNN='GAT', dropout_rate=0.1, use_skip_connections=True, use_attention=True, n_attention_heads=4, use_layer_scale=False, layer_scale_init=0.01, use_stochastic_depth=False, stochastic_depth_rate=0.1, combine_method='concat')
Build the integration model with the provided hyperparameters.
- Parameters:
samples (dict) – Output from _get_data. Must contain n_inputs_list and n_domains.
n_hiddens (int, default 128) – Hidden dimension for GNN layers.
n_outputs (int, default 64) – Output/embedding dimension per node.
opt_GNN (str, default 'GAT') – GNN backbone option consumed by integrate_model.
dropout_rate (float, default 0.1) – Dropout rate inside the model.
use_skip_connections (bool, default True) – Whether to enable residual/skip connections (if supported).
use_attention (bool, default True) – Whether to use attention (if supported by the chosen backbone).
n_attention_heads (int, default 4) – Number of attention heads (if applicable).
use_layer_scale (bool, default False) – If True, enable layer scale with initialization layer_scale_init.
layer_scale_init (float, default 1e-2) – Initialization value for layer scaling.
use_stochastic_depth (bool, default False) – Enable stochastic depth.
stochastic_depth_rate (float, default 0.1) – Drop probability for stochastic depth.
combine_method ({'concat','sum','attention'}, default 'concat') – How to combine multi-modal features inside the model.
- Returns:
models – The model instance returned by integrate_model(...), ready for training.
- Return type:
Any
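How the model applies combine_method internally is not stated here. The sketch below only shows the conventional meaning of 'concat' and 'sum' for one node's per-modality embeddings; combine is a hypothetical helper, and 'attention' is left out because its form is not described.

```python
from typing import List


def combine(embeddings: List[List[float]], method: str = "concat") -> List[float]:
    """Fuse per-modality embedding vectors for a single node."""
    if method == "concat":
        # Output width is the sum of the input widths.
        return [x for emb in embeddings for x in emb]
    if method == "sum":
        # Element-wise sum; every input must have the same width.
        if len({len(emb) for emb in embeddings}) != 1:
            raise ValueError("'sum' needs embeddings of equal length")
        return [sum(vals) for vals in zip(*embeddings)]
    raise ValueError(f"unsupported method in this sketch: {method!r}")


concat_out = combine([[1.0, 2.0], [3.0, 4.0]], method="concat")
sum_out = combine([[1.0, 2.0], [3.0, 4.0]], method="sum")
```

Under this reading, 'concat' widens the fused representation with each extra modality, while 'sum' keeps the width fixed but requires matching dimensions.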
- _train_dirac_integrate(samples, models, epochs: int = 500, optimizer_name: str = 'adam', lr: float = 0.001, tau: float = 0.9, wd: float = 0.05, scheduler: bool = True, lamb: float = 0.0005, scale_loss: float = 0.025)
Train the integration model and evaluate embeddings/reconstructions.
- Parameters:
samples (dict) – Output from _get_data with keys like graph_ds, graph_dl, n_inputs_list, n_domains.
models (Any) – Model returned by _get_model/integrate_model.
epochs (int, default 500) – Training epochs.
optimizer_name (str, default 'adam') – Optimizer identifier consumed by the trainer.
lr (float, default 1e-3) – Learning rate.
tau (float, default 0.9) – Momentum/EMA or contrastive temperature parameter (per trainer definition).
wd (float, default 5e-2) – Weight decay.
scheduler (bool, default True) – Whether to use a learning-rate scheduler.
lamb (float, default 5e-4) – Loss coefficient used by the trainer.
scale_loss (float, default 0.025) – Additional loss scaling used by the trainer.
- Returns:
data_z (torch.Tensor) – Node embeddings; typically shaped (n_nodes, n_outputs).
combine_recon (Any) – Reconstruction(s) as returned by train_integrate.evaluate; may be a tensor or a structure of tensors.
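The three integrate_app methods documented above are meant to be called in sequence. The sketch below chains them using only the keyword names from the signatures shown; run_integration is a hypothetical wrapper, and unpacking the training result as a two-element tuple is an assumption based on the two entries under Returns.

```python
def run_integration(dataset_list, edge_index, save_path="./Results/"):
    """Prepare data, build the model, train, and return (data_z, combine_recon)."""
    # Imported lazily so that defining this helper does not require sodirac.
    from sodirac.main import integrate_app

    app = integrate_app(save_path=save_path, subgraph=True, use_gpu=True)
    samples = app._get_data(
        dataset_list=dataset_list,
        edge_index=edge_index,
        num_parts=10,
        batch_size=1,
    )
    models = app._get_model(samples, n_hiddens=128, n_outputs=64, opt_GNN="GAT")
    # Assumed to return the pair listed under Returns.
    data_z, combine_recon = app._train_dirac_integrate(
        samples, models, epochs=500, lr=1e-3
    )
    return data_z, combine_recon
```

The values passed here are the documented defaults, written out so they are easy to change.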
- class sodirac.main.annotate_app(save_path: str = './Results/', subgraph: bool = True, use_gpu: bool = True, **kwargs)
Bases: integrate_app
High-level API for annotation / domain adaptation on graphs.
Prepares labeled source (and unlabeled target) graphs, builds an annotation model, supports semi-supervised training, optional novel-class discovery, and evaluation on source/target/test.
- _get_data(source_data, source_label, source_edge_index, target_data, target_edge_index, source_domain=None, target_domain=None, test_data=None, test_edge_index=None, weighted_classes=False, split_list=None, num_workers: int = 1, batch_size: int = 1, num_parts_source: int = 1, num_parts_target: int = 1)
Process labeled source and (optional) unlabeled target into loaders.
- Parameters:
source_data (ndarray | torch.Tensor) – Source node features with shape (n_source_nodes, n_features).
source_label (array-like) – Source labels; numeric or categorical. Non-numeric labels are encoded to 0-based integer codes. A mapping is stored in self.pairs.
source_edge_index (torch.LongTensor) – COO connectivity for the source graph, shape (2, E_source); made undirected.
target_data ((ndarray | torch.Tensor) or None) – Optional target node features with shape (n_target_nodes, n_features).
target_edge_index (torch.LongTensor or None) – Optional COO connectivity for the target graph, shape (2, E_target); made undirected if provided.
source_domain (array-like[int] or None, default None) – Optional per-node domain labels for source. Defaults to zeros.
target_domain (array-like[int] or None, default None) – Optional per-node domain labels for target. Defaults to ones when target_data is provided.
test_data ((ndarray | torch.Tensor) or None, default None) – Optional test node features with shape (n_test_nodes, n_features).
test_edge_index (torch.LongTensor or None, default None) – Required if test_data is provided.
weighted_classes (bool, default False) – If True, compute inverse-frequency class weights for source labels.
split_list (list[tuple[int,int]] or None, default None) – Optional feature splits for multi-modal inputs, e.g., [(0,1000),(1000,1500)].
num_workers (int, default 1) – DataLoader workers for source/target loaders.
batch_size (int, default 1) – Batch size for ClusterLoader.
num_parts_source (int, default 1) – ClusterData partitions for the source graph.
num_parts_target (int, default 1) – ClusterData partitions for the target graph.
- Returns:
A dictionary containing:
source_graph_ds (dict) – Graph data object/dict for source (from GraphDataset_unpaired).
source_graph_dl (ClusterLoader) – Loader over source clusters.
target_graph_ds (dict | None) – Graph data for target, or None if no target.
target_graph_dl (ClusterLoader | None) – Loader for target, or None if no target.
test_graph_ds (torch_geometric.data.Data | None) – Test graph object if both test_data and test_edge_index are provided.
class_weight (torch.FloatTensor | None) – Class weights when weighted_classes=True.
n_labels (int) – Number of unique labels in source.
n_inputs (int) – Feature dimension.
n_domains (int) – Number of domains inferred from source_domain/target_domain.
split_list (list[tuple[int,int]] | None) – Echo of the provided split_list.
- Return type:
dict
Notes
If source_label is categorical, self.pairs stores a mapping {code: original_label}; otherwise self.pairs is None. Sets self.n_labels, self.n_inputs, and self.n_domains. Prints the number of unique domains.
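The label encoding and the weighted_classes option can be illustrated with the standard library alone. Both helpers below are hypothetical: assigning codes in sorted order and the exact weight formula n / (k * count) are assumptions, since the text only says "0-based integer codes" and "inverse-frequency".

```python
from collections import Counter


def encode_labels(labels):
    """Map labels to 0-based codes and return (codes, pairs)."""
    categories = sorted(set(labels))  # sorted order is an assumption
    to_code = {label: code for code, label in enumerate(categories)}
    pairs = {code: label for label, code in to_code.items()}  # {code: original_label}
    return [to_code[label] for label in labels], pairs


def inverse_frequency_weights(codes):
    """Weight class c by n_samples / (n_classes * count_c); rarer classes weigh more."""
    counts = Counter(codes)
    n, k = len(codes), len(counts)
    return [n / (k * counts[c]) for c in sorted(counts)]


codes, pairs = encode_labels(["B", "A", "B", "B"])
class_weight = inverse_frequency_weights(codes)
print(codes, pairs, class_weight)
```

The pairs mapping has the same {code: original_label} orientation as self.pairs, so predicted codes can be translated back to the original labels.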
- _get_model(samples, n_hiddens: int = 128, n_outputs: int = 64, opt_GNN: str = 'SAGE', s: int = 32, m: float = 0.1, easy_margin: bool = False, dropout_rate: float = 0.1, use_skip_connections: bool = False, use_attention: bool = True, n_attention_heads: int = 2, use_layer_scale: bool = False, layer_scale_init: float = 0.01, use_stochastic_depth: bool = False, stochastic_depth_rate: float = 0.1, combine_method: str = 'concat')
Build the annotation model (classifier/domain-adaptation).
- Parameters:
samples (dict) – Output from annotate_app._get_data; must include n_domains, n_labels, and either n_inputs (int) or split_list for multi-modal cases.
n_hiddens (int, default 128) – Hidden dimension.
n_outputs (int, default 64) – Embedding dimension before the classification head.
opt_GNN (str, default 'SAGE') – GNN backbone identifier consumed by annotate_model.
s (int, default 32) – Scale parameter for margin-based head (if applicable).
m (float, default 0.10) – Margin parameter for margin-based head.
easy_margin (bool, default False) – Use easy margin variant if supported.
dropout_rate (float, default 0.1) – Dropout rate.
use_skip_connections (bool, default False) – Enable skip/residual connections (if supported).
use_attention (bool, default True) – Enable attention (if supported).
n_attention_heads (int, default 2) – Number of attention heads when applicable.
use_layer_scale (bool, default False) – Enable layer scaling.
layer_scale_init (float, default 1e-2) – Initial value for layer scale.
use_stochastic_depth (bool, default False) – Enable stochastic depth.
stochastic_depth_rate (float, default 0.1) – Drop probability for stochastic depth.
combine_method ({'concat','sum','attention'}, default 'concat') – Feature fusion strategy for multi-modal inputs.
- Returns:
models – Model instance returned by annotate_model(...).
- Return type:
Any
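The parameter names s, m, and easy_margin resemble those of an additive angular margin (ArcFace-style) classification head. Whether annotate_model uses exactly this formulation is not confirmed here, so the sketch below is an assumption showing how such a head would turn the cosine between an embedding and its true-class weight into a logit. margin_logit is a hypothetical name.

```python
import math


def margin_logit(cos_theta: float, s: float = 32.0, m: float = 0.1,
                 easy_margin: bool = False) -> float:
    """Target-class logit s * cos(theta + m), with the usual fallbacks."""
    sin_theta = math.sqrt(max(0.0, 1.0 - cos_theta ** 2))
    phi = cos_theta * math.cos(m) - sin_theta * math.sin(m)  # cos(theta + m)
    if easy_margin:
        # Apply the margin only when theta is below 90 degrees.
        phi = phi if cos_theta > 0 else cos_theta
    else:
        # Fall back to a linear penalty once theta + m would exceed pi.
        threshold = math.cos(math.pi - m)
        phi = phi if cos_theta > threshold else cos_theta - math.sin(math.pi - m) * m
    return s * phi


print(margin_logit(0.5))             # smaller than the margin-free 32 * 0.5
print(margin_logit(0.5, m=0.0))      # the margin-free logit
```

In this formulation, a larger m demands a wider angular gap between classes, and a larger s sharpens the resulting softmax.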
- _train_dirac_annotate(samples, models, n_epochs: int = 200, optimizer_name: str = 'adam', lr: float = 0.001, wd: float = 0.005, scheduler: bool = True, filter_low_confidence: bool = True, confidence_threshold: float = 0.5)
Train the annotation model (semi-supervised/domain adaptation) and evaluate.
- Parameters:
samples (dict) – Output from _get_data. Expected keys include source_graph_ds, source_graph_dl, optional target_graph_dl and test_graph_ds, and possibly class_weight.
models (Any) – Model returned by _get_model/annotate_model.
n_epochs (int, default 200) – Number of training epochs.
optimizer_name (str, default 'adam') – Optimizer identifier.
lr (float, default 1e-3) – Learning rate.
wd (float, default 5e-3) – Weight decay.
scheduler (bool, default True) – Whether to enable learning-rate scheduling.
filter_low_confidence (bool, default True) – If True, mark predictions with confidence < confidence_threshold as "unassigned" in the returned target_pred_filtered/test_pred_filtered.
confidence_threshold (float, default 0.5) – Confidence threshold in [0, 1].
- Returns:
A dictionary with keys (some may be None if target/test are absent): source_feat, target_feat, target_output, target_prob, target_pred, target_pred_filtered, target_confs, target_mean_uncert, test_feat, test_output, test_prob, test_pred, test_pred_filtered, test_confs, test_mean_uncert, pairs, pairs_filter, and low_confidence_threshold.
- Return type:
dict
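The effect of filter_low_confidence can be shown with a short sketch. filter_predictions is a hypothetical helper working on plain lists; it follows the rule stated above, where a prediction is replaced only if its confidence is strictly below the threshold.

```python
def filter_predictions(preds, confs, confidence_threshold=0.5):
    """Replace predictions whose confidence is below the threshold."""
    if len(preds) != len(confs):
        raise ValueError("preds and confs must have the same length")
    return [
        pred if conf >= confidence_threshold else "unassigned"
        for pred, conf in zip(preds, confs)
    ]


filtered = filter_predictions(["A", "B", "A"], [0.9, 0.3, 0.5])
print(filtered)  # ['A', 'unassigned', 'A']
```

A confidence equal to the threshold is kept, because the documented rule uses a strict "less than" comparison.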
- _train_dirac_novel(samples, minemodel, num_novel_class: int = 3, pre_epochs: int = 100, n_epochs: int = 200, num_parts: int = 30, resolution: float = 1, s: int = 64, m: float = 0.1, weights: dict = {'alpha1': 1, 'alpha2': 1, 'alpha3': 1, 'alpha4': 1, 'alpha5': 1, 'alpha6': 1, 'alpha7': 1, 'alpha8': 1})
Discover novel target classes and retrain with expanded label space.
- Parameters:
samples (dict) – Output from _get_data; must include keys source_graph_ds, source_graph_dl, target_graph_ds, target_graph_dl, class_weight (optional), n_labels, and feature sizes n_inputs.
minemodel (Any) – Initial annotation model (from _get_model).
num_novel_class (int, default 3) – Number of novel classes to discover in target.
pre_epochs (int, default 100) – Supervised pretraining epochs on source.
n_epochs (int, default 200) – Training epochs for the novel-phase.
num_parts (int, default 30) – Number of partitions for the (new) target ClusterData.
resolution (float, default 1) – Louvain resolution for clustering.
s (int, default 64) – Scale parameter for the (re)built model head.
m (float, default 0.1) – Margin parameter for the (re)built model head.
weights (dict, default {"alpha1":1, ..., "alpha8":1}) – Loss weights dictionary consumed by _train_novel.
- Returns:
A dictionary with keys: source_feat, target_feat, target_output, target_prob, target_pred, target_confs, target_mean_uncert, test_feat, test_pred. (test_* may be None if a test set is not provided.)
- Return type:
dict
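The weights argument defaults to a dict literal in the signature. In Python a mutable default is shared across calls, so passing a freshly built dict is a cautious habit. The sketch below builds one with the documented keys alpha1 to alpha8; make_loss_weights is a hypothetical helper, and what each alpha term controls is not described in this reference.

```python
def make_loss_weights(**overrides):
    """Return a fresh alpha1..alpha8 dict, all set to 1 unless overridden."""
    weights = {f"alpha{i}": 1 for i in range(1, 9)}
    unknown = set(overrides) - set(weights)
    if unknown:
        raise KeyError(f"unknown weight keys: {sorted(unknown)}")
    weights.update(overrides)
    return weights


weights = make_loss_weights(alpha3=0.5)
print(weights)
```

The resulting dict can then be passed as weights=weights to _train_dirac_novel, leaving the signature's default object untouched.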