QuickStart¶
DIRAC provides horizontal (label transfer across datasets/platforms) and vertical (multi-omics integration) analysis.
This QuickStart focuses on two simulated datasets and walks you through the end-to-end workflow:
NSF — spatial multi-omics (RNA, ADT, optional ATAC) with joint embedding, clustering, ARI evaluation, and optional subgraph training.
scMultiSim — horizontal annotation (RNA→RNA, RNA+ATAC→RNA+ATAC), confidence-based novel type discovery, and UMAP mixing checks.
What you’ll learn¶
Preprocess single-cell/spatial data (
normalize_total → log1p → scale; optional PCA/LSI).Build spatial graphs (k-NN; optional radius; multi-batch vs single-sample).
Train DIRAC (
annotate_app/integrate_app) and write back embeddings/predictions.Evaluate results (Accuracy/Precision/Recall/F1, ARI) and visualize (spatial, UMAP).
Use confidence filtering to mark low-confidence predictions as
"unassigned"and catch missing/novel cell types.
Prerequisites¶
Python ≥ 3.9, and:
scanpy,anndata,numpy,pandas,matplotlib(andtorchinside DIRAC).Local clone of the DIRAC codebase (added to
sys.path).(Optional) R +
mclustfor MCLUST clustering.
Data¶
NSF (simulated spatial multi-omics): RNA/ADT (+ ATAC)
.h5adfiles.scMultiSim (simulated multi-omics, e.g. mask = 0.3 means ~30% random zeros in RNA/ATAC):
source_*andtarget_*.h5adfiles.
Note
Place datasets under DIRAC-main/data/ in folders referenced by the notebooks (see each notebook’s first cell for exact paths).
At a glance¶
Load reference/target AnnData.
Preprocess per modality.
Build spatial graphs.
Pack data with
_get_data(...)(optionallynum_parts_*for subgraphs).Build model with
_get_model(...)(e.g.,opt_GNN="SAGE"or"GAT").Train (
_train_dirac_integrateor_train_dirac_annotate).Evaluate + visualize; save
.h5ad/figures/metrics.
The following notebooks are included:
Notebook details¶
- notebooks/run-NSF.ipynb — Spatial multi-omics (NSF)
Load RNA and ADT (optionally ATAC).
Preprocess (HVGs/PCA for RNA; standardization for ADT; LSI optional for ATAC).
Build spatial k-NN graph (tune
n_neighbors; radius graph optional).Two-omics integration (RNA+ADT), then extend to three omics (add ATAC).
Optional subgraph training (
subgraph=True; controlnum_parts).Cluster with MCLUST or Leiden; compute ARI vs ground truth.
Plot spatial maps and UMAP; save embeddings and outputs.
- notebooks/run-scMultiSim.ipynb — Horizontal annotation (scMultiSim)
Single-modality (RNA→RNA) and dual-modality (RNA+ATAC→RNA+ATAC).
Preprocess each modality; concatenate features and specify
split_list.Build multi-batch graph for source (if needed) + single-sample graph for target.
Run
annotate_appto transfer labels; write back embeddings/predictions.Confidence filtering (e.g.,
confidence_threshold=0.9) to mark low-confidence cells as"unassigned"(novel/missing types).Optional mixing check: UMAP colored by Omics and cluster labels.
Report metrics (Accuracy/Precision/Recall/F1) and Unassigned Rate; save results (NPZ/JSON, figures,
.h5ad).
Tips & troubleshooting¶
Ensure required fields exist:
obs["cell.type"](labels),obsm["spatial"](coords), andobs["batch"]for multi-batch graphs.Verify
split_listaligns with feature concatenation ((0, dim_RNA),(dim_RNA, dim_RNA+dim_ATAC)).Start with
n_neighbors=8–12; adjust for platform resolution/spot density.Use subgraphs for large tissues (
num_parts_* = max(1, n//200)).Tune model defaults:
n_hiddens=128,n_outputs=64,epochs=200–400, balance vialamb/scale_loss.For portability, save arrays via NPZ and label maps via JSON.

