Benchmark Results

Validated on held-out data

Every metric is computed on data the model never saw during training. We report both successes and limitations.

Uncertainty-Gated C-index: 0.74 (Green-tier external, N=229)
PDX Trajectory R²: 0.91 (Model vs. Biology)
Proliferation r: 0.96 (vs. MKI67)
Inference: <5ms (per patient)
Pediatric Brain: 0.839 (RNA + Meth fusion)
Sarcoma: 0.860 (Green-tier, full omics)
CPTAC External: 0.718 (10 hospitals, DRO)
Glioma (CGGA): 0.729 (485 held-out, truly external)
Pediatric (OpenPedCan): 0.649 (2,697 patients, truly external)
Patients Validated: 239K+ (9 external cohorts)

Validation Atlas

External Validation by Cohort

Performance depends on which modalities are available and what cancer type is being predicted. We present all results — including where the model correctly identifies it doesn't have enough data.

The biggest lesson: more modalities yield better predictions. RNA + Methylation fusion (C=0.839, OpenPedCan) outperforms DNA-only (C=0.551, GENIE) by 52% in relative terms. Among external cohorts, brain tumors with methylation data score highest. Breast cancer without histopathology returns chance-level performance (C=0.507, SCAN-B), which is the expected behavior when a required modality is missing.

CPTAC — Adult Solid Tumors

1,031 patients · 10 cancer types · 10 hospital sites · RNA-seq + WSI

DRO C-index (TSS): 0.718
Green-tier (22%): 0.744
E5 TCGA-val pooled: 0.739
GBM (best type): 0.737

Three DRO checkpoints: TSS-based (C=0.718, CPTAC truly external), E5 DeepHit (C=0.739, 4-env), and E8 (C=0.741 TCGA-val, 5-env with CGGA). GBM and PDAC benefit most from multi-omics.

OpenPedCan — Pediatric Brain Tumors

943 patients · 43 Heidelberg subclasses · RNA-seq + Methylation

Fusion C-index: 0.839
Meth F1 (class.): 0.829
Meth C (survival): 0.824
RNA C (survival): 0.809

Strongest external result. Methylation dominates classification (+11.6pp), both modalities contribute to prognosis, proper fusion wins. Age matters: infants favor RNA, adolescents favor methylation.

CGGA — Glioma

485 held-out pts · Chinese cohort · RNA-seq only · Truly external

E8 pooled C-index: 0.729
Within-cancer macro C: 0.630

E8 5-env DRO (CGGA split: 485 train, 485 held-out). Within-cancer C improved from 0.548 (A1) to 0.630 (+0.082). GBM C=0.570, LGG C=0.690. Cross-ethnic generalization confirmed.

CT/MRI Radiology

132 pts · CT scans · Standalone module

ΔC-index over omics: +0.139

CT imaging adds significant signal beyond molecular data. Highest super-modality Shapley value (+0.293).

TCGA Sarcoma — Solid Tumor Validation

255 patients · Internal held-out · RNA-seq + DNA + CNV + Meth

Green-tier C-index: 0.860
Green coverage: 85.5%
Concordant survival: +1,137d
Concordance significance: p=0.016

Sarcoma patients whose treatment matched the model's recommendation survived 1,137 days longer than discordant patients. 85.5% qualify for high-confidence predictions. Full multi-omics profiling enables strong performance on rare solid tumors.

TARGET — Pediatric

2,929 pts · 6 types · RNA-seq

E5 DRO C-index: 0.621
Without DRO: 0.519

DRO recovers +10.2pp. Adult→pediatric transfer works partially. In E5 training pool.

MMRF — Multiple Myeloma

787 pts · Liquid tumor · RNA-seq

E5 DRO C-index: 0.609
Without DRO: 0.487

Liquid tumor — fundamentally different biology. E5 improves +12.2pp over baseline. In E5 training pool.

GENIE — Panel Sequencing

227,696 pts · 20 centers · DNA only

Mutation-only survival C: 0.551

DNA mutations alone provide modest signal. Expression adds +0.019. Driver validation: ρ=0.452 vs IntOGen.

SCAN-B — Breast Cancer

3,069 pts · RNA-seq only · No histopathology

C-index: 0.507 (effectively chance)

Model correctly fails — breast cancer requires histopathology. This validates modality-awareness.

What We Learn: Modalities Matter More Than Algorithms

The single biggest driver of prediction quality is which molecular data types are available — not which model you use.

RNA + Meth + Fusion: C = 0.839 (Pediatric brain)
Full omics (Green-tier): C = 0.860 (Sarcoma)
RNA + WSI + DRO: C = 0.718 (Adult solid, CPTAC)
RNA only: C = 0.729 (Glioma, CGGA)
DNA only: C = 0.551 (Panel seq, GENIE)
RNA only (wrong type): C = 0.507 (Breast, SCAN-B)

Brain tumors score highest

Rich transcriptomic + epigenetic signal. GBM C=0.737, CGGA C=0.729, OpenPedCan C=0.839

DRO enables cross-site transfer

Every cohort improved with robust training. E8: CGGA within-cancer +8.2pp (0.548→0.630), MMRF +12.2pp, TARGET +10.2pp

Honest about limitations

Breast cancer needs histopathology. Liquid tumors transfer poorly. The model abstains when uncertain.

STATIC VALIDATION (TCGA)

Survival prediction

Stratified C-index: 0.670

Within-indication ranking accuracy on held-out TCGA data. Shows that the model ranks patients within cancer types, not just between them.
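To make the within-type versus between-type distinction concrete, here is a minimal sketch of one common way to stratify Harrell's C-index: patient pairs are only compared inside the same stratum, and pair counts are pooled across strata. This is an illustrative assumption, not necessarily how the 0.670 figure was computed (a macro-average of per-type C-indices is another option). The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def pair_counts(times, events, risks):
    """Count (concordant, tied, comparable) pairs for Harrell's C-index."""
    concordant = tied = comparable = 0
    for i in range(len(times)):
        for j in range(len(times)):
            # A pair is comparable when patient i has an observed event
            # strictly before patient j's event or censoring time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    tied += 1
    return concordant, tied, comparable


def stratified_c_index(times, events, risks, strata):
    """Pool pair counts within each stratum; never compare across strata."""
    groups = defaultdict(list)
    for idx, label in enumerate(strata):
        groups[label].append(idx)
    concordant = tied = comparable = 0
    for members in groups.values():
        c, t, m = pair_counts(
            [times[k] for k in members],
            [events[k] for k in members],
            [risks[k] for k in members],
        )
        concordant, tied, comparable = concordant + c, tied + t, comparable + m
    if comparable == 0:
        raise ValueError("no comparable pairs in any stratum")
    return (concordant + 0.5 * tied) / comparable


# Toy data (hypothetical): type A is aggressive, type B is indolent.
# The risk scores separate the two types but mis-rank patients inside each.
times = [5, 10, 50, 100]
events = [1, 1, 1, 1]
risks = [0.8, 0.9, 0.1, 0.2]
cancer_type = ["A", "A", "B", "B"]

pooled = stratified_c_index(times, events, risks, ["all"] * 4)
within = stratified_c_index(times, events, risks, cancer_type)
print(f"pooled C = {pooled:.3f}, within-type C = {within:.3f}")
```

On this toy data the pooled C-index is 0.667 while the within-type C-index is 0.000: a model can look reasonable overall purely by telling cancer types apart, which is the failure mode a stratified metric is meant to expose.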

Uncertainty-Gated C-index: 0.74

On Green-tier external CPTAC cohort (N=229) where ISS exceeds threshold. DRO-trained model on validated subset.
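As a rough illustration of what uncertainty gating means for a reported metric, the sketch below scores only the patients whose confidence value exceeds a threshold and reports the resulting coverage alongside the C-index. The `confidence` list is a hypothetical stand-in for the ISS score mentioned above, whose definition and threshold are not given here; the toy data and the 0.5 threshold are likewise assumptions.

```python
def c_index(times, events, risks):
    """Harrell's C-index; ties in risk count as half-concordant."""
    concordant = comparable = 0.0
    for i in range(len(times)):
        for j in range(len(times)):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def gated_c_index(times, events, risks, confidence, threshold):
    """Return (C-index on the gated subset, fraction of patients kept)."""
    keep = [k for k, c in enumerate(confidence) if c > threshold]
    coverage = len(keep) / len(confidence)
    gated = c_index(
        [times[k] for k in keep],
        [events[k] for k in keep],
        [risks[k] for k in keep],
    )
    return gated, coverage


# Toy cohort (hypothetical): the two low-confidence patients are mis-ranked.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
risks = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
confidence = [0.9, 0.9, 0.2, 0.2, 0.9, 0.9]

full = c_index(times, events, risks)
gated, coverage = gated_c_index(times, events, risks, confidence, threshold=0.5)
print(f"full C = {full:.3f}, gated C = {gated:.3f}, coverage = {coverage:.0%}")
```

Reporting coverage next to the gated score matters because a gated C-index describes only the retained subset (here 4 of 6 patients), not the full cohort.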

33 cancer types

Trained and validated across all major TCGA cohorts for pan-cancer applicability.

Benchmark vs. Standard of Care

C-index comparison on held-out TCGA pan-cancer cohort

Random Chance: 0.50
Cox PH (SoC): 0.62
DNAI (Intent-to-Treat): 0.704
DNAI (Uncertainty-Gated): 0.74

On high-confidence predictions, DNAI's uncertainty-gated C-index of 0.74 is 0.12 above Cox Proportional Hazards (0.62), a relative improvement of about 19%. DNAI's epistemic uncertainty calibration identifies when predictions are reliable.

Representation quality

Statistical Orthogonality: R² < 0.001

Proliferation and context subspaces are statistically independent, enabling clean biological interpretation.
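One simple way to probe this kind of claim is to compute the squared Pearson correlation between the proliferation latent and each context dimension and report the largest value. The sketch below does that on synthetic data. It is an assumption about how such an R² could be measured, not a description of the procedure used here, and all names and data are hypothetical. Note that an R² near zero rules out linear dependence only; nonlinear dependence would need a separate test.

```python
import math
import random


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_r2(proliferation, context_dims):
    """Largest squared correlation between the latent and any context dim."""
    return max(pearson_r(proliferation, dim) ** 2 for dim in context_dims)


random.seed(0)
n = 2000
# Synthetic context subspace with three dimensions (hypothetical).
context = [[random.gauss(0, 1) for _ in range(n)] for _ in range(3)]

# Case 1: a proliferation latent drawn independently of the context.
clean = [random.gauss(0, 1) for _ in range(n)]
# Case 2: a "leaky" latent that mostly copies the first context dimension.
leaky = [context[0][k] + 0.1 * random.gauss(0, 1) for k in range(n)]

print(f"independent latent: max R^2 = {max_r2(clean, context):.4f}")
print(f"leaky latent:       max R^2 = {max_r2(leaky, context):.4f}")
```

The independent latent yields an R² close to zero, while the leaky one yields an R² close to one, so the check separates a disentangled representation from one where proliferation signal bleeds into the context subspace.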

Biological validity: r = 0.95

Proliferation latent correlates strongly with MKI67 expression, validating biological meaning.

Reconstruction: r > 0.85

High-fidelity reconstruction across all input modalities.

DYNAMIC VALIDATION (PDX)

Trajectory simulation

PDX Validation R²: 0.91

Model vs. Biology: Physics parameters learned from PDX (patient-derived xenograft) growth curves accurately predict real tumor dynamics.

Emulator Fidelity: 0.997

Math vs. Math: Learned trajectory emulator matches numerical ODE solver, enabling <5ms inference.

Speed: <5ms per trajectory

400-1000x faster than numerical solver, enabling real-time treatment optimization.

Note: We do not validate trajectories on TCGA because it is snapshot data: each patient is profiled at a single time point, so there is no observed trajectory to compare against. PDX data provides true longitudinal measurements.

Known limitations

Training data scope

Phase 0 trained on TCGA (9,393 patients, 33 cancer types). Production DRO trained on 6 cohorts: TCGA+MMRF+TARGET+CPTAC+CGGA+OpenPedCan (~17,500 patients). Validated on 9 external cohorts including 2,697 pediatric patients. Truly external cohorts for E8: CGGA held-out (485 glioma, C=0.729) and SCAN-B (3,069 breast, C=0.504). Performance on rare cancers or non-standard sample preparation may vary.

Research use only

Not approved for clinical decision-making. Intended for research and pilot deployments.

External validation across multiple checkpoints

TSS-DRO: CPTAC C=0.718 (truly external, 1,031 patients, 10 cancer types, 10 hospital sites). E8 DRO: CGGA C=0.729 (held-out, 485 glioma, within-cancer macro C=0.630). SCAN-B C=0.504 (breast, needs histopathology). MMRF C=0.609, TARGET C=0.621 (in training pool).

See it in action

Schedule a demo to explore DNAI with your own data.
