Benchmark Results

Validated on held-out data

Every metric is computed on data the model never saw during training. We report both successes and limitations.


Uncertainty-Gated C-index: 0.74 (Green-tier external, N=229)

PDX Trajectory R²: 0.91 (Model vs. Biology)

Proliferation r: 0.96 (vs. MKI67)

Inference: <5ms per patient

Pediatric Brain: 0.839 (RNA + Meth fusion)

Sarcoma: 0.860 (Green-tier, full omics)

CPTAC External: 0.718 (10 hospitals, DRO)

Glioma (CGGA): 0.729 (485 held-out, truly external)

Pediatric (OpenPedCan): 0.649 (2,697 patients, truly external)

Patients Validated: 239K+ (9 external cohorts)

Validation Atlas

External Validation by Cohort

Performance depends on which modalities are available and what cancer type is being predicted. We present all results — including where the model correctly identifies it doesn't have enough data.

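The biggest lesson: more modalities = better predictions. RNA + Methylation fusion (C=0.839, OpenPedCan) scores 52% higher in relative terms than DNA-only (C=0.551, GENIE), although the two figures come from different cohorts. Brain tumors with methylation data score highest. Breast cancer without histopathology returns chance-level performance (C=0.507), which is the expected behavior when a required modality is missing.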
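The 52% figure is a relative gain in C-index and can be checked in a few lines. The helper names below are illustrative and not part of DNAI. Because the two values come from different cohorts, this is a cross-cohort comparison rather than a controlled ablation.

```python
def relative_gain(c_new: float, c_ref: float) -> float:
    """Relative improvement of c_new over c_ref, as a fraction of c_ref."""
    return (c_new - c_ref) / c_ref


def absolute_gain_pp(c_new: float, c_ref: float) -> float:
    """Absolute improvement in percentage points."""
    return (c_new - c_ref) * 100.0


fusion_c = 0.839    # RNA + Methylation fusion (OpenPedCan), from the text
dna_only_c = 0.551  # DNA-only (GENIE), from the text

print(f"relative gain: {relative_gain(fusion_c, dna_only_c):.1%}")
print(f"absolute gain: {absolute_gain_pp(fusion_c, dna_only_c):.1f} pp")
```

Reporting both numbers avoids ambiguity: a relative gain looks larger than the same gap expressed in percentage points.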

CPTAC — Adult Solid Tumors

1,031 patients · 10 cancer types · 10 hospital sites · RNA-seq + WSI

DRO C-index (TSS): 0.718

Green-tier (22%): 0.744

E5 TCGA-val pooled: 0.739

GBM (best type): 0.737

Three DRO checkpoints: TSS-based (C=0.718, CPTAC truly external), E5 DeepHit (C=0.739, 4-env), and E8 (C=0.741 TCGA-val, 5-env with CGGA). GBM and PDAC benefit most from multi-omics.

OpenPedCan — Pediatric Brain Tumors

943 patients · 43 Heidelberg subclasses · RNA-seq + Methylation

Fusion C-index: 0.839

Meth F1 (class.): 0.829

Meth C (survival): 0.824

RNA C (survival): 0.809

Strongest external result. Methylation dominates classification (+11.6pp), both modalities contribute to prognosis, and fusing them performs best (C=0.839 vs. 0.824 for methylation and 0.809 for RNA alone). Age matters: infants favor RNA, adolescents favor methylation.

CGGA — Glioma

485 held-out pts · Chinese cohort · RNA-seq only · Truly external

E8 pooled C-index: 0.729

Within-cancer macro C: 0.630

E8 5-env DRO (CGGA split: 485 train, 485 held-out). Within-cancer C improved from 0.548 (A1) to 0.630 (+0.082). GBM C=0.570, LGG C=0.690. Cross-ethnic generalization confirmed.

CT/MRI Radiology

132 pts · CT scans · Standalone module

ΔC-index over omics: +0.139

CT imaging adds significant signal beyond molecular data. Highest super-modality Shapley value (+0.293).
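For readers unfamiliar with Shapley attribution over modalities, here is a minimal sketch of the exact computation for a small set of "players". The coalition scores are hypothetical placeholders, not DNAI results, and the page does not state how its super-modality Shapley values were estimated.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


# Hypothetical scores (C-index above chance) for each subset of modalities.
scores = {
    frozenset(): 0.00,
    frozenset({"omics"}): 0.10,
    frozenset({"imaging"}): 0.15,
    frozenset({"omics", "imaging"}): 0.24,
}

phi = shapley_values(["omics", "imaging"], scores.__getitem__)
print(phi)
```

The attributions always sum to the score of the full coalition, which is what makes them comparable across modalities. Exact enumeration grows factorially, so larger modality sets would need sampling.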

TCGA Sarcoma — Solid Tumor Validation

255 patients · Internal held-out · RNA-seq + DNA + CNV + Meth

Green-tier C-index: 0.860

Green coverage: 85.5%

Concordant survival: +1,137d

Concordance significance: p=0.016

Sarcoma patients whose treatment matched the model's recommendation survived 1,137 days longer than discordant patients (p=0.016). 85.5% of patients qualify for high-confidence (Green-tier) predictions. Full multi-omics profiling enables strong performance on rare solid tumors.
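The page does not say which test produced p=0.016. A common choice for comparing survival between two groups is the log-rank test, sketched below under that assumption. The group labels and toy data are hypothetical.

```python
import math

import numpy as np


def logrank_test(time, event, group):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    time, event, group = (np.asarray(a) for a in (time, event, group))
    obs_minus_exp, variance = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()                       # patients still at risk
        n1 = (at_risk & (group == 1)).sum()     # of those, in group 1
        died = (time == t) & (event == 1)
        d = died.sum()                          # events at time t
        d1 = (died & (group == 1)).sum()        # of those, in group 1
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    stat = float(obs_minus_exp ** 2 / variance)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Toy data: group 0 (discordant) dies early, group 1 (concordant) later.
time = list(range(1, 13))
event = [1] * 12
group = [0] * 6 + [1] * 6
stat, p_value = logrank_test(time, event, group)
print(f"chi2={stat:.2f}, p={p_value:.4f}")
```

A significant log-rank result shows an association between concordance and survival in the cohort; it does not by itself establish that following the recommendation caused the difference.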

TARGET — Pediatric

2,929 pts · 6 types · RNA-seq

E5 DRO C-index: 0.621

Without DRO: 0.519

DRO recovers +10.2pp (0.519→0.621). Adult→pediatric transfer works only partially. TARGET is in the E5 training pool, so this is not a truly external result.

MMRF — Multiple Myeloma

787 pts · Liquid tumor · RNA-seq

E5 DRO C-index: 0.609

Without DRO: 0.487

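Liquid tumor, so the biology differs fundamentally from solid tumors. E5 improves +12.2pp over the non-DRO baseline (0.487→0.609). MMRF is in the E5 training pool, so this is not a truly external result.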

GENIE — Panel Sequencing

227,696 pts · 20 centers · DNA only

Mutation-only survival C: 0.551

DNA mutations alone provide modest signal. Expression adds +0.019. Driver validation: ρ=0.452 vs IntOGen.

SCAN-B — Breast Cancer

3,069 pts · RNA-seq only · No histopathology

C-index: 0.507 (effectively chance)

Model correctly fails — breast cancer requires histopathology. This validates modality-awareness.

What We Learn: Modalities Matter More Than Algorithms

The single biggest driver of prediction quality is which molecular data types are available — not which model you use.

RNA + Meth + Fusion: C = 0.839 (Pediatric brain)

Full omics (Green-tier): C = 0.860 (Sarcoma)

RNA + WSI + DRO: C = 0.718 (Adult solid, CPTAC)

RNA only: C = 0.729 (Glioma, CGGA)

DNA only: C = 0.551 (Panel seq, GENIE)

RNA only (wrong type): C = 0.507 (Breast, SCAN-B)

Brain tumors score highest

Rich transcriptomic + epigenetic signal. GBM C=0.737, CGGA C=0.729, OpenPedCan C=0.839

DRO enables cross-site transfer

Every cohort improved with robust training. E8: CGGA within-cancer +8.2pp (0.548→0.630), MMRF +12.2pp, TARGET +10.2pp
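The page does not define DRO. Assuming it refers to distributionally robust optimization with one group per training environment (the "4-env" and "5-env" labels suggest this), a group-DRO style objective can be sketched as follows. The function names and step size are illustrative, not DNAI's training code.

```python
import numpy as np


def environment_losses(per_sample_loss, env_ids):
    """Mean loss within each environment, in sorted order of environment id."""
    per_sample_loss = np.asarray(per_sample_loss, dtype=float)
    env_ids = np.asarray(env_ids)
    return np.array([per_sample_loss[env_ids == e].mean() for e in np.unique(env_ids)])


def update_env_weights(weights, env_losses, step_size=0.1):
    """Exponentiated-gradient step: shift weight toward high-loss environments."""
    w = np.asarray(weights, dtype=float) * np.exp(step_size * env_losses)
    return w / w.sum()


# Toy batch: environment 1 (e.g. a harder cohort) has higher loss.
losses = [0.2, 0.4, 1.0, 1.2]
envs = [0, 0, 1, 1]
weights = np.array([0.5, 0.5])

env_loss = environment_losses(losses, envs)
robust_loss = float(weights @ env_loss)        # weighted loss the model would minimize
new_weights = update_env_weights(weights, env_loss)
print(env_loss, robust_loss, new_weights)
```

Compared with plain averaging, this objective keeps pressure on the worst-performing environment, which is one plausible reason cohorts far from the training distribution gain the most.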

Honest about limitations

Breast cancer needs histopathology. Liquid tumors transfer poorly. The model abstains when uncertain.

STATIC VALIDATION (TCGA)

Survival prediction

Stratified C-index: 0.670

Within-indication ranking accuracy on held-out TCGA data. Proves we rank patients within cancer types, not just between them.
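To make the distinction between within-type and between-type ranking concrete, here is a minimal sketch of a stratified Harrell's C-index that only compares patients within the same cancer type. Pooling pair counts across strata is an assumption; the page does not say whether its stratified metric pools pairs or averages per-type scores.

```python
import numpy as np


def concordance_counts(time, event, risk):
    """Return (concordant, comparable) pair counts for Harrell's C-index."""
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant, comparable


def stratified_c_index(time, event, risk, strata):
    """C-index using only pairs of patients from the same stratum."""
    time, event, risk, strata = map(np.asarray, (time, event, risk, strata))
    concordant, comparable = 0.0, 0
    for s in np.unique(strata):
        mask = strata == s
        c, k = concordance_counts(time[mask], event[mask], risk[mask])
        concordant += c
        comparable += k
    return concordant / comparable


# Toy data: the model separates the two cancer types but ranks wrongly inside each.
time = [1, 2, 10, 11]
event = [1, 1, 1, 1]
risk = [0.8, 0.9, 0.1, 0.2]
strata = ["A", "A", "B", "B"]

pooled_c, pooled_pairs = concordance_counts(time, event, risk)
pooled = pooled_c / pooled_pairs
stratified = stratified_c_index(time, event, risk, strata)
print(f"pooled={pooled:.3f}, stratified={stratified:.3f}")
```

In the toy data the pooled score looks respectable only because one cancer type is more lethal than the other; the stratified score exposes that the ranking inside each type is wrong.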

Uncertainty-Gated C-index: 0.74

On Green-tier external CPTAC cohort (N=229) where ISS exceeds threshold. DRO-trained model on validated subset.
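The gating step can be sketched as: keep only patients whose confidence score exceeds a threshold, then report the C-index on that subset together with its coverage. The `iss` array, the threshold value, and the function names are hypothetical; ISS is not defined on this page.

```python
import numpy as np


def c_index(time, event, risk):
    """Harrell's C-index with ties in risk counted as half-concordant."""
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


def gated_c_index(time, event, risk, iss, threshold):
    """C-index on patients whose confidence score exceeds the threshold."""
    time, event, risk, iss = map(np.asarray, (time, event, risk, iss))
    keep = iss > threshold
    coverage = float(keep.mean())  # fraction of patients that pass the gate
    return c_index(time[keep], event[keep], risk[keep]), coverage


# Toy cohort: the third patient is mis-ranked and also has low confidence.
time = [1, 2, 3, 4, 5]
event = [1, 1, 1, 1, 1]
risk = [0.9, 0.8, 0.95, 0.6, 0.5]
iss = [0.9, 0.9, 0.2, 0.9, 0.9]

full = c_index(time, event, risk)
gated, coverage = gated_c_index(time, event, risk, iss, threshold=0.5)
print(f"all patients: {full:.2f}; gated: {gated:.2f} at {coverage:.0%} coverage")
```

A gated score should always be read alongside its coverage, since a high C-index on a small confident subset says nothing about the patients that were filtered out.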

33 cancer types

Trained and validated across all major TCGA cohorts for pan-cancer applicability.

Benchmark vs. Standard of Care

C-index comparison on held-out TCGA pan-cancer cohort

Random Chance: 0.50

Cox PH (SoC): 0.62

DNAI (Intent-to-Treat): 0.704

DNAI (Uncertainty-Gated): 0.74 ★

+12pp (about 19% relative) improvement over Cox Proportional Hazards on high-confidence predictions (0.74 vs. 0.62). DNAI's epistemic uncertainty calibration identifies when predictions are reliable.

Representation quality

Statistical Orthogonality: R² < 0.001

Proliferation and context subspaces are statistically independent, enabling clean biological interpretation.
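One way to measure this kind of orthogonality is the R² of a linear regression of one latent on the other subspace. The sketch below uses synthetic latents and assumes that definition; the page does not say how its R² was computed. A near-zero R² rules out linear dependence only, not every form of statistical dependence.

```python
import numpy as np


def linear_r2(context, target):
    """R² of an ordinary least-squares fit of target on context (with intercept)."""
    X = np.column_stack([np.ones(len(context)), context])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    residual = target - X @ coef
    ss_res = float(residual @ residual)
    ss_tot = float(((target - target.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)
context = rng.normal(size=(5000, 4))          # synthetic "context" latents
independent = rng.normal(size=5000)           # synthetic latent unrelated to context
leaky = independent + 0.5 * context[:, 0]     # latent that leaks one context dimension

r2_independent = linear_r2(context, independent)
r2_leaky = linear_r2(context, leaky)
print(f"independent: {r2_independent:.4f}, leaky: {r2_leaky:.4f}")
```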

Biological validity: r = 0.95

Proliferation latent correlates strongly with MKI67 expression, validating biological meaning.
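The correlation itself is a plain Pearson r between the latent and the marker's expression, assuming r here denotes Pearson correlation. A minimal sketch on synthetic values (the arrays are placeholders, not DNAI data):

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


rng = np.random.default_rng(0)
mki67_expression = rng.normal(size=1000)                                 # synthetic marker
proliferation_latent = mki67_expression + 0.3 * rng.normal(size=1000)   # synthetic latent

r = pearson_r(proliferation_latent, mki67_expression)
print(f"r = {r:.3f}")
```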

Reconstruction: r > 0.85

High-fidelity reconstruction across all input modalities.

DYNAMIC VALIDATION (PDX)

Trajectory simulation

PDX Validation R²: 0.91

Model vs. Biology: Physics parameters learned from PDX (patient-derived xenograft) growth curves accurately predict real tumor dynamics.
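As an illustration of how a trajectory R² is computed, the sketch below scores a predicted growth curve against observed tumor volumes. It assumes a logistic growth law purely for illustration; the page does not specify DNAI's physics model, and all parameter values and measurements are synthetic.

```python
import numpy as np


def logistic_volume(t, v0, r, k):
    """Closed-form solution of dV/dt = r * V * (1 - V / K) with V(0) = v0."""
    return k / (1.0 + (k / v0 - 1.0) * np.exp(-r * t))


def r_squared(observed, predicted):
    """Coefficient of determination of predictions against observations."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = ((observed - predicted) ** 2).sum()
    ss_tot = ((observed - observed.mean()) ** 2).sum()
    return float(1.0 - ss_res / ss_tot)


days = np.arange(0.0, 60.0, 5.0)
rng = np.random.default_rng(1)

# Synthetic "measured" volumes: a true curve plus 5% measurement noise.
true_curve = logistic_volume(days, v0=100.0, r=0.12, k=1500.0)
observed = true_curve * (1.0 + 0.05 * rng.normal(size=days.size))

# Prediction from slightly mis-estimated parameters.
predicted = logistic_volume(days, v0=100.0, r=0.11, k=1450.0)
score = r_squared(observed, predicted)
print(f"trajectory R² = {score:.3f}")
```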

Emulator Fidelity: 0.997

Math vs. Math: Learned trajectory emulator matches numerical ODE solver, enabling <5ms inference.

Speed: <5ms per trajectory

400-1000x faster than numerical solver, enabling real-time treatment optimization.
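The "Math vs. Math" comparison can be illustrated by checking a fast surrogate against a step-by-step numerical solver. In this sketch the closed-form logistic solution stands in for the learned emulator, and correlation stands in for the fidelity metric; both are assumptions, since the page describes neither DNAI's ODE system nor how fidelity is scored.

```python
import numpy as np


def rk4_logistic(t_grid, v0, r, k):
    """Integrate dV/dt = r * V * (1 - V / K) with the classical RK4 scheme."""
    def growth(v):
        return r * v * (1.0 - v / k)

    out = np.empty_like(t_grid)
    v = v0
    out[0] = v
    for i in range(1, len(t_grid)):
        h = t_grid[i] - t_grid[i - 1]
        k1 = growth(v)
        k2 = growth(v + 0.5 * h * k1)
        k3 = growth(v + 0.5 * h * k2)
        k4 = growth(v + h * k3)
        v = v + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        out[i] = v
    return out


def surrogate_logistic(t_grid, v0, r, k):
    """Stand-in for a learned emulator: evaluates the whole curve in one call."""
    return k / (1.0 + (k / v0 - 1.0) * np.exp(-r * t_grid))


t = np.linspace(0.0, 60.0, 601)
solver_curve = rk4_logistic(t, v0=100.0, r=0.12, k=1500.0)
emulator_curve = surrogate_logistic(t, v0=100.0, r=0.12, k=1500.0)

fidelity = float(np.corrcoef(solver_curve, emulator_curve)[0, 1])
max_error = float(np.max(np.abs(solver_curve - emulator_curve)))
print(f"fidelity = {fidelity:.6f}, max abs error = {max_error:.2e}")
```

The speed advantage comes from the same structure: the solver must step through time sequentially, while a surrogate produces the full trajectory in a single vectorized evaluation.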

Note: We do not validate trajectories on TCGA because it is snapshot data: with no repeated measurements over time, there is no growth curve to compare a simulated trajectory against. PDX data provides true longitudinal measurements.

Known limitations

Training data scope

Phase 0 trained on TCGA (9,393 patients, 33 cancer types). Production DRO trained on 6 cohorts: TCGA+MMRF+TARGET+CPTAC+CGGA+OpenPedCan (~17,500 patients). Validated on 9 external cohorts including 2,697 pediatric patients. Truly external cohorts for E8: CGGA held-out (485 glioma, C=0.729) and SCAN-B (3,069 breast, C=0.504). Performance on rare cancers or non-standard sample preparation may vary.

Research use only

Not approved for clinical decision-making. Intended for research and pilot deployments.

External validation across multiple checkpoints

TSS-DRO: CPTAC C=0.718 (truly external, 1,031 patients, 10 cohorts). E8 DRO: CGGA C=0.729 (held-out, 485 glioma, within-cancer macro C=0.630). SCAN-B C=0.504 (breast, needs histopathology). MMRF C=0.609, TARGET C=0.621 (in training pool).

See it in action

Schedule a demo to explore DNAI with your own data.

Request Demo