Technology Deep Dive

From Patient Data
to Digital Twin

Every cancer is unique. Our platform turns raw molecular and imaging data into a living simulation of that specific tumor — step by step, fully transparent.

Data → VAE → Imaging → Drivers → Dynamics → Treatment → Twin

Patients trained on: 9,415

Latent dimensions: 328

Pipeline models

Cancer types: 33

DNAI Digital Twin Overview

Slide deck (PDF) — visual walkthrough of the full platform

Fighting Cancer with Digital Tumor Twins

AI-generated podcast — high-level overview of our approach

Step 1

Input

The Patient's Molecular Profile

It begins with data — the kind generated by modern sequencing labs. Gene expression (which genes are turned on and how loudly), DNA mutations (which genes are broken), copy number variation (which genes have been duplicated or deleted), and methylation (which genes have been chemically silenced). For a single patient, this amounts to roughly 6,000 individual measurements across four data types.

This is too much for any human to synthesize. It's also too noisy and high-dimensional for most AI systems to handle well. So the first thing we do is compress it.

RNA-seq: 2,579 genes (gene expression levels)

DNA Mutations: 500 genes (which genes are altered)

Copy Number: 1,886 genes (amplifications & deletions)

Methylation: 1,000 probes (epigenetic silencing)
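As a rough illustration of how the four data streams add up to "roughly 6,000" measurements, the sketch below concatenates them into a single input vector and records which modalities were present. The feature counts come from the figures above; the function name, the dictionary keys, and the zero-fill handling of missing modalities are assumptions made for this example, not a description of the platform's actual preprocessing.

```python
from typing import Dict, List, Optional, Tuple

# Feature counts per modality, as quoted in the text above.
MODALITY_FEATURES: Dict[str, int] = {
    "rna_seq_expression": 2579,  # genes
    "dna_mutations": 500,        # genes
    "copy_number": 1886,         # genes
    "methylation": 1000,         # probes
}

TOTAL_FEATURES = sum(MODALITY_FEATURES.values())  # 5,965, i.e. roughly 6,000


def assemble_input(
    profile: Dict[str, Optional[List[float]]],
) -> Tuple[List[float], Dict[str, bool]]:
    """Concatenate modalities in a fixed order.

    Missing modalities are zero-filled and flagged in the returned mask
    (an assumption of this sketch, so a downstream model can tell
    "measured as zero" apart from "not measured").
    """
    vector: List[float] = []
    present: Dict[str, bool] = {}
    for name, width in MODALITY_FEATURES.items():
        values = profile.get(name)
        if values is None:
            vector.extend([0.0] * width)
            present[name] = False
        else:
            if len(values) != width:
                raise ValueError(f"{name}: expected {width} values, got {len(values)}")
            vector.extend(values)
            present[name] = True
    return vector, present


# Hypothetical RNA-only patient: only expression values are supplied.
rna_only_vector, rna_only_mask = assemble_input(
    {"rna_seq_expression": [0.0] * 2579}
)
```

The presence mask is one simple way to let a model widen its uncertainty when only some data types are available.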

Step 2

Foundation

Compressing Biology into a Fingerprint

Our foundation model — a hierarchical Bayesian VAE trained on 9,415 patients across 33 cancer types — takes all four data streams and distills them into a structured biological fingerprint of 328 numbers. But these aren't arbitrary numbers. Each one has a specific biological meaning.

One dimension captures how fast the tumor is growing, correlated at 0.96 with standard proliferation markers. Two hundred dimensions encode activity across 50 known cancer pathways (inflammation, immune response, DNA repair, metabolic reprogramming, and others), with four dimensions per pathway. Forty-eight dimensions capture epigenetic patterns. Thirty-two encode chromosomal instability. The remaining 47 hold proliferation-free tumor context (31) and residual, non-pathway biology (16).

The key property: similar tumors end up near each other in this space, regardless of superficial differences. Two patients with completely different mutation profiles but the same underlying disease mechanism will occupy neighboring positions. And if a hospital only has RNA sequencing and nothing else, the model still produces a reliable fingerprint with appropriately wider uncertainty.

Foundation model distillation captures biological signals beyond known pathways, and a regional methylation density decoder (R²=0.762) captures epigenetic patterns more faithfully than individual probe reconstruction.

This 328-dimensional fingerprint is the foundation everything else builds on.

Latent Space Structure

Proliferation: growth speed (Ki67 r=0.96), 1d

Pathway Activity: 50 Hallmark pathways × 4 dims, 200d

Biological Context: tumor context (proliferation-free), 31d

Residual Signal: non-pathway biology, 16d

Epigenetics: methylation patterns, 48d

Chromosomal Structure: copy number instability, 32d

Total: 328 dimensions
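The block sizes above can be checked and addressed with a small layout table. The sketch below is illustrative only: the block widths come from the text, but the block names, their ordering within the vector, and the helper functions are assumptions for this example.

```python
from typing import Dict, List, Sequence, Tuple

# (block name, width). Widths are from the text; the ordering is assumed.
LATENT_BLOCKS: List[Tuple[str, int]] = [
    ("proliferation", 1),
    ("pathway_activity", 50 * 4),  # 50 Hallmark pathways x 4 dims each
    ("biological_context", 31),
    ("residual_signal", 16),
    ("epigenetics", 48),
    ("chromosomal_structure", 32),
]


def block_slices(blocks: Sequence[Tuple[str, int]]) -> Tuple[Dict[str, slice], int]:
    """Turn consecutive block widths into named slices plus the total width."""
    slices: Dict[str, slice] = {}
    start = 0
    for name, width in blocks:
        slices[name] = slice(start, start + width)
        start += width
    return slices, start


SLICES, TOTAL_DIMS = block_slices(LATENT_BLOCKS)


def pathway_dims(z: Sequence[float], pathway_index: int, dims_per_pathway: int = 4) -> List[float]:
    """Return the dimensions belonging to one pathway (0-based index)."""
    block = list(z[SLICES["pathway_activity"]])
    start = pathway_index * dims_per_pathway
    return block[start:start + dims_per_pathway]


# A dummy fingerprint whose values equal their own positions, for inspection.
demo_fingerprint = list(range(TOTAL_DIMS))
```

Because every dimension belongs to a named block, any downstream quantity can be traced back to the part of the fingerprint it was read from.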

Explore VAE Model Card

Step 3

Multimodal Fusion

Adding Imaging: Eyes on the Tumor

Molecular data tells us what's happening inside cells. But pathology slides and radiology scans reveal something sequencing cannot — the tumor's physical architecture. How immune cells surround or infiltrate the tumor mass. Whether it's invading blood vessels. How different subpopulations are spatially arranged.

We integrate histopathology (whole-slide images) and radiology (CT/MRI) through late gated fusion. Rather than forcing imaging through the same encoder as molecular data — which degrades both signals — each is processed by a specialist model, then a learned gate decides how much to trust each source for each individual patient.
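A minimal sketch of late gated fusion follows, assuming each specialist model has already produced an embedding of the same length. In a real system the gate logits would come from a learned network evaluated per patient; here they are passed in directly, and all names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(
    embeddings: Dict[str, Optional[List[float]]],
    gate_logits: Dict[str, float],
) -> Tuple[List[float], Dict[str, float]]:
    """Weighted sum of per-modality embeddings.

    Modalities with no embedding (None) are dropped before the softmax,
    so the remaining gate weights still sum to 1.
    """
    available = [name for name, emb in embeddings.items() if emb is not None]
    if not available:
        raise ValueError("at least one modality is required")
    weights = softmax([gate_logits[name] for name in available])
    dim = len(embeddings[available[0]])
    fused = [0.0] * dim
    for name, weight in zip(available, weights):
        emb = embeddings[name]
        if len(emb) != dim:
            raise ValueError(f"{name}: embedding length mismatch")
        for i, value in enumerate(emb):
            fused[i] += weight * value
    return fused, dict(zip(available, weights))


# Hypothetical patient with molecular and histopathology data, no radiology.
fused_vec, gate_weights = gated_fusion(
    {"molecular": [1.0, 0.0], "histopathology": [0.0, 1.0], "radiology": None},
    {"molecular": 0.0, "histopathology": 0.0, "radiology": 0.0},
)
```

Suppressing imaging from an unvalidated site amounts to passing `None` for that modality, which leaves the gate to renormalize over what remains.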

We're transparent about the limits. Histopathology embeddings encode scanner and staining protocols, so cross-institution transfer is unreliable. Our production system suppresses imaging from unvalidated sites by default.

Molecular: 77% average gate weight

Histopathology: 23% average gate weight

Radiology: +0.139 C-index improvement
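For readers unfamiliar with the C-index quoted here and later, it measures how often a model ranks two patients in the correct order of risk. The sketch below is a plain implementation of Harrell's concordance index, given as a definition aid; it is not the platform's evaluation code, and the function name is our own.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[bool],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. The pair is concordant
    when i also has the higher predicted risk. Ties in risk count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparison
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so an improvement of +0.139 is a shift on that scale.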

Step 4 — Dual Analysis

Two Paths, One Patient

The biological fingerprint feeds two complementary paths. The static path's driver and drug sensitivity analysis feeds into the dynamic path's tumor simulation.

Static Path

What's Driving This Cancer

Driver identification matches patient mutations against 633 known drivers from IntOGen and 95 COSMIC Cancer Gene Census genes, then determines which are actively driving THIS patient's cancer using pathway context and expression evidence. Drug sensitivity prediction shows which pathways mediate the response — so clinicians can evaluate whether the recommendation makes biological sense.

CausalDriver-GAT: AUROC 0.933 | AUPRC 0.990

TxResponse: 50 interpretable pathway concepts

Dynamic Path

How Will This Tumor Evolve

Tumor subpopulations are identified from sequencing data through clonal deconvolution. These are not abstract clones: they are real subpopulations derived from variant allele frequencies. A Resistance Sentinel preserves minor resistant subclones that would otherwise be lost. Each clone is annotated with its driver mutations and knowledge-grounded drug sensitivity. A hypernetwork generates personalized physics parameters, and a neural ODE simulates treatment response. A hybrid stochastic simulator then uses continuous SDE math for large clone populations and switches automatically to the exact Gillespie SSA when populations are small. The output is a distribution of possible evolutionary outcomes, including resistance emergence timing and clone fate probabilities.

Hypernetwork v3.2: C-index 0.704 | Physics-constrained

Neural ODE: 5ms inference | Lotka-Volterra dynamics

EvoSim: Stochastic clonal evolution | Phylo 0.89
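The idea of switching between a continuous SDE and an exact Gillespie simulation can be sketched for a single clone. The example below deliberately swaps the Lotka-Volterra dynamics for a much simpler linear birth-death process, and the switching threshold, step size, and function names are all assumptions of this sketch rather than EvoSim's actual settings.

```python
import math
import random
from typing import Tuple


def gillespie_step(n: int, birth: float, death: float, rng: random.Random) -> Tuple[int, float]:
    """One exact SSA event for a linear birth-death process: (new_n, waiting time)."""
    total_rate = (birth + death) * n
    if total_rate == 0:
        return n, math.inf  # nothing can happen
    wait = rng.expovariate(total_rate)
    if rng.random() < birth / (birth + death):
        return n + 1, wait
    return n - 1, wait


def sde_step(n: float, birth: float, death: float, dt: float, rng: random.Random) -> float:
    """Euler-Maruyama step of the diffusion approximation
    dN = (b - d) N dt + sqrt((b + d) N) dW, clipped at zero."""
    drift = (birth - death) * n * dt
    noise = math.sqrt((birth + death) * n * dt) * rng.gauss(0.0, 1.0)
    return max(n + drift + noise, 0.0)


def simulate_clone(
    n0: float,
    birth: float,
    death: float,
    t_end: float,
    switch_threshold: float = 100.0,  # hypothetical cutoff for "small"
    dt: float = 0.01,
    seed: int = 0,
) -> float:
    """Simulate one clone, using exact SSA below the threshold and SDE above it."""
    rng = random.Random(seed)
    n, t = float(n0), 0.0
    while t < t_end and n > 0:
        if n < switch_threshold:
            cells = int(round(n))
            if cells == 0:
                n = 0.0
                break
            new_cells, wait = gillespie_step(cells, birth, death, rng)
            if t + wait > t_end:
                break  # next event falls after the simulation horizon
            n, t = float(new_cells), t + wait
        else:
            n = sde_step(n, birth, death, dt, rng)
            t += dt
    return n


# Running many seeds yields a distribution of outcomes rather than one number.
final_sizes = [simulate_clone(20, birth=1.0, death=0.9, t_end=5.0, seed=s) for s in range(50)]
extinction_fraction = sum(1 for size in final_sizes if size == 0) / len(final_sizes)
```

The exact method matters for small clones because extinction is a discrete event that a continuous approximation represents poorly.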

Cross-Species Translation

Translating Mouse Data to Human Predictions

Our domain separation network strips mouse-specific artifacts from preclinical data, retaining only tumor biology that transfers to humans. It fills in missing data types (like methylation) by learning statistical relationships, allowing drug responses observed in mice to directly inform patient predictions.

Explore DSN

Step 5

Pathway Analysis

Understanding WHY: The Mechanistic Evidence Engine

Knowing which genes are mutated isn't enough — we need to know which biological pathways those mutations are actually activating. A KRAS mutation only matters if the downstream MAPK signaling cascade is actually firing. Our Mechanistic Evidence Engine runs parallel to the VAE, analyzing raw gene expression to determine exactly which pathways are driving the cancer.

Pathway Activity Scoring — 227 Pathways from 3 Databases + Cancer-Specific Expansion

Integrates 50 MSigDB Hallmark + 68 KEGG cancer/signaling + 51 Reactome pathways, plus 58 cancer-specific expansion pathways covering immune subprograms, stromal biology, treatment resistance, and drug mechanism analysis — with robust scaling against a reference of 9,415 patients. Determines which biological programs are actively signaling — not just expressed. Validated: KRAS signaling is significantly higher in KRAS-mutant patients (p=8.5×10⁻²⁹).
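To make "robust scaling against a reference" concrete, the sketch below scores a pathway as the mean expression of its member genes and then centers and scales that score with the reference cohort's median and interquartile range. This is one common form of robust scaling and is assumed here for illustration; the engine's actual scoring method is not specified in the text, and the function names are hypothetical.

```python
import statistics
from typing import Dict, Sequence

# Pathway counts per source, as quoted above.
PATHWAY_SOURCES: Dict[str, int] = {
    "msigdb_hallmark": 50,
    "kegg_cancer_signaling": 68,
    "reactome": 51,
    "cancer_specific_expansion": 58,
}
TOTAL_PATHWAYS = sum(PATHWAY_SOURCES.values())


def pathway_score(expression: Dict[str, float], gene_set: Sequence[str]) -> float:
    """Mean expression over the gene-set members that were measured."""
    values = [expression[g] for g in gene_set if g in expression]
    if not values:
        raise ValueError("no gene-set members found in expression profile")
    return statistics.mean(values)


def robust_scale(value: float, reference: Sequence[float]) -> float:
    """(value - median) / IQR, computed from a reference cohort."""
    median = statistics.median(reference)
    q1, _, q3 = statistics.quantiles(reference, n=4)
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("reference cohort has zero interquartile range")
    return (value - median) / iqr
```

Median and IQR are less sensitive to outlier patients than mean and standard deviation, which is the usual reason for preferring them with a large, heterogeneous reference cohort.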

Causal Signal Tracing

Starting from each mutated driver, the engine traces downstream through 1,743 directed causal edges (SIGNOR database) to map the full signaling cascade: KRAS → RAF → MEK → ERK. Each node in the chain is checked for druggability — identifying exactly where to intervene.
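The tracing step can be pictured as a breadth-first walk over a directed graph. The sketch below uses only the four-node KRAS cascade named above; the edge dictionary stands in for the SIGNOR-derived graph, and the druggability flags are placeholders chosen for the example, not data from the platform.

```python
from collections import deque
from typing import Dict, List, Set

# Toy directed graph: the cascade mentioned in the text.
EDGES: Dict[str, List[str]] = {
    "KRAS": ["RAF"],
    "RAF": ["MEK"],
    "MEK": ["ERK"],
    "ERK": [],
}

# Placeholder druggability flags (illustrative only).
DRUGGABLE: Set[str] = {"RAF", "MEK"}


def trace_downstream(driver: str, edges: Dict[str, List[str]]) -> List[str]:
    """Breadth-first traversal from a driver, returning nodes in visit order."""
    order: List[str] = []
    seen = {driver}
    queue = deque([driver])
    while queue:
        node = queue.popleft()
        order.append(node)
        for target in edges.get(node, []):
            if target not in seen:  # guards against cycles in the graph
                seen.add(target)
                queue.append(target)
    return order


def druggable_nodes(driver: str, edges: Dict[str, List[str]], druggable: Set[str]) -> List[str]:
    """Nodes on the downstream cascade that are flagged as druggable."""
    return [node for node in trace_downstream(driver, edges) if node in druggable]
```

Tracking visited nodes matters on a real signaling graph, where feedback loops would otherwise make the traversal run forever.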

Drug Matching — 130 Variant-Drug Associations, 114 Drugs

Active pathways and druggable nodes are matched to 130 curated variant-drug associations covering 75 genes and 114 drugs from OncoKB and CIViC evidence. Ranked by evidence tier (Level 1 = FDA-approved). Known resistance mutations automatically override sensitivity predictions. The engine abstains when evidence is insufficient.
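The three rules in this paragraph (rank by evidence tier, let resistance override sensitivity, abstain when nothing qualifies) can be expressed in a few lines. The record layout, the `max_level` cutoff, and the variant and drug names below are all hypothetical; none of them are taken from OncoKB, CIViC, or the platform.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Association:
    variant: str
    drug: str
    evidence_level: int  # 1 = strongest evidence
    effect: str          # "sensitivity" or "resistance"


def match_drugs(
    patient_variants: Set[str],
    associations: List[Association],
    max_level: int = 3,  # hypothetical cutoff for "sufficient evidence"
) -> Optional[List[Association]]:
    """Return sensitivity matches ranked by evidence tier, or None to abstain."""
    hits = [
        a for a in associations
        if a.variant in patient_variants and a.evidence_level <= max_level
    ]
    # Any matching resistance record removes that drug from consideration.
    resistant = {a.drug for a in hits if a.effect == "resistance"}
    candidates = [
        a for a in hits
        if a.effect == "sensitivity" and a.drug not in resistant
    ]
    if not candidates:
        return None  # abstain rather than guess
    return sorted(candidates, key=lambda a: (a.evidence_level, a.drug))


# Placeholder knowledge base with made-up identifiers.
KB = [
    Association("VAR_A", "drug_x", 1, "sensitivity"),
    Association("VAR_A", "drug_y", 2, "sensitivity"),
    Association("VAR_B", "drug_x", 1, "resistance"),
]
```

Returning `None` instead of an empty list makes abstention an explicit outcome that callers have to handle.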

Step 6

Actionable Insights

The Treatment Design Layer

Beyond predicting what will happen, the platform helps identify what to do about it. Six specialized modules work as an additive layer on top of the core pipeline, with knowledge-grounded drug sensitivity from OncoKB and CIViC databases. Treatment labels extracted for 9,415 patients enable causal treatment effect estimation across regimens.

Treatment Optimization

Shadow Mode

Counterfactual treatment ranking across 295 compounds. Pareto optimization across efficacy, toxicity, and resistance. Sequential 3-line planning with beam search.

GDSC ρ 0.727 | TARNet C 0.715
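Pareto optimization across efficacy, toxicity, and resistance means keeping every option that no other option beats on all three axes at once. The sketch below shows that filter on made-up numbers; the field names and the assumption that all three quantities are already on comparable scales are ours.

```python
from typing import Dict, List, Union

Option = Dict[str, Union[str, float]]


def dominates(a: Option, b: Option) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere.

    Higher efficacy is better; lower toxicity and lower resistance risk are better.
    """
    no_worse = (
        a["efficacy"] >= b["efficacy"]
        and a["toxicity"] <= b["toxicity"]
        and a["resistance"] <= b["resistance"]
    )
    strictly_better = (
        a["efficacy"] > b["efficacy"]
        or a["toxicity"] < b["toxicity"]
        or a["resistance"] < b["resistance"]
    )
    return no_worse and strictly_better


def pareto_front(options: List[Option]) -> List[Option]:
    """Keep the options that no other option dominates."""
    return [
        o for o in options
        if not any(dominates(other, o) for other in options)
    ]


# Illustrative regimens with invented scores.
REGIMENS: List[Option] = [
    {"name": "A", "efficacy": 0.8, "toxicity": 0.3, "resistance": 0.2},
    {"name": "B", "efficacy": 0.6, "toxicity": 0.4, "resistance": 0.3},
    {"name": "C", "efficacy": 0.5, "toxicity": 0.1, "resistance": 0.4},
]
front_names = [o["name"] for o in pareto_front(REGIMENS)]
```

Here regimen B drops out because A is better on every axis, while C survives because it is the least toxic option.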

Combination Discovery

Zero-shot drug combination prediction via orthogonal clonal targeting. Validated on 1,209 drug pairs with leave-tissue-fold-out ρ=0.689. Schedule optimization achieves 42% dose reduction vs concurrent dosing.

ρ 0.800 | 1,209 pairs validated

Synthetic Lethality

Combines 28 curated gene-drug pairs with a trained ML classifier (ρ=0.776) that predicts novel context-dependent vulnerabilities using pathway state, DepMap CRISPR essentiality (18,435 genes), and drug embeddings.

28 pairs + ML v2 | 37/37 tests

Immunogenic Variants

TME permissiveness scoring (HOT/WARM/COLD) determines immunotherapy feasibility. Ranks variants by clonality, expression, and type. Abstains for cold tumors.

11 hallmarks | 34/34 tests
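The gating logic described here (classify the microenvironment first, then rank variants only if the tumor is not cold) might look like the sketch below. The cutoffs, the type weights, and the multiplicative score are invented for illustration and should not be read as the module's actual thresholds or ranking formula.

```python
from typing import Dict, List, Optional, Union

Variant = Dict[str, Union[str, float]]

# Hypothetical weights per variant type; unknown types score zero.
TYPE_WEIGHT: Dict[str, float] = {"frameshift": 1.0, "missense": 0.5}


def classify_tme(permissiveness: float, hot_cutoff: float = 0.66, cold_cutoff: float = 0.33) -> str:
    """Bucket a 0-1 permissiveness score into HOT / WARM / COLD (cutoffs assumed)."""
    if permissiveness >= hot_cutoff:
        return "HOT"
    if permissiveness >= cold_cutoff:
        return "WARM"
    return "COLD"


def rank_variants(variants: List[Variant], permissiveness: float) -> Optional[List[Variant]]:
    """Rank variants by clonality, expression, and type; abstain for cold tumors."""
    if classify_tme(permissiveness) == "COLD":
        return None

    def score(v: Variant) -> float:
        return v["clonality"] * v["expression"] * TYPE_WEIGHT.get(v["type"], 0.0)

    return sorted(variants, key=score, reverse=True)


# Made-up variants for demonstration.
DEMO_VARIANTS: List[Variant] = [
    {"id": "v1", "clonality": 1.0, "expression": 0.5, "type": "missense"},
    {"id": "v2", "clonality": 0.6, "expression": 1.0, "type": "frameshift"},
]
```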

Methylation Decoding

Reconstructs epigenetic landscape to detect silenced tumor suppressors. Screens 19 TSGs including MGMT (temozolomide response) and BRCA1. Silencing is reversible.

R² 0.762 regional | 19 TSGs

The Result

Complete System

A Digital Twin

What emerges is not a single prediction but a comprehensive computational model of an individual patient's cancer. It knows which mutations are driving the disease, which pathways are active, how the tumor microenvironment is configured, how fast it's growing, how it will respond to specific treatments over time, where resistance is likely to emerge, and which therapeutic vulnerabilities it has created for itself.

Every prediction decomposes into an inspectable chain of biologically named computations. From raw gene expression through 328 named latent dimensions, through physics parameters with physiological units, to time-resolved trajectories with calibrated uncertainty — nothing is opaque.

This is what we mean by a digital twin. Not a metaphor. A simulation.

Built for Trust, Not Hype

We publish our metrics honestly — including where models fail. Treatment optimization runs in shadow mode until externally validated. Cross-site imaging is suppressed by default. Every prediction carries calibrated uncertainty and the system abstains rather than guessing when evidence is insufficient. ISS-driven expert routing performs intelligent data quality assessment before generating predictions. Shift-aware conformal prediction provides honest uncertainty bounds under distribution shift, and GroupDRO training ensures robustness across hospitals by default.
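As background for the conformal prediction mentioned above, the sketch below shows ordinary split conformal prediction with absolute residuals. It is the simpler, unweighted technique, not the shift-aware variant the platform describes (which would typically reweight calibration points to account for distribution shift), and the function names are our own.

```python
import math
from typing import Sequence, Tuple


def conformal_quantile(residuals: Sequence[float], alpha: float) -> float:
    """Finite-sample-corrected (1 - alpha) quantile of calibration residuals.

    Uses the ceil((n + 1)(1 - alpha))-th smallest residual. If that rank
    exceeds n, no finite interval can give the requested coverage.
    """
    n = len(residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(residuals)[rank - 1]


def prediction_interval(
    prediction: float,
    calibration_residuals: Sequence[float],
    alpha: float = 0.1,
) -> Tuple[float, float]:
    """Symmetric interval around a point prediction with ~(1 - alpha) coverage."""
    q = conformal_quantile(calibration_residuals, alpha)
    return prediction - q, prediction + q


# Absolute errors |y - y_hat| on a held-out calibration set (made-up values).
CALIBRATION = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
```

With too few calibration points for the requested coverage the interval becomes infinite, which is the conformal analogue of abstaining.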

Prediction Traceability | Validation Evidence | All Model Cards

Want to see it in action?

Explore a demo patient through the full pipeline, or get in touch to discuss a validation partnership.

Try the Demo | Contact Us
