Technology Deep Dive

From Patient Data
to Digital Twin

Every cancer is unique. Our platform turns raw molecular and imaging data into a living simulation of that specific tumor — step by step, fully transparent.

Step 1
Input

The Patient's Molecular Profile

It begins with data, the kind generated by modern sequencing labs: gene expression (which genes are turned on and how loudly), DNA mutations (which genes are broken), copy number variation (which genes have been duplicated or deleted), and methylation (which genes have been chemically silenced). For a single patient, this amounts to roughly 6,000 individual measurements (5,965 in the feature set listed here) across four data types.

This is too much for any human to synthesize. It's also too noisy and high-dimensional for most AI systems to handle well. So the first thing we do is compress it.

RNA-seq: 2,579 genes (gene expression levels)
DNA Mutations: 500 genes (which genes are altered)
Copy Number: 1,886 genes (amplifications & deletions)
Methylation: 1,000 probes (epigenetic silencing)
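
As a rough illustration of how the four data types could be laid out as one input, here is a minimal Python sketch. The feature counts come from the list above; the function name, the dictionary format, and the zero-fill-plus-mask handling of missing data types are assumptions made for illustration, not a description of the platform's actual code.

```python
# Feature counts per data type, as listed above.
MODALITY_SIZES = {
    "rna": 2579,          # gene expression levels
    "mutation": 500,      # which genes are altered
    "copy_number": 1886,  # amplifications and deletions
    "methylation": 1000,  # methylation probes
}

total_features = sum(MODALITY_SIZES.values())


def assemble_input(profile):
    """Concatenate the available data types into one flat feature list.

    `profile` maps a data type name to a list of floats (hypothetical format).
    Missing data types are zero-filled and flagged in the returned mask so a
    downstream model can tell "absent" apart from "measured as zero".
    """
    features, mask = [], {}
    for name, size in MODALITY_SIZES.items():
        values = profile.get(name)
        if values is None:
            features.extend([0.0] * size)
            mask[name] = False
        else:
            if len(values) != size:
                raise ValueError(f"{name}: expected {size} values, got {len(values)}")
            features.extend(values)
            mask[name] = True
    return features, mask
```

The four counts sum to 5,965, which is where the "roughly 6,000 measurements" figure comes from.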
Step 2
Foundation

Compressing Biology into a Fingerprint

Our foundation model — a hierarchical Bayesian VAE trained on 9,415 patients across 33 cancer types — takes all four data streams and distills them into a structured biological fingerprint of 328 numbers. But these aren't arbitrary numbers. Each one has a specific biological meaning.

One dimension captures how fast the tumor is growing, correlated at r = 0.96 with Ki67, a standard proliferation marker. Two hundred dimensions encode activity across 50 known cancer pathways (inflammation, immune response, DNA repair, metabolic reprogramming, and others), with four dimensions per pathway. Forty-eight dimensions capture epigenetic patterns, and thirty-two encode chromosomal instability. The remaining 47 hold proliferation-free tumor context (31) and residual non-pathway signal (16).

The key property: similar tumors end up near each other in this space, regardless of superficial differences. Two patients with completely different mutation profiles but the same underlying disease mechanism will occupy neighboring positions. And if a hospital only has RNA sequencing and nothing else, the model still produces a reliable fingerprint with appropriately wider uncertainty. Foundation model distillation captures biological signals beyond known pathways, and a regional methylation density decoder (R²=0.762) captures epigenetic patterns more faithfully than individual probe reconstruction.

This 328-dimensional fingerprint is the foundation everything else builds on.
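
One common way to get "wider uncertainty when data types are missing" in a multimodal VAE is a product of Gaussian experts, where each available data type contributes its own estimate and the precisions add up. The text above does not say which mechanism the foundation model uses, so the sketch below is an assumption that only illustrates the behavior for a single latent dimension.

```python
def fuse_gaussian_experts(experts, prior_var=1.0):
    """Fuse per-modality Gaussian estimates of one latent dimension.

    `experts` is a list of (mean, variance) pairs, one per available data
    type (hypothetical interface). A zero-mean Gaussian prior with variance
    `prior_var` is always included. Precisions (1 / variance) add, so fewer
    experts means a larger fused variance.
    """
    precision = 1.0 / prior_var
    weighted_mean = 0.0
    for mean, var in experts:
        precision += 1.0 / var
        weighted_mean += mean / var
    return weighted_mean / precision, 1.0 / precision


# RNA-only versus RNA plus a second data type giving the same estimate.
rna_only = fuse_gaussian_experts([(1.0, 0.5)])
rna_plus_one = fuse_gaussian_experts([(1.0, 0.5), (1.0, 0.5)])
```

With a single expert the fused variance is 1/3; with two agreeing experts it drops to 0.2, so the RNA-only fingerprint is less certain, as described above.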

Latent Space Structure

Proliferation (1d): growth speed (Ki67 r=0.96)
Pathway Activity (200d): 50 Hallmark pathways × 4 dims
Biological Context (31d): tumor context (proliferation-free)
Residual Signal (16d): non-pathway biology
Epigenetics (48d): methylation patterns
Chromosomal Structure (32d): copy number instability
Total: 328 dimensions
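
To make "named dimensions" concrete, the sketch below builds index ranges for each block of the fingerprint. The block widths are taken from the breakdown above. The order of the blocks within the vector and all identifier names are assumptions for illustration.

```python
# Block widths from the breakdown above; the ordering is an assumption.
LATENT_BLOCKS = [
    ("proliferation", 1),
    ("pathway_activity", 200),
    ("biological_context", 31),
    ("residual_signal", 16),
    ("epigenetics", 48),
    ("chromosomal_structure", 32),
]


def build_slices(blocks):
    """Return ({block name: slice}, total width) for consecutive blocks."""
    slices, start = {}, 0
    for name, width in blocks:
        slices[name] = slice(start, start + width)
        start += width
    return slices, start


LATENT_SLICES, LATENT_DIM = build_slices(LATENT_BLOCKS)


def pathway_dims(z, pathway_index, dims_per_pathway=4):
    """Return the dims_per_pathway values for one of the 50 pathways."""
    block = z[LATENT_SLICES["pathway_activity"]]
    lo = pathway_index * dims_per_pathway
    return block[lo:lo + dims_per_pathway]
```

Summing the widths gives 328, matching the stated total, and `pathway_dims(z, k)` pulls out the four numbers associated with pathway `k` under this assumed layout.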
Step 3
Multimodal Fusion

Adding Imaging: Eyes on the Tumor

Molecular data tells us what's happening inside cells. But pathology slides and radiology scans reveal something sequencing cannot — the tumor's physical architecture. How immune cells surround or infiltrate the tumor mass. Whether it's invading blood vessels. How different subpopulations are spatially arranged.

We integrate histopathology (whole-slide images) and radiology (CT/MRI) through late gated fusion. Rather than forcing imaging through the same encoder as molecular data — which degrades both signals — each is processed by a specialist model, then a learned gate decides how much to trust each source for each individual patient.

We're transparent about the limits. Histopathology embeddings encode scanner and staining protocols, so cross-institution transfer is unreliable. Our production system suppresses imaging from unvalidated sites by default.

Molecular: 77% average gate weight
Histopathology: 23% average gate weight
Radiology: +0.139 C-index improvement
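
A minimal sketch of late gated fusion with site-based suppression follows. It assumes each specialist model has already produced an embedding of the same length and that a learned gate network has produced one score per source; the softmax gate, the function names, and the `validated_site` flag are illustrative assumptions rather than the production design.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(embeddings, gate_scores, validated_site=True):
    """Blend per-source embeddings with per-patient gate weights.

    embeddings:  {source name: list of floats}, all the same length.
    gate_scores: {source name: float}, e.g. from a learned gate (hypothetical).
    Imaging sources are dropped when the site is not validated, so the
    molecular embedding then receives all of the weight.
    """
    names = [n for n in embeddings if n == "molecular" or validated_site]
    if not names:
        raise ValueError("no usable sources for this patient")
    weights = softmax([gate_scores[n] for n in names])
    fused = [0.0] * len(embeddings[names[0]])
    for name, weight in zip(names, weights):
        for i, value in enumerate(embeddings[name]):
            fused[i] += weight * value
    return fused, dict(zip(names, weights))
```

Because the gate is evaluated per patient, the 77% and 23% figures above are averages; an individual patient's weights can differ.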
Step 4 — Dual Analysis

Two Paths, One Patient

The biological fingerprint feeds two complementary analyses. The static path's driver and drug sensitivity analysis feeds into the dynamic path's tumor simulation.

Static Path

What's Driving This Cancer

Driver identification matches patient mutations against 633 known drivers from IntOGen and 95 COSMIC Cancer Gene Census genes, then determines which are actively driving THIS patient's cancer using pathway context and expression evidence. Drug sensitivity prediction shows which pathways mediate the response — so clinicians can evaluate whether the recommendation makes biological sense.

Dynamic Path

How Will This Tumor Evolve

Real tumor subpopulations are identified from sequencing data through clonal deconvolution. These are not abstract clones but real subpopulations derived from variant allele frequencies. A Resistance Sentinel preserves minor resistant subclones that would otherwise be lost. Each clone is annotated with its driver mutations and knowledge-grounded drug sensitivity. A hypernetwork generates personalized physics parameters, and a neural ODE simulates treatment response. A hybrid stochastic simulator uses a continuous SDE approximation while clone populations are large and switches automatically to the exact Gillespie SSA when they become small. The result is a distribution of possible evolutionary outcomes, including resistance emergence timing and clone fate probabilities.
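
The hybrid switching idea can be sketched for a single clone modeled as a birth-death process. Below the threshold the code takes exact Gillespie steps; above it, it takes Euler-Maruyama steps of the matching diffusion approximation. The single-clone model, the threshold of 100 cells, the fixed step size, and all names are simplifying assumptions for illustration.

```python
import math
import random


def hybrid_birth_death(n0, birth, death, t_end, threshold=100, dt=0.01, rng=None):
    """Simulate one clone's size up to time t_end and return the final size.

    birth, death: per-cell rates (per unit time).
    Small populations use exact Gillespie SSA steps; large populations use
    dn = (birth - death) * n * dt + sqrt((birth + death) * n) * dW.
    """
    rng = rng or random.Random()
    n, t = float(n0), 0.0
    while t < t_end:
        if n < threshold:
            n = float(round(n))  # SSA works on whole cells
            total_rate = (birth + death) * n
            if total_rate <= 0:
                break  # extinct, or no events can occur
            wait = rng.expovariate(total_rate)
            if t + wait > t_end:
                break  # next event falls after the horizon
            t += wait
            n += 1.0 if rng.random() < birth / (birth + death) else -1.0
        else:
            step = min(dt, t_end - t)
            drift = (birth - death) * n * step
            noise = math.sqrt((birth + death) * n * step) * rng.gauss(0.0, 1.0)
            n = max(n + drift + noise, 0.0)
            t += step
    return n
```

Running this many times with different seeds gives a distribution of outcomes for one clone, for example the fraction of runs in which a small resistant clone goes extinct before `t_end`.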

Cross-Species Translation

Translating Mouse Data to Human Predictions

Our domain separation network strips mouse-specific artifacts from preclinical data, retaining only tumor biology that transfers to humans. It fills in missing data types (like methylation) by learning statistical relationships, allowing drug responses observed in mice to directly inform patient predictions.

Step 5
Pathway Analysis

Understanding WHY: The Mechanistic Evidence Engine

Knowing which genes are mutated isn't enough; we need to know which biological pathways those mutations are activating. A KRAS mutation only matters if the downstream MAPK signaling cascade is firing. Our Mechanistic Evidence Engine runs in parallel with the VAE, analyzing raw gene expression to determine which pathways are driving the cancer.

Pathway Activity Scoring — 169 Pathways from 3 Databases

Integrates 50 MSigDB Hallmark + 68 KEGG cancer/signaling + 51 Reactome pathways, with robust scaling against a reference of 9,415 patients. Determines which biological programs are actively signaling — not just expressed. Validated: KRAS signaling is significantly higher in KRAS-mutant patients (p=8.5×10⁻²⁹).
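
As an illustration of robust scaling against a reference cohort, the sketch below scores one pathway for one patient. The raw score used here (mean expression of the pathway's genes) is a deliberately simple stand-in; the scoring method the engine actually uses is not specified above, and the function names and inputs are assumptions.

```python
import statistics


def robust_scale(value, reference):
    """Center on the reference median and divide by the interquartile range."""
    median = statistics.median(reference)
    q1, _, q3 = statistics.quantiles(reference, n=4)
    iqr = q3 - q1
    if iqr == 0:
        return 0.0
    return (value - median) / iqr


def pathway_score(expression, gene_set, reference_scores):
    """Score one pathway for one patient.

    expression:       {gene: expression value} for the patient.
    gene_set:         genes belonging to the pathway.
    reference_scores: raw scores of the same pathway in a reference cohort.
    Returns None when none of the pathway's genes were measured.
    """
    present = [expression[g] for g in gene_set if g in expression]
    if not present:
        return None
    raw = statistics.fmean(present)
    return robust_scale(raw, reference_scores)
```

Median and interquartile range are used instead of mean and standard deviation so that a few extreme reference patients do not distort the scale.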

Causal Signal Tracing

Starting from each mutated driver, the engine traces downstream through 1,743 directed causal edges (SIGNOR database) to map the full signaling cascade: KRAS → RAF → MEK → ERK. Each node in the chain is checked for druggability — identifying exactly where to intervene.
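
The tracing step amounts to a graph traversal from the mutated driver. The sketch below runs a breadth-first search over the KRAS → RAF → MEK → ERK chain named above. The edge dictionary is a toy stand-in for the SIGNOR edges, and the druggable set is chosen only to show the flagging; both are illustrative assumptions.

```python
from collections import deque

# Toy directed graph: only the example cascade from the text.
EDGES = {"KRAS": ["RAF"], "RAF": ["MEK"], "MEK": ["ERK"], "ERK": []}


def trace_downstream(driver, edges, druggable):
    """Breadth-first walk downstream of `driver`.

    Returns a list of (node, is_druggable) pairs in visiting order.
    `seen` guards against feedback loops in real signaling graphs.
    """
    order, seen, queue = [], {driver}, deque([driver])
    while queue:
        node = queue.popleft()
        order.append((node, node in druggable))
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


cascade = trace_downstream("KRAS", EDGES, druggable={"MEK"})
```

Each flagged node in the result is a candidate intervention point downstream of the driver.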

Drug Matching — 130 Variant-Drug Associations, 114 Drugs

Active pathways and druggable nodes are matched to 130 curated variant-drug associations covering 75 genes and 114 drugs from OncoKB and CIViC evidence. Ranked by evidence tier (Level 1 = FDA-approved). Known resistance mutations automatically override sensitivity predictions. The engine abstains when evidence is insufficient.
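
The matching rules described above (rank by evidence tier, let resistance mutations override, abstain without evidence) can be sketched as follows. The record format, the variant and drug names, and the function name are placeholders invented for illustration; they are not entries from OncoKB or CIViC.

```python
def match_drugs(variants, associations, resistance):
    """Return matching associations sorted by evidence level, or None.

    variants:     set of variant identifiers found in the patient.
    associations: list of {"variant", "drug", "level"} records,
                  where level 1 is the strongest evidence tier.
    resistance:   {variant: set of drugs it confers resistance to}.
    Returning None means the engine abstains for lack of evidence.
    """
    blocked = set()
    for variant in variants:
        blocked |= resistance.get(variant, set())
    hits = [
        a for a in associations
        if a["variant"] in variants and a["drug"] not in blocked
    ]
    if not hits:
        return None
    return sorted(hits, key=lambda a: a["level"])


# Placeholder data, not real clinical associations.
ASSOCIATIONS = [
    {"variant": "GENE_A:V1", "drug": "drug_x", "level": 2},
    {"variant": "GENE_A:V1", "drug": "drug_y", "level": 1},
    {"variant": "GENE_B:V2", "drug": "drug_z", "level": 1},
]
RESISTANCE = {"GENE_A:R9": {"drug_y"}}
```

In this toy data, a patient carrying both `GENE_A:V1` and the resistance variant `GENE_A:R9` is matched only to `drug_x`, because the resistance entry removes `drug_y` regardless of its evidence level.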

Step 6
Actionable Insights

The Treatment Design Layer

Beyond predicting what will happen, the platform helps identify what to do about it. Six specialized modules work as an additive layer on top of the core pipeline, with knowledge-grounded drug sensitivity from OncoKB and CIViC databases. Treatment labels extracted for 9,415 patients enable causal treatment effect estimation across regimens.

The Result
Complete System

A Digital Twin

What emerges is not a single prediction but a comprehensive computational model of an individual patient's cancer. It knows which mutations are driving the disease, which pathways are active, how the tumor microenvironment is configured, how fast it's growing, how it will respond to specific treatments over time, where resistance is likely to emerge, and which therapeutic vulnerabilities it has created for itself.

Every prediction decomposes into an inspectable chain of biologically named computations. From raw gene expression through 328 named latent dimensions, through physics parameters with physiological units, to time-resolved trajectories with calibrated uncertainty — nothing is opaque.

This is what we mean by a digital twin. Not a metaphor. A simulation.

Built for Trust, Not Hype

We publish our metrics honestly, including where models fail. Treatment optimization runs in shadow mode until externally validated. Cross-site imaging is suppressed by default. Every prediction carries calibrated uncertainty, and the system abstains rather than guessing when evidence is insufficient. ISS-driven expert routing assesses data quality before any prediction is generated. Shift-aware conformal prediction provides honest uncertainty bounds under distribution shift, and GroupDRO training, which optimizes for the worst-performing group rather than the average, targets robustness across hospitals by default.
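
For readers unfamiliar with conformal prediction, here is a minimal sketch of the plain split-conformal baseline: calibration residuals set the interval width for new predictions. This is the unweighted textbook version under the usual exchangeability assumption. The shift-aware variant mentioned above would need additional reweighting that is not shown, and the function names are illustrative.

```python
import math


def conformal_quantile(residuals, alpha=0.1):
    """Half-width giving at least (1 - alpha) coverage.

    residuals: absolute errors |y - prediction| on a held-out calibration set.
    Returns infinity when the calibration set is too small for this alpha.
    """
    n = len(residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(residuals)[rank - 1]


def predict_interval(point, residuals, alpha=0.1):
    """Symmetric prediction interval around a point prediction."""
    q = conformal_quantile(residuals, alpha)
    return point - q, point + q
```

An infinite interval from a too-small calibration set is the conformal analogue of abstaining: the method declines to promise coverage it cannot support.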

Want to see it in action?

Explore a demo patient through the full pipeline, or get in touch to discuss a validation partnership.