Scientific White Paper | Publication Grade

The "Sim-to-Real" Architecture for Predictive Oncology

Bridging the Valley of Death via Split-Source Transfer Learning

January 2026
The DNAI Research Team
Patent Application Filed
1 Executive Summary

The pharmaceutical industry currently faces a persistent translational challenge in oncology. Despite the availability of potent preclinical candidates, aggregate analyses estimate the overall probability of approval from first-in-human studies to be in the low single digits, implying an attrition rate typically exceeding 90% depending on the dataset and therapeutic class [1, 2].

This "Valley of Death" represents not merely a financial bottleneck; it is a failure of translation. The industry relies on the Patient-Derived Xenograft (PDX) as a primary efficacy filter, operating under the assumption that a tumor growing in a mouse is a valid proxy for a tumor growing in a human.

Our central thesis challenges the sufficiency of this assumption. We identify the "Stroma Replacement Phenomenon" as a critical source of biological noise that can corrupt standard predictive models. Research has demonstrated that following engraftment, human stromal components are rapidly replaced by murine host cells [3, 4].

DNAI introduces a novel "Sim-to-Real" architecture utilizing Split-Source Transfer Learning. By mathematically disentangling conserved intrinsic tumor drivers from species-specific stromal artifacts, we convert the mouse model from a "noisy screen" into a "calibrated simulator."

The Translation Gap

Attrition Rate: >90%. Oncology drugs fail in human trials.

Gold Standard: PDX. Yet fundamentally flawed.

Domain Shift: P(Y|X). Conditional probabilities diverge.

2 The Biological Barrier: Stroma Replacement

To build a valid predictive model, one must first characterize the noise in the source domain. Our analysis identifies the Tumor Microenvironment (TME) as a primary source of "Negative Transfer" between mice and humans.

2.1 The Chimeric Tumor Mechanism

A PDX tumor is not simply "human tissue in a mouse." It is a dynamic chimera.

The Replacement Kinetics

Upon engraftment, human stromal cells (fibroblasts, endothelial cells, and immune components) lose the human-specific cytokine support they depend on and are rapidly replaced by murine host cells.

The Critical Timeline

Multiple studies indicate that stromal and vascular components supporting PDX tumors are predominantly murine by early passages; in some models, human stroma and vessels are undetectable even by the first passage [3, 4].

The Result

The tumor consists of Human Epithelial Drivers (the cancer) embedded in Murine Soil (the stroma).

2.2 The AI Failure Mode: Negative Transfer

This chimeric biology poses a significant challenge to standard Deep Learning models. When a neural network is trained on bulk RNA-sequencing data from these tumors, it indiscriminately learns correlations between gene expression and growth.

Crucially, the model may learn to predict tumor growth based on murine stromal signals (e.g., mouse Vegfa driving angiogenesis or mouse fibroblasts remodeling the matrix) [4]. When this model is transferred to a human patient, it searches for these murine signatures. Because the patient possesses human stroma (with different regulatory logic), the model fails to generalize.

Negative Transfer in Machine Learning Terms

The Source Domain (D_s) and Target Domain (D_t) have diverging conditional probabilities:

P_mouse(Y|X) ≠ P_human(Y|X)

A model that does not explicitly disentangle these signals is prone to "Negative Transfer": it learns artifacts that actively degrade prediction performance in humans.

3 The DNAI Solution: Split-Source Transfer Learning

To address this, DNAI utilizes a "Split-Source" Neuro-Symbolic Architecture. Rather than training a single "Black Box" model, we mathematically separate biological signals into Conserved (Invariant) and Species-Specific (Private) components.

The architecture comprises four interdependent modules:

Module A: The Intrinsic Growth Engine

Source: Mouse

Objective

To learn the "pure physics" of tumor proliferation uncorrupted by immune interference.

Mechanism

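We utilize a Neural Ordinary Differential Equation (Neural ODE). Unlike standard RNNs, which update a hidden state at discrete time steps, a Neural ODE learns the continuous-time derivative of tumor size, so growth can be evaluated at arbitrary, irregularly spaced time points.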

dN/dt = f_θ(N, t)

The model predicts ρ (proliferation rate) and K (carrying capacity).

In other embodiments, the growth function follows Gompertz or von Bertalanffy forms to accommodate diverse tumor kinetics.
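As a rough illustration of the kind of continuous-time model Module A works with, the sketch below integrates a logistic growth law parameterized by ρ and K using a fixed-step Runge-Kutta solver. The logistic form is an assumed closed-form stand-in for the learned f_θ, and the function names and numeric values are hypothetical, not DNAI's trained network.

```python
import math

def logistic_rhs(n, rho, k):
    """Assumed stand-in for f_theta: dN/dt = rho * N * (1 - N / K)."""
    return rho * n * (1.0 - n / k)

def integrate_growth(n0, rho, k, t_end, steps=1000):
    """Integrate the growth ODE from t = 0 to t_end with classic RK4."""
    h = t_end / steps
    n = n0
    for _ in range(steps):
        k1 = logistic_rhs(n, rho, k)
        k2 = logistic_rhs(n + 0.5 * h * k1, rho, k)
        k3 = logistic_rhs(n + 0.5 * h * k2, rho, k)
        k4 = logistic_rhs(n + h * k3, rho, k)
        n += (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return n

def logistic_exact(n0, rho, k, t):
    """Closed-form logistic solution, used only to sanity-check the solver."""
    return k / (1.0 + (k / n0 - 1.0) * math.exp(-rho * t))

# Hypothetical values: volume in mm^3, time in days.
volume_day_30 = integrate_growth(n0=50.0, rho=0.15, k=1500.0, t_end=30.0)
```

Because the solver only needs a right-hand-side function, swapping in a Gompertz or von Bertalanffy form (or a neural network) changes `logistic_rhs` and nothing else.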

Module B: The Immune Interaction Engine

Source: Human

Objective

To solve for the "Missing Variable"—the immune clearance coefficient (ω).

The Solution

Since Module A already provides the intrinsic growth parameters (ρ, K), Module B reduces the problem to a Single-Parameter Estimation. It solves for the specific ω value required to explain the deviation between the predicted intrinsic growth (from Module A) and the observed patient outcome. This makes the inverse problem mathematically tractable even with minimal data points.
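The single-parameter estimation described above can be sketched as a one-dimensional search. This example assumes, purely for illustration, that immune clearance enters as a linear kill term, dN/dt = ρN(1 − N/K) − ωN; the source does not specify the functional form. With ρ and K held fixed, a golden-section search finds the ω that best explains a handful of observed tumor sizes. All names and values are hypothetical.

```python
import math

def simulate(n0, rho, k, omega, times, h=0.05):
    """Euler-integrate dN/dt = rho*N*(1 - N/K) - omega*N; return N at each time."""
    out, n, t = [], n0, 0.0
    for target in times:
        while t < target - 1e-12:
            step = min(h, target - t)
            n += step * (rho * n * (1.0 - n / k) - omega * n)
            t += step
        out.append(n)
    return out

def fit_omega(n0, rho, k, times, observed, lo=0.0, hi=1.0, iters=40):
    """Golden-section search for the omega that minimizes squared error."""
    def loss(omega):
        pred = simulate(n0, rho, k, omega, times)
        return sum((p - o) ** 2 for p, o in zip(pred, observed))
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if loss(c) < loss(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

# Synthetic "patient": three follow-up measurements generated with omega = 0.06.
rho, k, n0 = 0.15, 1500.0, 200.0   # intrinsic parameters, as if supplied by Module A
times = [30.0, 60.0, 90.0]          # days
observed = simulate(n0, rho, k, 0.06, times)
omega_hat = fit_omega(n0, rho, k, times, observed)
```

With only one free parameter, three noiseless observations are enough to recover ω in this toy setting; with noisy clinical data the same search still runs, but the estimate inherits the measurement uncertainty.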

Module C: The "Sim-to-Real" Fusion Layer

Domain Separation Network

Objective: To map human patient data into the mouse-trained physics engine without triggering Negative Transfer.

Mechanism: We employ a Domain Separation Network (DSN) with Heterogeneous Domain Adaptation.

Shared Encoder

Extracts domain-invariant features (e.g., fundamental cell cycle drivers).

Private Encoder

Captures and discards domain-specific features (e.g., Murine Stromal signals).

Gradient Reversal Layer (GRL)

An adversarial domain discriminator, trained through the GRL, forces the Shared Encoder to remove any information that distinguishes mouse samples from human samples.

L_total = L_task + λ L_diff + γ L_recon
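To make the loss composition concrete, here is a minimal pure-Python sketch under the usual Domain Separation Network reading of the terms: L_diff as a soft orthogonality penalty between shared and private representations, and L_recon as a mean squared reconstruction error. These definitions, the helper names, and the default weights are assumptions for illustration; the domain-adversarial term enforced through the gradient reversal layer is not shown.

```python
def difference_loss(shared, private):
    """Squared Frobenius norm of shared^T @ private (rows are samples)."""
    dim_s, dim_p = len(shared[0]), len(private[0])
    total = 0.0
    for i in range(dim_s):
        for j in range(dim_p):
            dot = sum(s[i] * p[j] for s, p in zip(shared, private))
            total += dot * dot
    return total

def reconstruction_loss(x, x_hat):
    """Mean squared error between inputs and their reconstructions."""
    count = sum(len(row) for row in x)
    err = sum((a - b) ** 2 for r, rh in zip(x, x_hat) for a, b in zip(r, rh))
    return err / count

def total_loss(l_task, shared, private, x, x_hat, lam=0.1, gamma=0.1):
    """L_total = L_task + lam * L_diff + gamma * L_recon (weights are hypothetical)."""
    return (l_task
            + lam * difference_loss(shared, private)
            + gamma * reconstruction_loss(x, x_hat))

# Two samples, one shared feature and one private feature each.
orthogonal = difference_loss([[1.0], [0.0]], [[0.0], [1.0]])   # no overlap
overlapping = difference_loss([[1.0], [0.0]], [[1.0], [0.0]])  # identical codes
```

The penalty is zero when the shared and private codes are uncorrelated across samples and grows as they encode the same information, which is what pushes species-specific signal into the private branch.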
Module D: Physics-Constrained Safety Layer

Deterministic Validation

Allometric Scaling

Biological time scales with mass. We apply a common translational modeling convention, adjusting the time axis using quarter-power allometry (t ∝ M^(1/4)) as a prior [5], which is then calibrated on target-domain data via the learnable parameter τ.
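A small worked example of the quarter-power prior follows. It assumes that τ acts as a multiplicative correction on the allometric factor, which the text does not specify, and uses hypothetical body masses of 25 g for a mouse and 70 kg for a human.

```python
def allometric_time_scale(mass_target_kg, mass_source_kg, tau=1.0, exponent=0.25):
    """Factor that stretches source-species time onto the target species.

    tau = 1.0 means the uncalibrated prior; how tau enters is an assumption.
    """
    return tau * (mass_target_kg / mass_source_kg) ** exponent

# Hypothetical masses: 70 kg human, 0.025 kg mouse.
scale = allometric_time_scale(70.0, 0.025)

# Map mouse measurement days onto an equivalent human time axis.
mouse_days = [0.0, 7.0, 14.0, 21.0]
human_days = [d * scale for d in mouse_days]
```

With these masses the mass ratio is 2800 and its fourth root is roughly 7.3, so one week of mouse time corresponds to about seven weeks on the human axis before any calibration by τ.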

MaxBioLimit

We enforce a maximum biological growth rate derived from the minimum plausible doubling time (T_min), implemented as:

ρ_max = ln(2) / T_min

Rejection Option: If a trajectory violates physiological plausibility constraints (e.g., negative volume or supraphysiological growth), the system aborts the prediction.
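The two checks above can be sketched as a deterministic validator. The function names, the three-day T_min, and the rule of measuring growth as the log-ratio between consecutive points are illustrative assumptions rather than DNAI's implementation.

```python
import math

def max_growth_rate(t_min_days):
    """rho_max = ln(2) / T_min."""
    return math.log(2.0) / t_min_days

def validate_trajectory(times, volumes, t_min_days):
    """Return (accepted, reason) for a predicted tumor-volume trajectory."""
    rho_max = max_growth_rate(t_min_days)
    if any(v < 0 for v in volumes):
        return False, "negative volume"
    for i in range(len(times) - 1):
        v0, v1 = volumes[i], volumes[i + 1]
        dt = times[i + 1] - times[i]
        # Exponential growth rate implied by two consecutive points.
        if v0 > 0 and v1 > 0 and math.log(v1 / v0) / dt > rho_max:
            return False, "growth exceeds rho_max"
    return True, "ok"

# Hypothetical T_min of 3 days; volumes in mm^3, times in days.
plausible = validate_trajectory([0, 7, 14], [100.0, 150.0, 220.0], 3.0)
too_fast = validate_trajectory([0, 1], [100.0, 400.0], 3.0)
negative = validate_trajectory([0, 1], [100.0, -5.0], 3.0)
```

Returning a reason alongside the verdict lets a caller log why a prediction was withheld instead of silently dropping it.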

Hybrid Engine: Path B (The Translator)

The Split-Source architecture forms Path B of DNAI's Hybrid Engine—the "Translator" pathway optimized for cross-species robustness. This complements Path A (the "Specialist") which is optimized for human-only accuracy.

Path B: DSN Pipeline Specifications

Input: PDX RNA-seq (201 genes)
Shared Encoder Output: 201d → z_shared
Private Encoder Output: 128d → z_private (discarded)
Imputed Latent: 281d (z_shared + z_meth + z_cnv)
Output: Human-compatible latent for ODE
Validated Performance

C-index (Survival): 0.687 (cross-species transfer to human outcomes)
Trajectory R²: 0.91 (PDX growth curve reconstruction)
Domain Confusion: ~50% (z_shared is species-agnostic)
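For readers unfamiliar with the survival metric reported above, the following is a minimal sketch of a Harrell-style concordance index on right-censored data. It is a generic textbook formulation with toy inputs, not DNAI's evaluation code, and it does not reproduce the reported values.

```python
def concordance_index(times, events, risks):
    """Fraction of comparable pairs whose predicted risk ordering is correct.

    times: follow-up time per subject; events: 1 if the event was observed,
    0 if censored; risks: model score where higher means earlier expected event.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy cohort: the third subject is censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.4, 0.1])
mixed = concordance_index(times, events, [0.9, 0.1, 0.4, 0.7])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking, which is the scale on which the C-index figures in this section should be read.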

Hybrid Engine Path Comparison

| Metric | Path A (Specialist) | Path B (Translator) |
| --- | --- | --- |
| Input Source | Human Multi-Omics | PDX RNA-seq |
| Latent Dimension | 328d | 201d → 281d |
| C-index | 0.704 | 0.687 |
| Optimized For | Accuracy | Robustness |
| Use Case | Clinical decision support | Drug development / PDX translation |
4 Regulatory & Commercial Strategy

DNAI is designed to align with the principles of the FDA's Model-Informed Drug Development (MIDD) initiative, which aims to facilitate the integration of quantitative models into regulatory decision-making [6].

4.1 The Regulatory Context: External Controls

FDA Precedent: Eflornithine

The use of external data to support efficacy claims has precedent in specific regulatory contexts. For example, the FDA's 2023 approval of Eflornithine for high-risk neuroblastoma relied on a single-arm study compared to an External Control Arm derived from a separate clinical trial (ANBL0032) [7, 8].

While regulatory decisions regarding external controls remain case-specific, this illustrates the agency's willingness to consider robust, externally derived control data in areas of high unmet need.

4.2 The "Bayesian Borrowing" Mechanism

DNAI enables a Hybrid Control Arm strategy. The Sim-to-Real engine generates a Bayesian Prior for the control group outcome. If the incoming human control data in a trial is consistent with the model's prior, the trial may "borrow" statistical strength from the model, potentially allowing for smaller control groups.

Validation

Agreement between the incoming human control data and the model's prior serves as an in-trial check on the cross-species transfer; borrowing applies only when that check passes.

Impact

Where borrowing is justified, this can substantially reduce the number of patients required for placebo/standard-of-care arms, accelerating recruitment and reducing trial costs, particularly in rare indications.
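The borrowing idea can be illustrated with a simple Beta-binomial sketch for a binary control-arm endpoint. The fixed discount weight (loosely in the spirit of power priors), the consistency tolerance, and all numeric values are assumptions chosen for illustration; an actual trial design would specify and justify its borrowing rule in advance.

```python
def borrowed_posterior(p0, n0, x, n, weight=0.5, tolerance=0.15):
    """Beta posterior for the control response rate with conditional borrowing.

    p0: model-predicted control response rate (prior mean)
    n0: number of pseudo-patients the model prior is treated as worth
    x, n: observed responders and enrolled patients in the trial's control arm
    """
    observed_rate = x / n
    # Assumed consistency rule: borrow only if the data sit near the prior mean.
    w = weight if abs(observed_rate - p0) <= tolerance else 0.0
    alpha = 1.0 + w * n0 * p0 + x
    beta = 1.0 + w * n0 * (1.0 - p0) + (n - x)
    return {
        "alpha": alpha,
        "beta": beta,
        "mean": alpha / (alpha + beta),
        "borrowed_patients": w * n0,
    }

# Consistent control data: 7 of 20 responders against a predicted rate of 0.30.
consistent = borrowed_posterior(p0=0.30, n0=40, x=7, n=20)

# Conflicting control data: 14 of 20 responders, so no strength is borrowed.
conflicting = borrowed_posterior(p0=0.30, n0=40, x=14, n=20)
```

In the consistent case the 20 enrolled controls are supplemented by 20 pseudo-patients from the model, while in the conflicting case the posterior rests on the trial data alone.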

5 Limitations and Model Assumptions

Model Constraints

While the DNAI architecture offers significant advantages over standard transfer learning, it operates under specific assumptions:

1. Immune Parameter Estimation

Accurate estimation of the immune parameter (ω) depends on the quality and timing of clinical endpoints; extremely sparse or noisy RECIST data may limit identifiability.

2. Domain Adaptation Limits

While the Domain Separation Network significantly reduces species-specific noise, no domain adaptation method can guarantee the removal of all distributional shifts.

3. Growth Model Assumptions

The current instantiation assumes that tumor growth dynamics follow generalized ODE forms (e.g., Gompertz or logistic); highly atypical growth patterns may require model recalibration.

References

[1] Wong, C. H., Siah, K. W., & Lo, A. W. (2019). Estimation of clinical trial success rates and related parameters. Biostatistics, 20(2), 273–286.

[2] Dowden, H., & Munro, J. (2019). Trends in clinical success rates and therapeutic focus. Nature Reviews Drug Discovery, 18(7), 495–496.

[3] Hylander, B. L., et al. (2013). Origin of the vasculature supporting growth of primary patient tumor xenografts. Journal of Translational Medicine, 11(1), 110.

[4] Schneeberger, V. E., et al. (2016). Quantitation of Murine Stroma and Selective Purification of the Human Tumor Component of Patient-Derived Xenografts. PLoS ONE, 11(9).

[5] West, G. B., Brown, J. H., & Enquist, B. J. (1997). A general model for the origin of allometric scaling laws in biology. Science, 276(5309), 122–126.

[6] U.S. Food and Drug Administration. Model-Informed Drug Development Paired Meeting Program. FDA.gov. (Accessed Jan 2026).

[7] U.S. Food and Drug Administration. (2023). FDA approves eflornithine for adult and pediatric patients with high-risk neuroblastoma. FDA.gov. (Accessed Jan 2026).

[8] Study ANBL0032 (NCT00026312): Dinutuximab, GM-CSF, IL-2, and Isotretinoin in Treating Patients With High-Risk Neuroblastoma. ClinicalTrials.gov. (Accessed Jan 2026).
