Request Clinical Trials
Associated Trial(s):
What type of data are you looking for?: Individual Participant-Level Data, which includes Full CSR and all supporting documentation
Data Request Status
Status: Approved Pending DUA Signature
Research Proposal
Project Title: Dynamic overall survival prediction using early follow-up in a real-world metastatic prostate cancer registry
Scientific Abstract:
Background: Men with metastatic castration-resistant prostate cancer (mCRPC) have heterogeneous outcomes in routine practice. Prognosis may change after treatment starts as early clinical, laboratory and treatment-course information becomes available.
Objective: To assess whether dynamic overall survival prediction using early follow-up information improves prediction beyond baseline-only models.
Study Design: Methodological research using participant-level data from NCT02236637, a completed prospective observational registry. Landmark models will be developed at early follow-up times and evaluated against mature follow-up.
Participants: Men with confirmed adenocarcinoma of the prostate and mCRPC enrolled in NCT02236637. The analysis will include patients with baseline information, first-line treatment information and evaluable survival follow-up.
Primary and Secondary Outcome Measure(s): The primary outcome is overall survival from registry entry or first-line treatment initiation, depending on available dates. Secondary outcomes include conditional overall survival after landmark times, progression-free survival or time to progression if consistently defined, and model performance measures.
Statistical Analysis: Baseline-only and landmark prediction models will be compared. Predictors may include age, performance status, metastatic burden, treatment, PSA and laboratory values. Performance will be assessed using calibration, discrimination, Brier score and prediction error. Flexible survival models will be explored as prespecified secondary modelling strategies.
Brief Project Background and Statement of Project Significance:
Metastatic castration-resistant prostate cancer (mCRPC) is clinically heterogeneous. Patients differ in age, performance status, metastatic burden, symptoms, prior therapy, laboratory values and treatment sequence. Real-world registries are important because patients treated in routine practice may differ from those enrolled in randomized trials, and researchers often need to interpret outcomes while follow-up is incomplete.
NCT02236637 is a large prospective real-world registry of men with mCRPC. Published analyses from this registry have described treatment patterns and real-world outcomes among patients receiving first-line abiraterone, enzalutamide and docetaxel. These analyses demonstrate the value of the registry for understanding outcomes in routine care. However, an important question remains: can prognosis be updated more accurately when early follow-up information becomes available after treatment initiation?
Most prognostic analyses focus on baseline risk prediction. In routine oncology care and real-world evidence research, prognosis is dynamic. Early treatment course, PSA changes, symptoms, laboratory changes, treatment discontinuation and early progression indicators may alter expected survival. A baseline-only model may therefore be less informative than a model that updates risk after treatment has begun.
This project will evaluate dynamic overall survival prediction in this completed real-world mCRPC registry. We will define prespecified landmark times after registry entry or first-line treatment initiation. At each landmark, only information available up to that time will be used to predict subsequent overall survival. Predictions will be evaluated against mature observed follow-up. This design reflects a common practical problem in oncology research: how to make reliable prognostic assessments when follow-up is incomplete.
The project will enhance generalizable scientific and medical knowledge in three ways. First, it will clarify whether early follow-up information materially improves survival prediction beyond baseline characteristics. Second, it will compare transparent modelling strategies for dynamic survival prediction using routinely collected oncology data. Third, it will provide a reproducible framework for evaluating prediction under immature follow-up in observational oncology registries.
The study is scientific and non-commercial. It will not be used for litigation, commercial product development, marketing, regulatory submissions or purposes outside the approved research aims. It will not estimate causal comparative treatment effects between therapies. Treatment information will be used only for prediction, risk adjustment and descriptive characterization.
References are provided in the Bibliography section.
Specific Aims of the Project:
Aim 1: To develop and evaluate baseline-only models for overall survival prediction in men with mCRPC using routinely collected registry variables.
Hypothesis 1: Baseline clinical, disease burden, treatment and laboratory variables will provide useful overall survival prediction, but prediction error will remain clinically meaningful.
Aim 2: To develop landmark prediction models that update overall survival predictions using information available during early follow-up.
Hypothesis 2: Landmark models incorporating early follow-up information will improve calibration and prediction error compared with baseline-only models.
Aim 3: To compare prespecified modelling strategies for dynamic survival prediction, including Cox landmark models, flexible parametric survival models and Bayesian parametric or flexible survival models when feasible within the secure platform.
Hypothesis 3: Flexible or Bayesian models may improve calibration or uncertainty quantification in selected settings, but this will be evaluated empirically rather than assumed.
Aim 4: To describe model performance across clinically relevant subgroups defined by performance status, metastatic burden, baseline PSA/laboratory risk and first-line treatment category.
Hypothesis 4: Prediction performance will vary across clinically relevant subgroups, highlighting settings where dynamic prediction may be most informative.
Study Design: Methodological research
What is the purpose of the analysis being proposed? Please select all that apply.: Develop or refine statistical methods; Research on clinical trial methods; Research on clinical prediction or risk prediction
Software Used: R, RStudio, Open Office
Data Source and Inclusion/Exclusion Criteria to be used to define the patient sample for your study:
Data source: Participant-level data from NCT02236637, "A Prospective Registry of Patients With a Confirmed Diagnosis of Adenocarcinoma of the Prostate Presenting With Metastatic Castrate-Resistant Prostate Cancer."
Inclusion criteria:
1. Men enrolled in NCT02236637 with confirmed adenocarcinoma of the prostate and metastatic castration-resistant prostate cancer.
2. Patients with available registry entry date or first-line mCRPC treatment initiation date, depending on the available data structure.
3. Patients with evaluable overall survival follow-up, including survival time and event/censoring indicator.
4. Patients with sufficient baseline information for prediction, including age and at least one measure of disease status, treatment status or laboratory risk.
Exclusion criteria:
1. Patients without evaluable overall survival time or event/censoring status.
2. Patients with missing or inconsistent key dates or time variables that prevent construction of survival outcomes.
3. Patients who are not alive and under observation at a selected landmark time will be excluded from analyses at that landmark only.
4. For secondary progression-related analyses, patients without a consistently defined progression or progression-free survival variable will be excluded from those analyses only.
No external participant-level data will be pooled with the YODA data. All participant-level analyses will be conducted within the secure data sharing platform. Only aggregate results, model summaries, tables and figures permitted under the DUA will be exported.
Primary and Secondary Outcome Measure(s) and how they will be categorized/defined for your study:
Primary outcome:
Overall survival, defined as time from registry entry or first-line mCRPC treatment initiation to death from any cause. The exact time origin will be selected based on available dates and documentation and specified before analysis. Patients alive at last available follow-up will be censored at their last known follow-up time.
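As a minimal illustration of this outcome definition, the sketch below derives a survival time and event indicator for one patient. It is written in Python for illustration only (the project lists R as its analysis software). The function name, argument names and the days-to-months conversion are assumptions and do not reflect the actual registry data structure.

```python
from datetime import date
from typing import Optional, Tuple

# Assumed conversion factor; the proposal does not specify one.
DAYS_PER_MONTH = 365.25 / 12


def overall_survival(
    origin: date,
    death: Optional[date],
    last_follow_up: date,
) -> Tuple[float, int]:
    """Return (time in months, event indicator) for one patient.

    event = 1 if death from any cause was observed,
    event = 0 if the patient is censored at last known follow-up.
    """
    end = death if death is not None else last_follow_up
    if end < origin:
        # Inconsistent dates prevent construction of the survival outcome.
        raise ValueError("end date precedes the time origin")
    months = (end - origin).days / DAYS_PER_MONTH
    return months, int(death is not None)


# Hypothetical patients: one observed death, one censored observation.
died = overall_survival(date(2015, 1, 1), date(2016, 1, 1), date(2016, 1, 1))
censored = overall_survival(date(2015, 1, 1), None, date(2015, 7, 1))
```

Raising an error on inconsistent dates mirrors the exclusion criterion for patients whose key dates prevent construction of survival outcomes; in practice such records would be flagged and excluded rather than stopping the analysis.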
Secondary outcomes:
1. Conditional overall survival after prespecified landmark times, anticipated to include 3, 6 and 12 months after the selected time origin.
2. Progression-free survival or time to progression, only if the dataset contains a consistently defined progression endpoint.
3. Time to treatment discontinuation, if treatment start and stop dates are sufficiently complete.
4. Model performance outcomes, including calibration, discrimination, Brier score and prediction error at prespecified horizons.
The primary scientific endpoint is overall survival. Progression-related endpoints and treatment discontinuation will be secondary or exploratory if definitions differ across sites or if missingness is substantial.
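For reference, conditional overall survival after a landmark can be written as follows. The notation is an assumption introduced here for illustration: T is the time from the selected time origin to death, S is the overall survival function, s is the landmark time and w is the prediction window.

```latex
\[
  S(s + w \mid s) \;=\; P(T > s + w \mid T > s) \;=\; \frac{S(s + w)}{S(s)},
  \qquad s \ge 0,\; w \ge 0 .
\]
```

With purely hypothetical values S(6) = 0.80 and S(18) = 0.50, the conditional 12-month survival probability at the 6-month landmark would be 0.50 / 0.80 = 0.625.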
No safety endpoint analysis is planned. Adverse event variables, if used, will be used only descriptively or as early follow-up predictors if clinically justified and consistently captured. Any unexpected or serious safety finding identified during the project will be reported promptly to the Data Partner through the appropriate YODA Project process.
Main Predictor/Independent Variable and how it will be categorized/defined for your study:
The main independent variable is prediction strategy, defined by the information set used for risk prediction.
The primary comparison will be:
1. Baseline-only prediction: models using variables available at registry entry or first-line mCRPC treatment initiation.
2. Dynamic landmark prediction: models using baseline variables plus information observed up to prespecified early follow-up landmark times.
Landmark times will be selected based on clinical relevance and data availability, anticipated to include 3, 6 and 12 months after the selected time origin. At each landmark, predictions will be made only among patients alive and under observation at that time.
Early follow-up predictors may include treatment status, treatment discontinuation, progression indicators, PSA values, laboratory values, symptoms, performance status and quality-of-life information, if available before or at the landmark. These predictors will be defined using information available up to each landmark to avoid use of future information.
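One way to enforce this restriction is to take, for each repeatedly measured predictor, the most recent value recorded at or before the landmark. The sketch below illustrates this for PSA, in Python for illustration only (the project lists R as its analysis software). The data layout, function names and example values are hypothetical, and last-value-carried-forward is only one of several possible summaries.

```python
from typing import List, Optional, Tuple

# (months since the time origin, measured value); an assumed layout.
Measurement = Tuple[float, float]


def value_at_landmark(
    measurements: List[Measurement], landmark: float
) -> Optional[float]:
    """Most recent value observed at or before the landmark, else None."""
    eligible = [(t, v) for t, v in measurements if t <= landmark]
    if not eligible:
        return None
    return max(eligible, key=lambda m: m[0])[1]


def relative_change(baseline: float, current: Optional[float]) -> Optional[float]:
    """Relative change from baseline; None when it cannot be computed."""
    if current is None or baseline == 0:
        return None
    return (current - baseline) / baseline


# Hypothetical PSA series: the value at month 4 lies after the 3-month
# landmark and is therefore ignored.
psa = [(0.0, 80.0), (2.5, 40.0), (4.0, 120.0)]
psa_at_3 = value_at_landmark(psa, landmark=3.0)   # 40.0
psa_change = relative_change(80.0, psa_at_3)      # -0.5
```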
Treatment variables will be used for prediction and risk adjustment only. The study will not estimate causal comparative treatment effectiveness, and treatment comparisons will not be interpreted as causal because treatment selection in this registry reflects routine clinical practice and is subject to confounding by indication.
Other Variables of Interest that will be used in your analysis and how they will be categorized/defined for your study:
Variables of interest will be used to characterize the cohort, define prediction models and examine subgroup performance.
Baseline variables may include:
1. Age at registry entry or treatment initiation.
2. Country and site or center, if available.
3. ECOG performance status or other functional status measure.
4. Gleason score or grade group.
5. Time since prostate cancer diagnosis and time since mCRPC diagnosis, if available.
6. Metastatic burden, including bone, visceral and lymph node metastases.
7. Prior prostate cancer therapies.
8. First-line mCRPC treatment category, such as abiraterone, enzalutamide, docetaxel or other therapy.
9. Baseline PSA.
10. Baseline laboratory values, including hemoglobin, alkaline phosphatase, LDH, albumin, neutrophils, lymphocytes and platelets, if available.
11. Pain, symptoms and quality-of-life measures, if available.
Early follow-up variables may include treatment continuation or discontinuation, PSA change from baseline, laboratory change from baseline, progression status before landmark, updated symptoms, performance status or quality-of-life measures, if available and consistently captured.
Continuous variables will generally be modelled continuously using clinically interpretable transformations or restricted cubic splines when appropriate. Subgroup analyses will consider performance status, metastatic burden, baseline PSA/laboratory risk and first-line treatment category.
Statistical Analysis Plan:
Overview:
This study will evaluate dynamic overall survival prediction in a completed real-world mCRPC registry. The primary comparison will be baseline-only prediction versus landmark prediction using early follow-up information. The analysis will focus on prediction, calibration and model performance, not causal treatment-effect estimation.
Descriptive analysis:
We will summarize baseline characteristics, treatment patterns, follow-up duration, deaths, censoring and missingness. Categorical variables will be described using counts and percentages. Continuous variables will be described using means and standard deviations or medians and interquartile ranges, as appropriate. Summaries will be reported overall and by first-line mCRPC treatment category.
Outcome construction:
The primary endpoint will be overall survival from registry entry or first-line mCRPC treatment initiation to death from any cause. The final time origin will depend on available dates and documentation. Patients alive at last follow-up will be censored at their last known follow-up time. Progression-free survival or time to progression will be analyzed only as secondary or exploratory endpoints, and only if consistently defined.
Landmark design:
Landmark times are anticipated to include 3, 6 and 12 months after the selected time origin. At each landmark, the risk set will include patients alive and under observation at that time. Predictors will be restricted to information available at or before the landmark. Subsequent survival will be predicted at clinically relevant horizons, such as 12, 24 and 36 months.
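The construction of a landmark dataset described above can be sketched as follows, in Python for illustration only (the project lists R as its analysis software). The class and function names are hypothetical. Two conventions are assumed here rather than taken from the proposal: patients whose follow-up ends exactly at the landmark are excluded from the risk set, and follow-up beyond the prediction horizon is administratively censored at the horizon.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Patient:
    patient_id: str
    time: float   # months from time origin to death or censoring
    event: int    # 1 = death, 0 = censored


@dataclass
class LandmarkRow:
    patient_id: str
    landmark: float
    time: float   # months from landmark to death or censoring
    event: int    # 1 = death within the horizon, 0 = censored


def landmark_dataset(
    patients: List[Patient], landmark: float, horizon: Optional[float] = None
) -> List[LandmarkRow]:
    """Keep patients still under observation after the landmark and
    reset the clock so that time is measured from the landmark."""
    rows = []
    for p in patients:
        if p.time <= landmark:
            continue  # died or censored by the landmark: not in the risk set
        residual, event = p.time - landmark, p.event
        if horizon is not None and residual > horizon:
            residual, event = horizon, 0  # administrative censoring
        rows.append(LandmarkRow(p.patient_id, landmark, residual, event))
    return rows


# Hypothetical cohort evaluated at the 3-month landmark, 24-month horizon.
cohort = [
    Patient("A", 2.0, 1),   # died before the landmark: excluded
    Patient("B", 10.0, 1),  # death 7 months after the landmark
    Patient("C", 5.0, 0),   # censored 2 months after the landmark
    Patient("D", 40.0, 0),  # followed beyond the horizon: censored at 24
]
rows = landmark_dataset(cohort, landmark=3.0, horizon=24.0)
```

Each landmark (for example 3, 6 and 12 months) would produce its own dataset in this way, to which predictors measured up to that landmark are then attached.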
Prediction models:
Primary models will include baseline Cox proportional hazards models and landmark Cox models. Flexible parametric survival models will be used to model the baseline hazard as a smooth, flexible function of time when appropriate. Bayesian parametric or flexible survival models will be explored as prespecified secondary analyses, subject to feasibility within the secure platform. Country or site effects will be considered when available, using stratification, fixed effects, frailty or hierarchical terms depending on sample size and event counts.
Predictor handling:
Candidate predictors will be selected based on clinical relevance and prior literature, not stepwise selection. Continuous variables will generally be retained as continuous variables, using transformations or restricted cubic splines for non-linear associations when appropriate. Sparse variables or variables with excessive missingness may be excluded from primary models and considered in sensitivity analyses. Missing covariate values will be handled using multiple imputation or sensitivity approaches depending on missingness patterns and platform feasibility.
Model evaluation:
Performance will be assessed using calibration plots, calibration slope, observed-versus-predicted survival by risk groups, time-dependent Brier score, integrated Brier score, Harrell's C-index or time-dependent concordance/AUC, and prediction error comparing baseline-only and landmark models. Internal validation will use bootstrap resampling or cross-validation, depending on computational feasibility.
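For reference, one common form of the time-dependent Brier score under right censoring uses inverse probability of censoring weights, following Graf et al. (reference 7). The notation below is an assumption introduced for illustration: T_i is the observed follow-up time, \delta_i the death indicator, x_i the predictor values, \hat S(t \mid x_i) the predicted survival probability at horizon t, and \hat G the Kaplan-Meier estimate of the censoring distribution. In landmark analyses, all times would be measured from the landmark among patients in the risk set.

```latex
\[
  \mathrm{BS}(t) \;=\; \frac{1}{n} \sum_{i=1}^{n}
  \left[
    \frac{\hat S(t \mid x_i)^{2}\, \mathbb{1}\{T_i \le t,\ \delta_i = 1\}}{\hat G(T_i)}
    \;+\;
    \frac{\bigl(1 - \hat S(t \mid x_i)\bigr)^{2}\, \mathbb{1}\{T_i > t\}}{\hat G(t)}
  \right],
  \qquad
  \mathrm{IBS} \;=\; \frac{1}{t_{\max}} \int_{0}^{t_{\max}} \mathrm{BS}(t)\, dt .
\]
```

Patients censored before t contribute no term of their own but are accounted for through the weights. Some implementations evaluate \hat G at the left limit of T_i, so results can differ slightly between software packages.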
Subgroup and sensitivity analyses:
Predefined subgroup analyses will examine performance by performance status, metastatic burden, first-line treatment category, baseline PSA/laboratory risk and country or region if available. Sensitivity analyses will evaluate alternative time origins, alternative landmark times, exclusion of patients with substantial missing early follow-up information, and alternative progression-related endpoint definitions if used.
Interpretation and compliance:
The study will not estimate causal treatment effects or compare treatments as if randomized. Treatment information will be used only for prediction, risk adjustment and descriptive characterization. This is a scientific, non-commercial project. Data will not be used for litigation, commercial product development, marketing, regulatory submissions or purposes outside the approved aims. Participant-level analyses will be performed within the secure platform. Participant-level data will not be downloaded, copied, redistributed, posted publicly or shared with unapproved individuals. Any unexpected or serious safety finding will be reported promptly to the Data Partner through the appropriate YODA Project process.
Narrative Summary: This study will use a completed real-world registry of men with metastatic castration-resistant prostate cancer to assess whether survival predictions improve when early follow-up information is added to baseline clinical data. We will develop and evaluate landmark models that update predicted overall survival after the start of first-line treatment using routinely collected information such as treatment course, PSA, symptoms or laboratory results when available. The project is scientific and non-commercial. It aims to improve understanding of dynamic prognosis in routine oncology care, where patients are diverse and researchers often need to interpret incomplete follow-up data.
Project Timeline:
Anticipated project start: within 1 month of data access approval and completion of the Data Use Agreement.
Months 1-2: Review data documentation within the secure platform, define the analytic cohort, construct outcomes, describe missingness and finalize analysis code.
Months 3-4: Conduct descriptive analyses, develop baseline prediction models and define landmark datasets.
Months 5-7: Fit landmark prediction models, conduct internal validation and evaluate model performance.
Months 8-9: Conduct subgroup and sensitivity analyses and prepare tables and figures.
Months 10-11: Draft the manuscript and prepare any scientific meeting abstract if appropriate.
Month 12: Submit the manuscript for peer-reviewed publication and report results back to the YODA Project in accordance with the DUA.
If the project is ongoing at the end of the access period, a renewal request will be submitted before DUA expiration. If the project is completed or not renewed, all data access and disposition requirements will be followed, including certification of destruction of downloaded supporting documentation where applicable.
Dissemination Plan:
The main product will be a peer-reviewed scientific manuscript describing dynamic overall survival prediction in a real-world mCRPC registry. The target audience includes oncology researchers, biostatisticians, clinical prediction researchers, real-world evidence investigators and clinicians interested in prognosis in advanced prostate cancer.
Potential target journals include BMC Medical Research Methodology, Statistics in Medicine, Pharmacoepidemiology and Drug Safety, Clinical Genitourinary Cancer, Cancer Medicine, European Urology Open Science, JCO Clinical Cancer Informatics, and Value in Health, depending on the final emphasis of the results.
Findings may also be submitted to relevant scientific meetings, such as ASCO Annual Meeting, ESMO Congress, ISPOR, Society for Clinical Trials Annual Meeting, International Society for Clinical Biostatistics, ENAR Spring Meeting, or Joint Statistical Meetings.
Findings will first be disseminated through peer-reviewed biomedical publication and/or presentation at a scientific meeting, in accordance with YODA Project and Data Partner requirements. The YODA Project will be provided with a copy of any manuscript or scientific meeting submission according to the applicable DUA requirements.
All publications and presentations will include the required acknowledgement that the analyses were based on data made available through the Yale University Open Data Access Project. No participant-level data will be shared, posted, redistributed or made publicly available.
Bibliography:
1. Chowdhury S, Bjartell A, Lumen N, et al. Real-World Outcomes in First-Line Treatment of Metastatic Castration-Resistant Prostate Cancer: The Prostate Cancer Registry. Target Oncol. 2020;15(3):301-315. doi:10.1007/s11523-020-00720-2.
2. van Houwelingen HC. Dynamic prediction by landmarking in event history analysis. Scand J Stat. 2007;34(1):70-85.
3. Putter H, van Houwelingen HC. Understanding landmarking and its relation with time-dependent Cox regression. Stat Biosci. 2017;9:489-503.
4. Steyerberg EW. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. 2nd ed. Springer; 2019.
5. Royston P, Parmar MKB. Flexible parametric proportional-hazards and proportional-odds models for censored survival data. Stat Med. 2002;21(15):2175-2197.
6. Blanche P, Dartigues JF, Jacqmin-Gadda H. Estimating and comparing time-dependent areas under receiver operating characteristic curves for censored event times with competing risks. Stat Med. 2013;32(30):5381-5397.
7. Graf E, Schmoor C, Sauerbrei W, Schumacher M. Assessment and comparison of prognostic classification schemes for survival data. Stat Med. 1999;18(17-18):2529-2545.
8. Moons KGM, Altman DG, Reitsma JB, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis, TRIPOD: Explanation and Elaboration. Ann Intern Med. 2015;162(1):W1-W73.
