

Research Proposal

Project Title: 
Unbiased Treatment Efficacy Detection Methods with Patient Centered Outcomes
Scientific Abstract: 

Background: Patient Centered Outcomes (PCO) are often incorporated in trials via patient/clinician/caregiver reported questionnaires. In the survival analysis context, missing PCO data are typically considered to be missing not at random (MNAR), which requires the use of statistical approaches that properly adjust for this type of missingness.
Objective: This research presents an item response theory (IRT) framework for adjusting PCO-based scores for MNAR drop-out. The model allows the IRT scores and the drop-out mechanism to be modeled simultaneously. This restores conditional independence and corrects for bias in the estimates of treatment efficacy that occur under MNAR drop-out.
Study Design: A simulation study was designed, run, and analyzed to illustrate the improved estimates of treatment efficacy using this approach. An empirical example using clinical trial data demonstrates the utility of the procedure.
Participants: Patients from a clinical trial with a survival analysis design will be included in the analysis.
Main Outcome Measures: In the simulation study, the bias and root mean squared error (RMSE) of the estimated separation in the treatment arms will be computed. In the empirical data example, the treatment arm separation will be estimated with the proposed procedure and without adjustment. Differences in the estimated separation and resulting inferences will be reported.
Statistical Analysis: The analysis will compare the estimates of treatment efficacy from the proposed longitudinal IRT model with those from a standard IRT model.

Brief Project Background and Statement of Project Significance: 

Statistical models in the psychometrics literature have been slow to penetrate the field of Patient Centered Outcomes (PCO). Latent variable models, of which IRT is a sub-category, offer advantages in modeling data with measurement error.1 Although there has been interest in applying the IRT framework to PCO data, there have been barriers to widespread adoption. Specifically, there is a lack of easily implemented procedures for dealing with MNAR data. Little methodological work has been done to address this issue since the initial research on the topic.2 Until these methods are developed and validated, IRT-based estimates of treatment efficacy using PCO will be biased under MNAR data. This work addresses a gap in the statistical literature by developing a psychometric model that can accommodate MNAR data. Adapting modern psychometric methods for the field of PCO is a crucial step in addressing the lack of sensitivity in many PCO measures.3

Specific Aims of the Project: 

There are 4 specific aims of the project:
1. Theoretical statistical exposition of methods under development.
2. Simulation studies of proposed methods.
3. Evaluation of proposed methods using randomized clinical trial data.
4. Publication of proposed methods, findings, and software from simulation and empirical studies.

Aim 1: See attached manuscript for outline of the theoretical exposition (equations are in LaTeX and won’t render here).

Aim 2: Simulation studies demonstrate the improved estimates of treatment efficacy compared to other methods. This work hypothesizes that the proposed method will return estimates with smaller bias and RMSE than standard approaches. This will lead to greater power to detect treatment arm separation.

Aim 3: After showing that the procedure can recover the true parameters in a simulation study, the utility of the proposed method will be demonstrated using data from a clinical trial. The clinical trial should have a survival analysis design, which makes it highly likely to feature MNAR drop-out mechanisms. If drop-out is indeed MNAR, applying the proposed method should yield different estimates than an unadjusted IRT model.

Aim 4: This work will be published in a statistics journal. Additionally, all software code written will be made available online.

What is the purpose of the analysis being proposed? Please select all that apply.: 
Develop or refine statistical methods
Software Used: 
Data Source and Inclusion/Exclusion Criteria to be used to define the patient sample for your study: 

An appropriate data source requires (1) data that are missing not at random as a consequence of the design, (2) a substantial proportion of drop-out, and (3) a sample size sufficient to estimate the models (note that the two Abiraterone trials or the two Daratumumab trials could be combined to increase sample size).

The requested studies meet these criteria. However, a useful dataset to illustrate the proposed method would also show some difference in the treatment effect across methods. This would highlight the potential benefit of the method. Furthermore, because the method is new, it is unclear what the limitations of the procedure are. Having several trials to evaluate the method would be the most productive way of understanding the benefits and limitations of the proposed procedure. Crucially, this could be included in the manuscript to offer important guidance for researchers planning to use the method.

The following questionnaires will be used:
Paliperidone palmitate: PANSS, PSP, SDS
Abiraterone: FACT-P, BFI
Galantamine: Mini-Mental State Examination, Disability Assessment in Dementia
Daratumumab: EORTC-QLQ-C30 and EQ-5D-5L

Main Outcome Measure and how it will be categorized/defined for your study: 

The purpose of this study is to evaluate the proposed statistical methodology that is being developed. This is done via simulation study, where procedures can be evaluated in terms of the quality of the estimates. The quality of the estimates can be directly compared because the true values are known to the researcher. The estimates are evaluated via two measures, bias and RMSE. These are standard measures in the field.
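
The two measures can be sketched in a few lines of Python. This is a minimal illustration rather than the study's simulation code: the function names, the four replication estimates, and the generating value of 0.50 are all hypothetical and chosen only to show the arithmetic.

```python
import math

def bias(estimates, true_value):
    """Mean signed error of the estimates across simulation replications."""
    return sum(e - true_value for e in estimates) / len(estimates)

def rmse(estimates, true_value):
    """Root mean squared error of the estimates across replications."""
    squared_errors = [(e - true_value) ** 2 for e in estimates]
    return math.sqrt(sum(squared_errors) / len(estimates))

# Hypothetical treatment-arm separation estimates from four replications,
# generated under an assumed true separation of 0.50.
true_separation = 0.50
estimates = [0.40, 0.45, 0.55, 0.40]

print(round(bias(estimates, true_separation), 4))
print(round(rmse(estimates, true_separation), 4))
```

A negative bias here indicates that the separation is underestimated on average. RMSE combines that systematic error with the replication-to-replication variability, so a procedure can have small bias and still a large RMSE.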

For an illustration of the statistical methods using empirical clinical trial data (i.e., data shared by YODA), the patient quality of life score will be estimated for each treatment arm, using the proposed statistical procedure and the existing procedure. The resulting estimates will be compared, with the percent difference reported. Importantly, any differences in inferences will also be reported.
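
The percent difference could be computed as sketched below. The proposal does not state which estimate serves as the reference, so taking the unadjusted estimate as the denominator is an assumption, as are the function name and the example values.

```python
def percent_difference(adjusted, unadjusted):
    """Percent change of the adjusted estimate relative to the unadjusted one.

    Assumption: the unadjusted (existing-procedure) estimate is the reference.
    """
    if unadjusted == 0:
        raise ValueError("unadjusted estimate is zero; percent difference is undefined")
    return 100.0 * (adjusted - unadjusted) / abs(unadjusted)

# Hypothetical arm-separation estimates, not results from any trial.
print(round(percent_difference(adjusted=0.60, unadjusted=0.50), 2))
```

A positive value means the adjusted model estimates a larger separation than the unadjusted model. Whether that changes the inference still has to be judged from the interval estimates or tests.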

Main Predictor/Independent Variable and how it will be categorized/defined for your study: 

IRT models use questionnaire items to evaluate the construct of interest. The proposed approach goes one step further and also incorporates patient drop-out. That is, each patient's data are re-coded to reflect the timepoint at which the patient dropped out.
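
One possible re-coding is sketched below in Python. The exact coding used in the manuscript is not given here, so this is an assumed scheme: drop-out is taken to be the first visit after the last observed score, and it is expressed as one 0/1 indicator per visit. The function names and example records are hypothetical.

```python
def dropout_timepoint(scores):
    """Return the 1-based visit at which the patient dropped out, defined here
    as the visit after the last observed score; None for completers."""
    last_observed = 0
    for visit, score in enumerate(scores, start=1):
        if score is not None:
            last_observed = visit
    if last_observed == len(scores):
        return None  # the final scheduled visit was observed
    return last_observed + 1

def dropout_indicators(scores):
    """Recode a record into one indicator per visit: 0 before drop-out,
    1 at the drop-out visit, and None (not at risk) afterwards."""
    drop = dropout_timepoint(scores)
    indicators = []
    for visit in range(1, len(scores) + 1):
        if drop is None or visit < drop:
            indicators.append(0)
        elif visit == drop:
            indicators.append(1)
        else:
            indicators.append(None)
    return indicators

# Hypothetical questionnaire totals over four visits; None marks a missing score.
print(dropout_indicators([12, 10, None, None]))  # drops out at visit 3
print(dropout_indicators([12, None, 9, 8]))      # intermittent gap, completer
```

Under this coding an intermittent missing visit is not treated as drop-out, because a later score is observed. Indicators of this kind can then enter the model alongside the questionnaire items.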

Note here that the proposed method is the reverse of the typical survival analysis. For example, in oncology trials, the typical approach compares time to drop-out across treatment arms, with an adjustment for patient health-related quality of life. The work here does the opposite: the longitudinal IRT model compares quality of life across treatment arms, with an adjustment for drop-out. This assumes that the drop-out is related to the patient quality of life – that is, it assumes the data is MNAR due to the survival analysis design. The ultimate purpose of the proposed method is to compare patient health-related quality of life across the treatment arms, without the bias that occurs from ignoring the MNAR drop-out mechanism.
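
The bias described above can be seen in a toy simulation. The sketch below is not the proposed longitudinal IRT model. It only assumes a standard normal latent quality-of-life score and a logistic drop-out probability that rises as the score falls, both of which are illustrative choices, as are the function name, sample size, and seed.

```python
import math
import random

def simulate_observed_mean(n=5000, seed=1):
    """Draw latent quality-of-life scores, drop patients with a probability
    that rises as the score falls, and return (full mean, observed mean)."""
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)             # latent quality of life
        p_drop = 1.0 / (1.0 + math.exp(theta))  # lower theta -> likelier drop-out
        full.append(theta)
        if rng.random() >= p_drop:              # patient remains in the study
            observed.append(theta)
    return sum(full) / len(full), sum(observed) / len(observed)

full_mean, observed_mean = simulate_observed_mean()
print(f"mean of all patients:       {full_mean:.3f}")
print(f"mean of remaining patients: {observed_mean:.3f}")
```

Because patients with lower scores are more likely to leave, the mean among remaining patients overstates the quality of life of the full sample. If drop-out rates differ between arms, the unadjusted comparison of arms is distorted in the same way.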

Other Variables of Interest that will be used in your analysis and how they will be categorized/defined for your study: 

Standard practice is to include basic demographic information, such as age and sex, in the statistical model. This ensures that potential confounds have been controlled for, which allows for valid interpretation of the model output.

Statistical Analysis Plan: 

The proposed use of this dataset is to illustrate a newly developed statistical method. This illustration will be part of a manuscript that provides a technical exposition of the methods, as well as a simulation study to show the performance of the method. A minimal set of descriptive statistics will be computed, with the main focus being a comparison of the estimated treatment effect using the proposed method versus standard methods. To support this comparison, descriptives such as the proportion of drop-out at each timepoint and the average score at each timepoint (stratified by treatment arm) will be computed. These will help to show how the method adjusts for missing data and how that adjustment affects the model-based estimates of treatment efficacy.
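
The descriptives mentioned above could be tabulated as in the following Python sketch. The data layout (one record per patient with an arm label and a list of scores, with None for a missing score), the function name, and the example records are assumptions for illustration. The proportion missing equals the proportion dropped out only when drop-out is monotone.

```python
from collections import defaultdict

def descriptives(records, n_visits):
    """records: list of (arm, scores), with None marking a missing score.
    Returns {arm: [(proportion_missing, mean_observed_score), ...]} by visit."""
    by_arm = defaultdict(list)
    for arm, scores in records:
        by_arm[arm].append(scores)

    summary = {}
    for arm, rows in by_arm.items():
        per_visit = []
        for t in range(n_visits):
            observed = [row[t] for row in rows if row[t] is not None]
            prop_missing = 1 - len(observed) / len(rows)
            mean_score = sum(observed) / len(observed) if observed else None
            per_visit.append((prop_missing, mean_score))
        summary[arm] = per_visit
    return summary

# Hypothetical records, not trial data.
records = [
    ("treatment", [10, 12, 13]),
    ("treatment", [8, 9, None]),
    ("control", [10, 9, None]),
    ("control", [7, None, None]),
]
for arm, rows in descriptives(records, n_visits=3).items():
    print(arm, rows)
```

In this made-up example the control arm has no observed scores at the last visit, so its mean is reported as None rather than as a number.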

Narrative Summary: 

Patient Centered Outcomes (PCO) are often incorporated in trials via patient/clinician/caregiver reported questionnaires. In the survival analysis context, such as cancer trials, patients drop out of the study for reasons related to disease severity. Left unadjusted, this type of drop-out leads to incorrect conclusions about patient quality of life. This research develops statistical approaches that appropriately adjust for this missingness and yield correct inferences. More robust development of these methods will help detect which interventions have a positive impact on patient quality of life. This, in turn, will help guide patient centered drug development.

Project Timeline: 

A draft manuscript has been attached as a file below. A technical exposition has been sketched out, and initial simulation evidence has already been compiled. The application to the clinical trial data will help guide the simulation study conditions. The project will start once the data have been transferred.
Data management: 1 month
Application to clinical trial, creating tables/figures, writing up results: 3 months
Full simulation study, creating tables/figures, writing up results: 3 months
Completing writing, revisions: 3 months
Submission to journal: 10 months from data transfer

Note: typically journal reviewers will ask for additional work to be done. Given the 6-18 month review times for popular statistical journals, an extension will almost certainly be necessary.

Dissemination Plan: 

Please see the attached file for a draft of the manuscript, which is being prepared for the journal Statistics in Medicine. The goal is to highlight psychometric methods that can and should be utilized in medical applications. Just as importantly, software code will be disseminated. The code will be included as an online appendix to the manuscript. The same code will also be posted to a GitHub account, making it easily searchable. Please note that this software code will NOT include any sensitive information regarding the requested dataset.


1. de Ayala RJ. The Theory and Practice of Item Response Theory (Methodology in the Social Sciences). New York: The Guilford Press; 2009.
2. Douglas JA. Item response models for longitudinal quality of life data in clinical trials. Stat Med. 1999;18(21):2917-2931.
3. Gould AL, Boye ME, Crowther MJ, et al. Joint modeling of survival and longitudinal non-survival data: current methods and issues. Report of the DIA Bayesian joint modeling working group. Stat Med. 2015;34(14):2181-2195.
