2024-0056 - The YODA Project

General Information

How did you learn about the YODA Project?: Scientific Publication

Conflict of Interest

Request Clinical Trials

Associated Trial(s):

What type of data are you looking for?: Individual Participant-Level Data, which includes Full CSR and all supporting documentation

Data Request Status

Status: Ongoing

Research Proposal

Project Title: Multi-study causal inference for robust clinical outcome predictions

Scientific Abstract: Background:
Applying machine learning models in healthcare has become an active research topic, particularly for making clinical predictions. Precise outcome prediction has important clinical benefits because it directly informs treatment decision-making and benefits patients' health. For prediction models, generalizability is defined as the model's ability to make accurate predictions on new, external, or independent data sets [3]. If a model does not generalize, it is hard to reproduce its results or to use its predictions in general practice.
A recent study used five trials from YODA (NCT00518323, NCT00334126, NCT00085748, NCT00078039, and NCT00083668) to predict treatment outcomes in schizophrenia. The primary outcome was the Remission in Schizophrenia Working Group (RSWG) criteria, and the predictors included all information available at baseline across all trials, such as age and race [1]. The results showed that the machine learning models failed to generalize to unseen, external trials that were not included when training the prediction models. One possible explanation is heterogeneity between the clinical trials: the models may have learned only study-specific information and therefore had difficulty generalizing to a new data set whose study population follows a different distribution.

Objective:
Our project will try to address this issue through causal prediction. Causal relationships between predictors and outcomes are often stable across studies and therefore have greater potential to generalize to an unseen study, yielding accurate treatment outcome predictions that can support clinical treatment decisions.

Study Design:
We will combine the trials and perform a meta-analysis.

Participants:
Yujie Wu, Ph.D. student in Biostatistics, Department of Biostatistics, Harvard University
Boyu Ren, Ph.D., Instructor in Psychiatry, McLean Hospital, Harvard Medical School
Giovanni Parmigiani, Ph.D., Professor of Biostatistics, Department of Data Science, Dana Farber Cancer Institute

Primary and Secondary Outcome Measure(s):
Following Chekroud et al. (2024), we will use symptomatic outcomes based on the Positive and Negative Syndrome Scale (PANSS), which is used to measure treatment outcomes. The primary outcome of interest is remission according to the Remission in Schizophrenia Working Group (RSWG) criteria, which are a transformation of PANSS.
The secondary outcomes are 25% symptom reduction (binary), 50% symptom reduction (binary), and baseline-adjusted percent change in symptoms (continuous).

Statistical Analysis:
We will first impute missing data for each predictor variable using its median value and then run a descriptive analysis of the baseline covariates. Means will be reported for continuous variables and percentages for categorical variables. Standard deviations will be reported as a measure of variability.
We will then replicate the results presented in Chekroud et al. (2024) using the elastic net algorithm, with tuning parameters selected through 10-fold cross-validation.
Finally, we will apply the multi-study R-learner to the trials to estimate robust causal relationships and compare its prediction accuracy with that of the elastic net algorithm.

Brief Project Background and Statement of Project Significance: Applying machine learning models in healthcare has become an active research topic, particularly for clinical prediction. Accurate outcome prediction has clinical value because it directly informs treatment decisions and thereby benefits patients' health. For prediction models, generalizability is defined as the model's ability to make accurate predictions on new, external, or independent data sets [3]. A model that does not generalize is hard to reproduce and cannot be relied on for predictions in general use.

A recent study used five trials from YODA (NCT00518323, NCT00334126, NCT00085748, NCT00078039, and NCT00083668) to predict treatment outcomes in schizophrenia. The primary outcome was the Remission in Schizophrenia Working Group (RSWG) criteria, and the predictors included all information available at baseline across all trials, such as age and race [1]. The results showed that the machine learning models failed to generalize to unseen, external trials that were not included when training the prediction models. One possible explanation is heterogeneity between the clinical trials: the models may have learned only study-specific information and therefore had difficulty generalizing to a new data set whose study population follows a different data distribution.

Our project will try to address this issue through causal prediction. Causal relationships between predictors and outcomes are often stable across studies and therefore have greater potential to generalize to an unseen study, yielding accurate treatment outcome predictions that can support clinical treatment decisions.

Specific Aims of the Project: In this project, we will apply the multi-study R-learner for heterogeneous treatment effect estimation, which is robust to between-study heterogeneity [2], to predict treatment outcomes in schizophrenia. We will compare the prediction accuracy of our model with that of the traditional machine learning models used by Chekroud et al. (2024).

Study Design: Meta-analysis (analysis of multiple trials together)

What is the purpose of the analysis being proposed? Please select all that apply.: Participant-level data meta-analysis Meta-analysis using only data from the YODA Project Develop or refine statistical methods Research on clinical prediction or risk prediction

Software Used: RStudio

Data Source and Inclusion/Exclusion Criteria to be used to define the patient sample for your study: The data source and inclusion/exclusion criteria follow Chekroud et al. (2024). We will use treatment data from five international, multisite RCTs (NCT00518323, NCT00334126, NCT00085748, NCT00078039, and NCT00083668). We will exclude participants who did not have a follow-up assessment at the 4-week point after study enrollment.

Primary and Secondary Outcome Measure(s) and how they will be categorized/defined for your study: Following Chekroud et al. (2024), we will use symptomatic outcomes based on the Positive and Negative Syndrome Scale (PANSS), which is used to measure treatment outcomes. The primary outcome of interest is remission according to the Remission in Schizophrenia Working Group (RSWG) criteria, which are a transformation of PANSS.

The secondary outcomes are 25% symptom reduction (binary), 50% symptom reduction (binary), and baseline-adjusted percent change in symptoms (continuous).
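As an illustration of how such outcomes could be derived from PANSS scores, here is a minimal Python sketch. Several things in it are assumptions rather than details taken from this proposal: the percent change subtracts the scale minimum of 30 (one common convention for PANSS), the remission check covers only the symptom-severity part of the RSWG criteria (eight items each rated 3 or lower), and all function and variable names are hypothetical.

```python
# Illustrative only: names, the floor convention, and thresholds are assumptions.
RSWG_ITEMS = ("P1", "P2", "P3", "N1", "N4", "N6", "G5", "G9")
PANSS_FLOOR = 30  # 30 items, each scored 1-7, so the minimum total is 30


def percent_change(baseline_total, followup_total, floor=PANSS_FLOOR):
    """Percent symptom reduction from baseline after subtracting the scale floor."""
    denom = baseline_total - floor
    if denom <= 0:
        raise ValueError("baseline total must exceed the scale floor")
    return 100.0 * (baseline_total - followup_total) / denom


def responder(baseline_total, followup_total, threshold):
    """Binary outcome: True if the percent reduction reaches the threshold."""
    return percent_change(baseline_total, followup_total) >= threshold


def rswg_severity_met(items):
    """items maps PANSS item codes to scores; severity part of RSWG only."""
    return all(items[code] <= 3 for code in RSWG_ITEMS)
```

For a hypothetical participant with a baseline total of 90 and a week-4 total of 66, `percent_change(90, 66)` gives a 40% reduction, so the participant counts as a responder at the 25% threshold but not at the 50% threshold.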

Main Predictor/Independent Variable and how it will be categorized/defined for your study: The main predictor variable is the randomized treatment assignment, which indicates whether a patient was randomized to an antipsychotic medication or to placebo.

Other Variables of Interest that will be used in your analysis and how they will be categorized/defined for your study: The other predictors in the model include basic demographic features, psychiatric history (DSM-IV diagnosis category, age at diagnosis, psychiatric hospitalizations), clinical data (PANSS, Clinical Global Impression), extrapyramidal symptom scales (Abnormal Involuntary Movement Scale, Simpson Angus Scale), and biometric data (blood chemistry panel, hematology, urinalysis).

Statistical Analysis Plan: We will first impute missing data for each predictor variable using its median value and then run a descriptive analysis of the baseline covariates. Means will be reported for continuous variables and percentages for categorical variables. Standard deviations will be reported as a measure of variability.
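The imputation and descriptive step can be sketched as follows. This is a minimal Python illustration using only the standard library; the function names are hypothetical, missing values are assumed to be encoded as `None`, and the actual analysis is planned in R.

```python
from collections import Counter
from statistics import mean, median, stdev


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def describe_continuous(values):
    """Mean and sample standard deviation of a continuous variable."""
    return {"mean": mean(values), "sd": stdev(values)}


def describe_categorical(values):
    """Percentage of observations in each category."""
    counts = Counter(values)
    return {level: 100.0 * c / len(values) for level, c in counts.items()}
```

Note that median imputation shrinks the variability of a predictor, so the standard deviation reported after imputation will tend to understate the spread of the observed values.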

We will then replicate the results presented in Chekroud et al. (2024) using the elastic net algorithm, with tuning parameters selected through 10-fold cross-validation.
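To make the tuning step concrete, the sketch below fits a linear elastic net by coordinate descent and picks the penalty strength by 10-fold cross-validation. It is a simplified illustration under stated assumptions: it handles a continuous outcome only (the binary outcomes would need a logistic version), it does not standardize the predictors, it tunes only the penalty strength `lam` while holding the mixing weight `alpha` fixed, and every name in it is hypothetical. In R, a package such as glmnet would typically handle this step.

```python
import random


def soft_threshold(z, g):
    """Shrink z toward zero by g (the lasso proximal step)."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def fit_elastic_net(X, y, lam, alpha=0.5, n_iter=200):
    """Minimise (1/2n)||y - b0 - Xb||^2 + lam*(alpha*|b|_1 + (1-alpha)/2*|b|_2^2)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred predictors
    resid = [yi - y_mean for yi in y]  # residuals when all coefficients are zero
    b = [0.0] * p
    z = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            denom = z[j] + lam * (1 - alpha)
            if denom == 0.0:
                continue  # constant column with a pure lasso penalty
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + z[j] * b[j]
            new_bj = soft_threshold(rho, lam * alpha) / denom
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                b[j] = new_bj
    b0 = y_mean - sum(b[j] * x_mean[j] for j in range(p))
    return b0, b


def predict(model, X):
    b0, b = model
    return [b0 + sum(bj * xj for bj, xj in zip(b, row)) for row in X]


def cv_select_lambda(X, y, lambdas, alpha=0.5, k=10, seed=0):
    """Return the candidate lam with the lowest k-fold cross-validated MSE."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    best_lam, best_mse = None, None
    for lam in lambdas:
        sse = 0.0
        for fold in folds:
            held = set(fold)
            train = [i for i in idx if i not in held]
            model = fit_elastic_net([X[i] for i in train], [y[i] for i in train], lam, alpha)
            preds = predict(model, [X[i] for i in fold])
            sse += sum((y[i] - p) ** 2 for i, p in zip(fold, preds))
        mse = sse / len(X)
        if best_mse is None or mse < best_mse:
            best_lam, best_mse = lam, mse
    return best_lam
```

Because the folds here are drawn at random from the pooled rows, this cross-validation measures within-sample performance; it does not by itself test generalization to a trial that was held out entirely.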

Finally, we will apply the multi-study R-learner to the trials to estimate robust causal relationships and compare its prediction accuracy with that of the elastic net algorithm.
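For orientation, the following Python sketch shows the residual-on-residual idea behind the standard single-study R-learner. It does not reproduce the multi-study extension of [2]. The assumptions are deliberately strong and are ours, not the proposal's: a single covariate, a known randomization probability `e` (as in an RCT), a treatment effect that is linear in the covariate, a plain least-squares fit for the outcome mean, and no cross-fitting. All names are hypothetical.

```python
def ols_line(x, y):
    """Least-squares intercept and slope of y on a single covariate x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_learner_linear(x, w, y, e):
    """Fit tau(x) = t0 + t1*x by minimising the R-loss
    sum_i [(y_i - m(x_i)) - (w_i - e) * tau(x_i)]^2 with known propensity e."""
    a, b = ols_line(x, y)  # crude estimate of m(x) = E[Y | X = x]
    y_res = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # outcome residuals
    w_res = [wi - e for wi in w]  # treatment residuals
    # Least squares of y_res on the pseudo-features w_res and w_res * x.
    s00 = sum(r * r for r in w_res)
    s01 = sum(r * r * xi for r, xi in zip(w_res, x))
    s11 = sum(r * r * xi * xi for r, xi in zip(w_res, x))
    c0 = sum(r * yr for r, yr in zip(w_res, y_res))
    c1 = sum(r * xi * yr for r, xi, yr in zip(w_res, x, y_res))
    det = s00 * s11 - s01 * s01
    t0 = (c0 * s11 - c1 * s01) / det
    t1 = (c1 * s00 - c0 * s01) / det
    return t0, t1
```

The returned pair describes how the estimated treatment effect varies with the covariate: `t0` is the effect at `x = 0` and `t1` is the change in effect per unit of `x`.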

Narrative Summary: A previous study by Chekroud et al. (2024) used data sets from YODA (NCT00518323, NCT00334126, NCT00085748, NCT00078039, and NCT00083668) to predict treatment outcomes in schizophrenia. It found that although machine learning models may achieve high prediction accuracy on the data set on which they were trained, their out-of-study prediction accuracy decreased drastically, suggesting limited generalizability. One potential reason for the worse out-of-study accuracy is population heterogeneity: the models may have learned only context-dependent relationships that are sensitive to shifts in the study population.

One route to robust clinical outcome prediction is causal inference. Previous studies have found that causal relationships are less sensitive to shifts in data distribution across studies, so causal models can yield more robust predictions on external data sets. In particular, we would like to apply the multi-study R-learner for heterogeneous treatment effect estimation, which is robust to between-study heterogeneity, to achieve more accurate out-of-study predictions.
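Out-of-study accuracy of the kind discussed above can be measured by holding out one trial at a time. The Python sketch below is a hypothetical harness, not the evaluation protocol of Chekroud et al. (2024) or of this proposal: the record layout, the function names, and the toy threshold classifier are all assumptions made for illustration.

```python
def leave_one_study_out(records, fit, predict):
    """records: list of (study_id, x, y). Returns {held-out study: accuracy}."""
    studies = sorted({s for s, _, _ in records})
    results = {}
    for held in studies:
        train = [(x, y) for s, x, y in records if s != held]
        test = [(x, y) for s, x, y in records if s == held]
        model = fit(train)  # the held-out study is never seen during fitting
        correct = sum(predict(model, x) == y for x, y in test)
        results[held] = correct / len(test)
    return results


def fit_threshold(train):
    """Toy model: cut-off halfway between the two class means of x."""
    zeros = [x for x, y in train if y == 0]
    ones = [x for x, y in train if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def predict_threshold(threshold, x):
    return int(x > threshold)
```

Any model with a `fit(train)` and `predict(model, x)` pair can be plugged in, so the same harness can compare an elastic net with a causal model on identical held-out trials.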

Project Timeline: We will begin analyzing the data once access to the data sets is approved. We expect to complete the analysis by the end of July 2024 and to draft and submit the manuscript in August 2024. We expect to report results back to the YODA Project by December 2024.

Dissemination Plan: We will develop an R package for public use. Our target audience is researchers who apply machine learning models in health care, particularly those working with multiple data sets, and we aim to inform them of the importance of generalizability in machine learning models. Suitable journals for submission include Biostatistics, Statistics in Medicine, and Bioinformatics.

Bibliography:

[1] Chekroud, A. M., Hawrilenko, M., Loho, H., Bondar, J., Gueorguieva, R., Hasan, A., … & Paulus, M. (2024). Illusory generalizability of clinical prediction models. Science, 383(6679), 164-167.

[2] Shyr, C., Ren, B., Patil, P., & Parmigiani, G. (2023). Multi-study R-learner for Heterogeneous Treatment Effect Estimation. arXiv preprint arXiv:2306.01086.

[3] Yang, J., Soltan, A. A., & Clifton, D. A. (2022). Machine learning generalizability across healthcare settings: insights from multi-site COVID-19 screening. npj Digital Medicine, 5(1), 69.