2020-4318 - The YODA Project

array(45) {
  ["project_title"]=>
  string(122) "Enhancing inference from real-world data using externally-derived missing data models: a pilot study of Ulcerative Colitis"
  ["project_narrative_summary"]=>
  string(619) "Real-world evidence is an emerging research area that proposes to use data from non-experimental settings, such as routine clinical care, to guide better decision making. This field has received growing interest in recent years for a variety of reasons, including the realization that randomized controlled trials are too expensive and infeasible to do for every important clinical question. Despite the promise of this field, its progress has been compromised by several major limitations, one of which is the problem of missing data.

[This is a request for data access via Vivli. See attachment for full text.]"
  ["project_learn_source"]=>
  string(5) "other"
  ["project_learn_source_exp"]=>
  string(0) ""
  ["project_key_personnel"]=>
  array(4) {
    [0]=>
    array(6) {
      ["p_pers_f_name"]=>
      string(4) "Shan"
      ["p_pers_l_name"]=>
      string(4) "Wang"
      ["p_pers_degree"]=>
      string(3) "PhD"
      ["p_pers_pr_affil"]=>
      string(27) "University of San Francisco"
      ["p_pers_scop_id"]=>
      string(0) ""
      ["requires_data_access"]=>
      string(2) "no"
    }
    [1]=>
    array(6) {
      ["p_pers_f_name"]=>
      string(7) "Douglas"
      ["p_pers_l_name"]=>
      string(7) "Arneson"
      ["p_pers_degree"]=>
      string(3) "PhD"
      ["p_pers_pr_affil"]=>
      string(39) "University of California, San Francisco"
      ["p_pers_scop_id"]=>
      string(0) ""
      ["requires_data_access"]=>
      string(2) "no"
    }
    [2]=>
    array(6) {
      ["p_pers_f_name"]=>
      string(4) "Balu"
      ["p_pers_l_name"]=>
      string(8) "Bhasuran"
      ["p_pers_degree"]=>
      string(3) "PhD"
      ["p_pers_pr_affil"]=>
      string(39) "University of California, San Francisco"
      ["p_pers_scop_id"]=>
      string(0) ""
      ["requires_data_access"]=>
      string(2) "no"
    }
    [3]=>
    array(6) {
      ["p_pers_f_name"]=>
      string(7) "Vignesh"
      ["p_pers_l_name"]=>
      string(12) "Ravindranath"
      ["p_pers_degree"]=>
      string(27) "BA / BS / BSc MA / MS / MSc"
      ["p_pers_pr_affil"]=>
      string(39) "University of California, San Francisco"
      ["p_pers_scop_id"]=>
      string(0) ""
      ["requires_data_access"]=>
      string(2) "no"
    }
  }
  ["project_ext_grants"]=>
  array(2) {
    ["value"]=>
    string(3) "yes"
    ["label"]=>
    string(65) "External grants or funds are being used to support this research."
  }
  ["project_funding_source"]=>
  string(43) "Government Funding - NIH NCATS TL1 TR001871"
  ["project_assoc_trials"]=>
  array(6) {
    [0]=>
    object(WP_Post)#5546 (24) {
      ["ID"]=>
      int(1114)
      ["post_author"]=>
      string(4) "1363"
      ["post_date"]=>
      string(19) "2014-09-22 10:31:00"
      ["post_date_gmt"]=>
      string(19) "2014-09-22 10:31:00"
      ["post_content"]=>
      string(0) ""
      ["post_title"]=>
      string(159) "NCT00036439 - A Randomized, Placebo-controlled, Double-blind Trial to Evaluate the Safety and Efficacy of Infliximab in Patients With Active Ulcerative Colitis"
      ["post_excerpt"]=>
      string(0) ""
      ["post_status"]=>
      string(7) "publish"
      ["comment_status"]=>
      string(6) "closed"
      ["ping_status"]=>
      string(6) "closed"
      ["post_password"]=>
      string(0) ""
      ["post_name"]=>
      string(155) "nct00036439-a-randomized-placebo-controlled-double-blind-trial-to-evaluate-the-safety-and-efficacy-of-infliximab-in-patients-with-active-ulcerative-colitis"
      ["to_ping"]=>
      string(0) ""
      ["pinged"]=>
      string(0) ""
      ["post_modified"]=>
      string(19) "2025-07-30 10:12:17"
      ["post_modified_gmt"]=>
      string(19) "2025-07-30 14:12:17"
      ["post_content_filtered"]=>
      string(0) ""
      ["post_parent"]=>
      int(0)
      ["guid"]=>
      string(204) "https://dev-yoda.pantheonsite.io/clinical-trial/nct00036439-a-randomized-placebo-controlled-double-blind-trial-to-evaluate-the-safety-and-efficacy-of-infliximab-in-patients-with-active-ulcerative-colitis/"
      ["menu_order"]=>
      int(0)
      ["post_type"]=>
      string(14) "clinical_trial"
      ["post_mime_type"]=>
      string(0) ""
      ["comment_count"]=>
      string(1) "0"
      ["filter"]=>
      string(3) "raw"
    }
    [1]=>
    object(WP_Post)#5545 (24) {
      ["ID"]=>
      int(1117)
      ["post_author"]=>
      string(4) "1363"
      ["post_date"]=>
      string(19) "2014-09-22 10:36:00"
      ["post_date_gmt"]=>
      string(19) "2014-09-22 10:36:00"
      ["post_content"]=>
      string(0) ""
      ["post_title"]=>
      string(159) "NCT00096655 - A Randomized, Placebo-controlled, Double-blind Trial to Evaluate the Safety and Efficacy of Infliximab in Patients With Active Ulcerative Colitis"
      ["post_excerpt"]=>
      string(0) ""
      ["post_status"]=>
      string(7) "publish"
      ["comment_status"]=>
      string(6) "closed"
      ["ping_status"]=>
      string(6) "closed"
      ["post_password"]=>
      string(0) ""
      ["post_name"]=>
      string(155) "nct00096655-a-randomized-placebo-controlled-double-blind-trial-to-evaluate-the-safety-and-efficacy-of-infliximab-in-patients-with-active-ulcerative-colitis"
      ["to_ping"]=>
      string(0) ""
      ["pinged"]=>
      string(0) ""
      ["post_modified"]=>
      string(19) "2025-07-30 10:13:12"
      ["post_modified_gmt"]=>
      string(19) "2025-07-30 14:13:12"
      ["post_content_filtered"]=>
      string(0) ""
      ["post_parent"]=>
      int(0)
      ["guid"]=>
      string(204) "https://dev-yoda.pantheonsite.io/clinical-trial/nct00096655-a-randomized-placebo-controlled-double-blind-trial-to-evaluate-the-safety-and-efficacy-of-infliximab-in-patients-with-active-ulcerative-colitis/"
      ["menu_order"]=>
      int(0)
      ["post_type"]=>
      string(14) "clinical_trial"
      ["post_mime_type"]=>
      string(0) ""
      ["comment_count"]=>
      string(1) "0"
      ["filter"]=>
      string(3) "raw"
    }
    [2]=>
    object(WP_Post)#5544 (24) {
      ["ID"]=>
      int(1144)
      ["post_author"]=>
      string(4) "1363"
      ["post_date"]=>
      string(19) "2014-09-22 11:06:00"
      ["post_date_gmt"]=>
      string(19) "2014-09-22 11:06:00"
      ["post_content"]=>
      string(0) ""
      ["post_title"]=>
      string(252) "NCT00487539 - A Phase 2/3 Multicenter, Randomized, Placebo-controlled, Double blind Study to Evaluate the Safety and Efficacy of Golimumab Induction Therapy, Administered Subcutaneously, in Subjects with Moderately to Severely Active Ulcerative Colitis"
      ["post_excerpt"]=>
      string(0) ""
      ["post_status"]=>
      string(7) "publish"
      ["comment_status"]=>
      string(6) "closed"
      ["ping_status"]=>
      string(6) "closed"
      ["post_password"]=>
      string(0) ""
      ["post_name"]=>
      string(193) "nct00487539-a-phase-2-3-multicenter-randomized-placebo-controlled-double-blind-study-to-evaluate-the-safety-and-efficacy-of-golimumab-induction-therapy-administered-subcutaneously-in-subjects-w"
      ["to_ping"]=>
      string(0) ""
      ["pinged"]=>
      string(0) ""
      ["post_modified"]=>
      string(19) "2025-10-28 13:33:21"
      ["post_modified_gmt"]=>
      string(19) "2025-10-28 17:33:21"
      ["post_content_filtered"]=>
      string(0) ""
      ["post_parent"]=>
      int(0)
      ["guid"]=>
      string(242) "https://dev-yoda.pantheonsite.io/clinical-trial/nct00487539-a-phase-2-3-multicenter-randomized-placebo-controlled-double-blind-study-to-evaluate-the-safety-and-efficacy-of-golimumab-induction-therapy-administered-subcutaneously-in-subjects-w/"
      ["menu_order"]=>
      int(0)
      ["post_type"]=>
      string(14) "clinical_trial"
      ["post_mime_type"]=>
      string(0) ""
      ["comment_count"]=>
      string(1) "0"
      ["filter"]=>
      string(3) "raw"
    }
    [3]=>
    object(WP_Post)#5543 (24) {
      ["ID"]=>
      int(1437)
      ["post_author"]=>
      string(4) "1363"
      ["post_date"]=>
      string(19) "2016-02-25 14:53:00"
      ["post_date_gmt"]=>
      string(19) "2016-02-25 14:53:00"
      ["post_content"]=>
      string(0) ""
      ["post_title"]=>
      string(188) "NCT01551290 - A Phase 3, Multicenter, Randomized, Double-Blind, Placebo-Controlled Study Evaluating the Efficacy and Safety of Infliximab in Chinese Subjects With Active Ulcerative Colitis"
      ["post_excerpt"]=>
      string(0) ""
      ["post_status"]=>
      string(7) "publish"
      ["comment_status"]=>
      string(6) "closed"
      ["ping_status"]=>
      string(6) "closed"
      ["post_password"]=>
      string(0) ""
      ["post_name"]=>
      string(182) "nct01551290-a-phase-3-multicenter-randomized-double-blind-placebo-controlled-study-evaluating-the-efficacy-and-safety-of-infliximab-in-chinese-subjects-with-active-ulcerative-colitis"
      ["to_ping"]=>
      string(0) ""
      ["pinged"]=>
      string(0) ""
      ["post_modified"]=>
      string(19) "2025-10-28 13:50:18"
      ["post_modified_gmt"]=>
      string(19) "2025-10-28 17:50:18"
      ["post_content_filtered"]=>
      string(0) ""
      ["post_parent"]=>
      int(0)
      ["guid"]=>
      string(231) "https://dev-yoda.pantheonsite.io/clinical-trial/nct01551290-a-phase-3-multicenter-randomized-double-blind-placebo-controlled-study-evaluating-the-efficacy-and-safety-of-infliximab-in-chinese-subjects-with-active-ulcerative-colitis/"
      ["menu_order"]=>
      int(0)
      ["post_type"]=>
      string(14) "clinical_trial"
      ["post_mime_type"]=>
      string(0) ""
      ["comment_count"]=>
      string(1) "0"
      ["filter"]=>
      string(3) "raw"
    }
    [4]=>
    object(WP_Post)#5542 (24) {
      ["ID"]=>
      int(1583)
      ["post_author"]=>
      string(4) "1363"
      ["post_date"]=>
      string(19) "2017-01-04 14:53:00"
      ["post_date_gmt"]=>
      string(19) "2017-01-04 14:53:00"
      ["post_content"]=>
      string(0) ""
      ["post_title"]=>
      string(252) "NCT00488631 - A Phase 3 Multicenter, Randomized, Placebo-controlled, Double-blind Study to Evaluate the Safety and Efficacy of Golimumab Maintenance Therapy, Administered Subcutaneously, in Subjects With Moderately to Severely Active Ulcerative Colitis"
      ["post_excerpt"]=>
      string(0) ""
      ["post_status"]=>
      string(7) "publish"
      ["comment_status"]=>
      string(6) "closed"
      ["ping_status"]=>
      string(6) "closed"
      ["post_password"]=>
      string(0) ""
      ["post_name"]=>
      string(193) "nct00488631-a-phase-3-multicenter-randomized-placebo-controlled-double-blind-study-to-evaluate-the-safety-and-efficacy-of-golimumab-maintenance-therapy-administered-subcutaneously-in-subjects-w"
      ["to_ping"]=>
      string(0) ""
      ["pinged"]=>
      string(0) ""
      ["post_modified"]=>
      string(19) "2025-10-28 13:37:20"
      ["post_modified_gmt"]=>
      string(19) "2025-10-28 17:37:20"
      ["post_content_filtered"]=>
      string(0) ""
      ["post_parent"]=>
      int(0)
      ["guid"]=>
      string(242) "https://dev-yoda.pantheonsite.io/clinical-trial/nct00488631-a-phase-3-multicenter-randomized-placebo-controlled-double-blind-study-to-evaluate-the-safety-and-efficacy-of-golimumab-maintenance-therapy-administered-subcutaneously-in-subjects-w/"
      ["menu_order"]=>
      int(0)
      ["post_type"]=>
      string(14) "clinical_trial"
      ["post_mime_type"]=>
      string(0) ""
      ["comment_count"]=>
      string(1) "0"
      ["filter"]=>
      string(3) "raw"
    }
    [5]=>
    object(WP_Post)#5541 (24) {
      ["ID"]=>
      int(1715)
      ["post_author"]=>
      string(4) "1363"
      ["post_date"]=>
      string(19) "2018-08-06 11:00:00"
      ["post_date_gmt"]=>
      string(19) "2018-08-06 11:00:00"
      ["post_content"]=>
      string(0) ""
      ["post_title"]=>
      string(134) "NCT01863771 - A Safety and Effectiveness Study of Golimumab in Japanese Patients With Moderately to Severely Active Ulcerative Colitis"
      ["post_excerpt"]=>
      string(0) ""
      ["post_status"]=>
      string(7) "publish"
      ["comment_status"]=>
      string(6) "closed"
      ["ping_status"]=>
      string(6) "closed"
      ["post_password"]=>
      string(0) ""
      ["post_name"]=>
      string(132) "nct01863771-a-safety-and-effectiveness-study-of-golimumab-in-japanese-patients-with-moderately-to-severely-active-ulcerative-colitis"
      ["to_ping"]=>
      string(0) ""
      ["pinged"]=>
      string(0) ""
      ["post_modified"]=>
      string(19) "2025-06-25 14:51:23"
      ["post_modified_gmt"]=>
      string(19) "2025-06-25 18:51:23"
      ["post_content_filtered"]=>
      string(0) ""
      ["post_parent"]=>
      int(0)
      ["guid"]=>
      string(181) "https://dev-yoda.pantheonsite.io/clinical-trial/nct01863771-a-safety-and-effectiveness-study-of-golimumab-in-japanese-patients-with-moderately-to-severely-active-ulcerative-colitis/"
      ["menu_order"]=>
      int(0)
      ["post_type"]=>
      string(14) "clinical_trial"
      ["post_mime_type"]=>
      string(0) ""
      ["comment_count"]=>
      string(1) "0"
      ["filter"]=>
      string(3) "raw"
    }
  }
  ["project_date_type"]=>
  string(0) ""
  ["property_scientific_abstract"]=>
  string(1619) "Background: Electronic Health Records (EHR) data are a promising source of information regarding treatment effects in the context of routine clinical care; however, their utility for research has been limited by substantial missing data. Because much of the reason for missing data is related to the availability of other corroborating information about disease activity, and this typically dictates the clinician decision to pursue additional testing and measurement, the 'missing at random' assumption (and therefore, the validity of model-based imputation) appears to be met by EHR data. 

Objective: To develop and evaluate a series of missing data models using datasets with substantial completeness -- RCTs of Ulcerative Colitis -- in order to enable less biased estimation from corresponding EHR studies. 

Study Design: Post-hoc analysis of individual participant data from randomized, blinded Phase 3 trials of adults with Ulcerative Colitis

Participants: Subjects participating in the above trials

Main Outcome: Outcome variables will include each of the subscores of the Mayo Score of Ulcerative Colitis activity. We will develop and evaluate several models of missing data, and perform feature selection to identify the most informative variables for prediction. 

Statistical Analysis: We will artificially censor observations from a complete data set and test a variety of popular predictive models (logistic regression, random forests, gradient boosted decision trees) according to bias and variance. We will use feature selection to identify highly informative variables."
  ["project_brief_bg"]=>
  string(1622) "Following a search of clinicaltrials.gov, we have identified all completed, phase 2-3, randomized controlled trials of FDA-approved therapeutics for Ulcerative colitis in adults. We are requesting participant-level data corresponding to these trials. 

For the primary analysis of this study, we use the Total Mayo Score as the outcome variable. This will be predicted as a function of different combinations of available Mayo subscores and auxiliary variables. We will use nested cross-validation to estimate model accuracy and variance. We will use feature-selection methods to identify reduced models that maintain high predictive accuracy, prioritizing those features that are more convenient to obtain in practice (patient- and physician-reported outcomes > blood tests > stool tests).

Finalized models in the form of software files will be published at the end of the study; the rationale for this is that ensemble machine learning models have a complex parameterization and thus may not be easily conveyed or transported for real-world use in any other form. In addition, we will also publish the list of features found to be most informative by feature selection.

Strengths of the proposed study include its use of high-quality and complete data to address several important research problems in the field of real-world evidence. Limitations include the possibility that these models may fail to generalize to real-world contexts due to substantive reasons or other modelling problems (model misspecification, overfitting, lack of robustness, undersampling of rare strata in the included data)."
  ["project_specific_aims"]=>
  string(543) "There are two specific aims of this proposed research. 1) Derive and internally validate a series of models for predicting Ulcerative colitis disease activity (Total Mayo Score) in the presence of missing subscore data, and 2) identify combinations of features that are most informative for predicting the Total Mayo Score in the presence of different patterns of missing subscores, with an emphasis on a missing endoscopic subscore. Exploratory aims include deriving a new composite disease activity score.

[See attached for full text]"
  ["project_study_design"]=>
  string(0) ""
  ["project_study_design_exp"]=>
  string(0) ""
  ["project_purposes"]=>
  array(0) {
  }
  ["project_purposes_exp"]=>
  string(0) ""
  ["project_software_used"]=>
  array(0) {
  }
  ["project_software_used_exp"]=>
  string(0) ""
  ["project_research_methods"]=>
  string(261) "We are requesting individual participant-level data from all completed RCTs of Ulcerative Colitis in adults based on a search of clinicaltrials.gov. This decision was made in order to maximize the generalizability of these findings to routine clinical practice."
  ["project_main_outcome_measure"]=>
  string(234) "For the primary analysis, the outcome element is the Total Mayo Score, an ordinal variable on a 0-12 scale. This variable has been used to define the primary outcome (typically, a binarization of this variable) of all included trials."
  ["project_main_predictor_indep"]=>
  string(308) "The main independent variables will be different combinations of Mayo subscores. Each of these is an ordinal variable on a 0-3 scale. Most of the derived models will exclude the Mayo endoscopic subscore as a predictor, as this variable is least available in real-world data due to its cost and inconvenience."
  ["project_other_variables_interest"]=>
  string(1068) "Other variables of interest fall into the following 4 categories: 1) demographic variables (gender, age, race, ethnicity), 2) disease characteristics (disease duration, disease location, current steroid use, assignment to an active arm or placebo, prior treatment failure, presence of extraintestinal manifestations where available, history of other autoimmune diagnoses where available), 3) biochemistries (hemoglobin, white count, albumin, C-reactive protein, erythrocyte sedimentation rate, fecal calprotectin), and 4) other patient reported outcomes data as available (e.g. IBDQ, SF-36). These variables will each be categorized in their native form (binary, categorical, continuous) for the purposes of modeling. Other variables that we will assess during modeling include 1) an indicator variable for trials that use central reading of endoscopy vs not, 2) an indicator variable for the trial of origin corresponding to each included data point, 3) a variable corresponding to the patient identifier (for models that allow for multiple observations per subject)."
  ["project_stat_analysis_plan"]=>
  string(14) "See attachment"
  ["project_timeline"]=>
  string(137) "Start date: 4/2021

Completion date: 1/2022

Manuscript completion date: 3/2022

Results posted to YODA project: 4/2022"
  ["project_dissemination_plan"]=>
  string(322) "This work will be presented at national Gastroenterology meetings and will be submitted to journals of interest both to the IBD and Gastroenterology community as well as the general clinical research community: JAMA network journals, BMJ, Gastroenterology, American Journal of Gastroenterology, Inflammatory Bowel Diseases"
  ["project_bibliography"]=>
  string(287) "1. Rudrapatna VA, Butte AJ. Opportunities and challenges in using real-world data for health care. J Clin Invest. 2020;130(2):565-574. doi:10.1172/JCI129197

2. S. van Buuren (2018). Flexible Imputation of Missing Data. Second Edition. CRC/Chapman & Hall, FL: Boca Raton
"
  ["project_suppl_material"]=>
  bool(false)
  ["project_coi"]=>
  array(5) {
    [0]=>
    array(1) {
      ["file_coi"]=>
      array(21) {
        ["ID"]=>
        int(10017)
        ["id"]=>
        int(10017)
        ["title"]=>
        string(51) "yoda_project_coi_form_for_data_requestors_2019_wang"
        ["filename"]=>
        string(55) "yoda_project_coi_form_for_data_requestors_2019_wang.pdf"
        ["filesize"]=>
        int(151645)
        ["url"]=>
        string(104) "https://yoda.yale.edu/wp-content/uploads/2019/11/yoda_project_coi_form_for_data_requestors_2019_wang.pdf"
        ["link"]=>
        string(97) "https://yoda.yale.edu/data-request/2020-4318/yoda_project_coi_form_for_data_requestors_2019_wang/"
        ["alt"]=>
        string(0) ""
        ["author"]=>
        string(4) "1363"
        ["description"]=>
        string(0) ""
        ["caption"]=>
        string(0) ""
        ["name"]=>
        string(51) "yoda_project_coi_form_for_data_requestors_2019_wang"
        ["status"]=>
        string(7) "inherit"
        ["uploaded_to"]=>
        int(4999)
        ["date"]=>
        string(19) "2023-07-31 16:07:54"
        ["modified"]=>
        string(19) "2023-08-01 01:04:16"
        ["menu_order"]=>
        int(0)
        ["mime_type"]=>
        string(15) "application/pdf"
        ["type"]=>
        string(11) "application"
        ["subtype"]=>
        string(3) "pdf"
        ["icon"]=>
        string(62) "https://yoda.yale.edu/wp/wp-includes/images/media/document.png"
      }
    }
    [1]=>
    array(1) {
      ["file_coi"]=>
      array(21) {
        ["ID"]=>
        int(10013)
        ["id"]=>
        int(10013)
        ["title"]=>
        string(50) "yoda_project_coi_form_for_data_requestors_2019_var"
        ["filename"]=>
        string(54) "yoda_project_coi_form_for_data_requestors_2019_var.pdf"
        ["filesize"]=>
        int(95279)
        ["url"]=>
        string(103) "https://yoda.yale.edu/wp-content/uploads/2020/01/yoda_project_coi_form_for_data_requestors_2019_var.pdf"
        ["link"]=>
        string(96) "https://yoda.yale.edu/data-request/2020-4318/yoda_project_coi_form_for_data_requestors_2019_var/"
        ["alt"]=>
        string(0) ""
        ["author"]=>
        string(4) "1363"
        ["description"]=>
        string(0) ""
        ["caption"]=>
        string(0) ""
        ["name"]=>
        string(50) "yoda_project_coi_form_for_data_requestors_2019_var"
        ["status"]=>
        string(7) "inherit"
        ["uploaded_to"]=>
        int(4999)
        ["date"]=>
        string(19) "2023-07-31 16:07:42"
        ["modified"]=>
        string(19) "2023-08-01 01:04:16"
        ["menu_order"]=>
        int(0)
        ["mime_type"]=>
        string(15) "application/pdf"
        ["type"]=>
        string(11) "application"
        ["subtype"]=>
        string(3) "pdf"
        ["icon"]=>
        string(62) "https://yoda.yale.edu/wp/wp-includes/images/media/document.png"
      }
    }
    [2]=>
    array(1) {
      ["file_coi"]=>
      array(21) {
        ["ID"]=>
        int(9934)
        ["id"]=>
        int(9934)
        ["title"]=>
        string(55) "yoda_project_coi_form_for_data_requestors_2019_dasigned"
        ["filename"]=>
        string(59) "yoda_project_coi_form_for_data_requestors_2019_dasigned.pdf"
        ["filesize"]=>
        int(137860)
        ["url"]=>
        string(108) "https://yoda.yale.edu/wp-content/uploads/2017/02/yoda_project_coi_form_for_data_requestors_2019_dasigned.pdf"
        ["link"]=>
        string(101) "https://yoda.yale.edu/data-request/2020-4318/yoda_project_coi_form_for_data_requestors_2019_dasigned/"
        ["alt"]=>
        string(0) ""
        ["author"]=>
        string(4) "1363"
        ["description"]=>
        string(0) ""
        ["caption"]=>
        string(0) ""
        ["name"]=>
        string(55) "yoda_project_coi_form_for_data_requestors_2019_dasigned"
        ["status"]=>
        string(7) "inherit"
        ["uploaded_to"]=>
        int(4999)
        ["date"]=>
        string(19) "2023-07-31 16:04:04"
        ["modified"]=>
        string(19) "2023-08-01 01:04:16"
        ["menu_order"]=>
        int(0)
        ["mime_type"]=>
        string(15) "application/pdf"
        ["type"]=>
        string(11) "application"
        ["subtype"]=>
        string(3) "pdf"
        ["icon"]=>
        string(62) "https://yoda.yale.edu/wp/wp-includes/images/media/document.png"
      }
    }
    [3]=>
    array(1) {
      ["file_coi"]=>
      array(21) {
        ["ID"]=>
        int(8797)
        ["id"]=>
        int(8797)
        ["title"]=>
        string(11) "coi_form_bb"
        ["filename"]=>
        string(15) "coi_form_bb.pdf"
        ["filesize"]=>
        int(742945)
        ["url"]=>
        string(64) "https://yoda.yale.edu/wp-content/uploads/2019/10/coi_form_bb.pdf"
        ["link"]=>
        string(57) "https://yoda.yale.edu/data-request/2020-4318/coi_form_bb/"
        ["alt"]=>
        string(0) ""
        ["author"]=>
        string(4) "1363"
        ["description"]=>
        string(0) ""
        ["caption"]=>
        string(0) ""
        ["name"]=>
        string(11) "coi_form_bb"
        ["status"]=>
        string(7) "inherit"
        ["uploaded_to"]=>
        int(4999)
        ["date"]=>
        string(19) "2023-07-31 15:10:33"
        ["modified"]=>
        string(19) "2023-08-01 01:04:16"
        ["menu_order"]=>
        int(0)
        ["mime_type"]=>
        string(15) "application/pdf"
        ["type"]=>
        string(11) "application"
        ["subtype"]=>
        string(3) "pdf"
        ["icon"]=>
        string(62) "https://yoda.yale.edu/wp/wp-includes/images/media/document.png"
      }
    }
    [4]=>
    array(1) {
      ["file_coi"]=>
      array(21) {
        ["ID"]=>
        int(8912)
        ["id"]=>
        int(8912)
        ["title"]=>
        string(11) "coi_form_vr"
        ["filename"]=>
        string(15) "coi_form_vr.pdf"
        ["filesize"]=>
        int(19994)
        ["url"]=>
        string(64) "https://yoda.yale.edu/wp-content/uploads/2020/08/coi_form_vr.pdf"
        ["link"]=>
        string(57) "https://yoda.yale.edu/data-request/2020-4318/coi_form_vr/"
        ["alt"]=>
        string(0) ""
        ["author"]=>
        string(4) "1363"
        ["description"]=>
        string(0) ""
        ["caption"]=>
        string(0) ""
        ["name"]=>
        string(11) "coi_form_vr"
        ["status"]=>
        string(7) "inherit"
        ["uploaded_to"]=>
        int(4999)
        ["date"]=>
        string(19) "2023-07-31 15:16:12"
        ["modified"]=>
        string(19) "2023-08-01 01:04:16"
        ["menu_order"]=>
        int(0)
        ["mime_type"]=>
        string(15) "application/pdf"
        ["type"]=>
        string(11) "application"
        ["subtype"]=>
        string(3) "pdf"
        ["icon"]=>
        string(62) "https://yoda.yale.edu/wp/wp-includes/images/media/document.png"
      }
    }
  }
  ["data_use_agreement_training"]=>
  bool(true)
  ["certification"]=>
  bool(true)
  ["project_send_email_updates"]=>
  bool(true)
  ["project_status"]=>
  string(15) "unknown_revoked"
  ["project_publ_available"]=>
  bool(true)
  ["project_year_access"]=>
  string(4) "2021"
  ["project_rep_publ"]=>
  array(1) {
    [0]=>
    array(2) {
      ["publication_link"]=>
      array(3) {
        ["title"]=>
        string(80) "Unknown; data access revoked, investigator has not reported results as requested"
        ["url"]=>
        string(87) "http://Unknown; data access revoked, investigator has not reported results as requested"
        ["target"]=>
        string(0) ""
      }
      ["publication_doi"]=>
      string(0) ""
    }
  }
  ["project_assoc_data"]=>
  array(0) {
  }
  ["project_due_dil_assessment"]=>
  array(21) {
    ["ID"]=>
    int(16656)
    ["id"]=>
    int(16656)
    ["title"]=>
    string(55) "YODA Project Due Diligence Assessment 2020-4318_Updated"
    ["filename"]=>
    string(59) "YODA-Project-Due-Diligence-Assessment-2020-4318_Updated.pdf"
    ["filesize"]=>
    int(132873)
    ["url"]=>
    string(108) "https://yoda.yale.edu/wp-content/uploads/2020/05/YODA-Project-Due-Diligence-Assessment-2020-4318_Updated.pdf"
    ["link"]=>
    string(101) "https://yoda.yale.edu/data-request/2020-4318/yoda-project-due-diligence-assessment-2020-4318_updated/"
    ["alt"]=>
    string(0) ""
    ["author"]=>
    string(4) "1885"
    ["description"]=>
    string(0) ""
    ["caption"]=>
    string(0) ""
    ["name"]=>
    string(55) "yoda-project-due-diligence-assessment-2020-4318_updated"
    ["status"]=>
    string(7) "inherit"
    ["uploaded_to"]=>
    int(4999)
    ["date"]=>
    string(19) "2025-02-13 21:20:00"
    ["modified"]=>
    string(19) "2025-02-13 21:20:00"
    ["menu_order"]=>
    int(0)
    ["mime_type"]=>
    string(15) "application/pdf"
    ["type"]=>
    string(11) "application"
    ["subtype"]=>
    string(3) "pdf"
    ["icon"]=>
    string(62) "https://yoda.yale.edu/wp/wp-includes/images/media/document.png"
  }
  ["project_title_link"]=>
  array(21) {
    ["ID"]=>
    int(10953)
    ["id"]=>
    int(10953)
    ["title"]=>
    string(42) "yoda_project_protocol_2020-4318_-_21-04-26"
    ["filename"]=>
    string(46) "yoda_project_protocol_2020-4318_-_21-04-26.pdf"
    ["filesize"]=>
    int(24513)
    ["url"]=>
    string(95) "https://yoda.yale.edu/wp-content/uploads/2023/08/yoda_project_protocol_2020-4318_-_21-04-26.pdf"
    ["link"]=>
    string(88) "https://yoda.yale.edu/data-request/2020-4318/yoda_project_protocol_2020-4318_-_21-04-26/"
    ["alt"]=>
    string(0) ""
    ["author"]=>
    string(4) "1363"
    ["description"]=>
    string(0) ""
    ["caption"]=>
    string(0) ""
    ["name"]=>
    string(42) "yoda_project_protocol_2020-4318_-_21-04-26"
    ["status"]=>
    string(7) "inherit"
    ["uploaded_to"]=>
    int(4999)
    ["date"]=>
    string(19) "2023-08-09 17:18:42"
    ["modified"]=>
    string(19) "2023-08-09 19:17:32"
    ["menu_order"]=>
    int(0)
    ["mime_type"]=>
    string(15) "application/pdf"
    ["type"]=>
    string(11) "application"
    ["subtype"]=>
    string(3) "pdf"
    ["icon"]=>
    string(62) "https://yoda.yale.edu/wp/wp-includes/images/media/document.png"
  }
  ["project_review_link"]=>
  array(21) {
    ["ID"]=>
    int(10764)
    ["id"]=>
    int(10764)
    ["title"]=>
    string(36) "yoda_project_review_-_2020-4318_site"
    ["filename"]=>
    string(40) "yoda_project_review_-_2020-4318_site.pdf"
    ["filesize"]=>
    int(1202083)
    ["url"]=>
    string(89) "https://yoda.yale.edu/wp-content/uploads/2023/08/yoda_project_review_-_2020-4318_site.pdf"
    ["link"]=>
    string(82) "https://yoda.yale.edu/data-request/2020-4318/yoda_project_review_-_2020-4318_site/"
    ["alt"]=>
    string(0) ""
    ["author"]=>
    string(4) "1363"
    ["description"]=>
    string(0) ""
    ["caption"]=>
    string(0) ""
    ["name"]=>
    string(36) "yoda_project_review_-_2020-4318_site"
    ["status"]=>
    string(7) "inherit"
    ["uploaded_to"]=>
    int(4999)
    ["date"]=>
    string(19) "2023-08-09 17:11:02"
    ["modified"]=>
    string(19) "2023-08-09 19:17:33"
    ["menu_order"]=>
    int(0)
    ["mime_type"]=>
    string(15) "application/pdf"
    ["type"]=>
    string(11) "application"
    ["subtype"]=>
    string(3) "pdf"
    ["icon"]=>
    string(62) "https://yoda.yale.edu/wp/wp-includes/images/media/document.png"
  }
  ["project_highlight_button"]=>
  string(0) ""
  ["request_data_partner"]=>
  string(15) "johnson-johnson"
  ["search_order"]=>
  string(5) "-7270"
  ["principal_investigator"]=>
  array(7) {
    ["first_name"]=>
    string(5) "Vivek"
    ["last_name"]=>
    string(10) "Rudrapatna"
    ["degree"]=>
    string(7) "MD, PhD"
    ["primary_affiliation"]=>
    string(39) "University of California, San Francisco"
    ["email"]=>
    string(17) "vivical@gmail.com"
    ["state_or_province"]=>
    string(2) "CA"
    ["country"]=>
    string(13) "United States"
  }
  ["human_research_protection_training"]=>
  bool(false)
  ["request_overridden_res"]=>
  string(1) "3"
}
data partner
array(1) {
  [0]=>
  string(15) "johnson-johnson"
}


pi country
array(1) {
  [0]=>
  string(13) "United States"
}


pi affil
array(1) {
  [0]=>
  string(8) "Academia"
}


products
array(2) {
  [0]=>
  string(8) "remicade"
  [1]=>
  string(7) "simponi"
}


num of trials
array(1) {
  [0]=>
  string(1) "6"
}


res
array(1) {
  [0]=>
  string(1) "3"
}

General Information

How did you learn about the YODA Project?: Other

Conflict of Interest

Request Clinical Trials

Associated Trial(s):

What type of data are you looking for?:

Data Request Status

Status: Unknown - Revoked

Research Proposal

Project Title: Enhancing inference from real-world data using externally-derived missing data models: a pilot study of Ulcerative Colitis

Scientific Abstract: Background: Electronic Health Records (EHR) data are a promising source of information regarding treatment effects in the context of routine clinical care; however, their utility for research has been limited by substantial missing data. Because missingness largely depends on the availability of other corroborating information about disease activity, which typically dictates the clinician's decision to pursue additional testing and measurement, the 'missing at random' assumption (and therefore the validity of model-based imputation) appears to be met by EHR data.
Objective: To develop and evaluate a series of missing data models using datasets with substantial completeness -- RCTs of Ulcerative Colitis -- in order to enable less biased estimation from corresponding EHR studies.
Study Design: Post-hoc analysis of individual participant data from randomized, blinded Phase 3 trials of adults with Ulcerative Colitis
Participants: Subjects participating in the above trials
Main Outcome: Outcome variables will include each of the subscores of the Mayo Score of Ulcerative Colitis activity. We will develop and evaluate several models of missing data, and perform feature selection to identify the most informative variables for prediction.
Statistical Analysis: We will artificially censor observations from a complete data set and test a variety of popular predictive models (logistic regression, random forests, gradient boosted decision trees) according to bias and variance. We will use feature selection to identify highly informative variables.
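
A minimal sketch of the censor-and-evaluate idea described above, written in Python using only the standard library. Everything in it is an assumption made for illustration: the records are synthetic, the field names are hypothetical, the censoring is completely at random (a simpler mechanism than 'missing at random'), and the mean-of-observed-subscores imputer is a placeholder rather than any of the proposed models. Bias is taken here as the mean of (imputed - true) and variance as the variance of those errors, which is one possible reading of the evaluation criteria.

```python
import random
import statistics

SUBSCORES = ("stool_frequency", "rectal_bleeding", "physician_global", "endoscopy")


def simulate_complete_records(n, seed=0):
    """Synthetic stand-in for complete trial records: four correlated 0-3 subscores."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        severity = rng.uniform(0, 3)  # hypothetical latent disease activity
        records.append({
            name: min(3, max(0, round(severity + rng.gauss(0, 0.6))))
            for name in SUBSCORES
        })
    return records


def censor(records, field, fraction, seed=1):
    """Hide `field` in a random subset; return censored copies and the hidden truth."""
    rng = random.Random(seed)
    censored, hidden = [], {}
    for i, record in enumerate(records):
        copy = dict(record)  # leave the complete data untouched
        if rng.random() < fraction:
            hidden[i] = copy[field]
            copy[field] = None
        censored.append(copy)
    return censored, hidden


def impute(record, field):
    """Placeholder model: mean of the subscores that remain observed."""
    observed = [v for k, v in record.items() if k != field and v is not None]
    return statistics.mean(observed)


def bias_and_variance(censored, hidden, field):
    """Mean and variance of (imputed - true) over the censored values."""
    errors = [impute(censored[i], field) - truth for i, truth in hidden.items()]
    return statistics.mean(errors), statistics.variance(errors)


records = simulate_complete_records(500)
censored, hidden = censor(records, "endoscopy", fraction=0.4)
bias, variance = bias_and_variance(censored, hidden, "endoscopy")
print(f"censored {len(hidden)} values: bias={bias:+.3f}, error variance={variance:.3f}")
```

Because the true values are retained in `hidden`, any candidate model can be swapped in for `impute` and compared on the same censored records.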

Brief Project Background and Statement of Project Significance: Following a search of clinicaltrials.gov, we have identified all completed, phase 2-3, randomized controlled trials of FDA-approved therapeutics for Ulcerative colitis in adults. We are requesting participant-level data corresponding to these trials.
For the primary analysis of this study, we use the Total Mayo Score as the outcome variable. This will be predicted as a function of different combinations of available Mayo subscores and auxiliary variables. We will use nested cross-validation to estimate model accuracy and variance. We will use feature-selection methods to identify reduced models that maintain high predictive accuracy, prioritizing those features that are more convenient to obtain in practice (patient- and physician-reported outcomes > blood tests > stool tests).
Finalized models in the form of software files will be published at the end of the study; the rationale for this is that ensemble machine learning models have a complex parameterization and thus may not be easily conveyed or transported for real-world use in any other form. In addition, we will also publish the list of features found to be most informative by feature selection.
Strengths of the proposed study include its use of high-quality and complete data to address several important research problems in the field of real-world evidence. Limitations include the possibility that these models may fail to generalize to real-world contexts due to substantive reasons or other modelling problems (model misspecification, overfitting, lack of robustness, undersampling of rare strata in the included data).
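
The nested cross-validation mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the study's pipeline: the data are synthetic, the model is a one-feature ridge regression (a stand-in for the ensemble models named in the proposal), the penalty grid is arbitrary, and all function names are hypothetical. The point it shows is the separation of roles: the inner loop picks a tuning parameter using only the training portion of each outer fold, and the outer loop scores the resulting model on data that played no part in that choice.

```python
import random
import statistics


def make_data(n, seed=0):
    """Synthetic (partial score, total score) pairs; not trial data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        severity = rng.uniform(0, 3)
        subs = [min(3, max(0, round(severity + rng.gauss(0, 0.6)))) for _ in range(4)]
        rows.append((sum(subs[:3]), sum(subs)))  # three subscores vs. all four
    return rows


def k_folds(n, k, seed):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def split(rows, held):
    held_set = set(held)
    train = [rows[i] for i in range(len(rows)) if i not in held_set]
    return train, [rows[i] for i in held]


def fit_ridge(rows, lam):
    """One-feature ridge regression with an unpenalized intercept."""
    xbar = statistics.mean(x for x, _ in rows)
    ybar = statistics.mean(y for _, y in rows)
    sxx = sum((x - xbar) ** 2 for x, _ in rows)
    sxy = sum((x - xbar) * (y - ybar) for x, y in rows)
    slope = sxy / (sxx + lam)
    return slope, ybar - slope * xbar


def mae(model, rows):
    slope, intercept = model
    return statistics.mean(abs(intercept + slope * x - y) for x, y in rows)


def cv_error(rows, lam, k, seed):
    """Plain k-fold cross-validated error for one penalty value."""
    scores = []
    for held in k_folds(len(rows), k, seed):
        train, test = split(rows, held)
        scores.append(mae(fit_ridge(train, lam), test))
    return statistics.mean(scores)


def nested_cv(rows, lambdas, outer_k=5, inner_k=3, seed=0):
    """Outer folds estimate error; inner folds choose the penalty."""
    outer_scores = []
    for fold_no, held in enumerate(k_folds(len(rows), outer_k, seed)):
        train, test = split(rows, held)
        # Tuning sees only the outer-training rows.
        best = min(lambdas, key=lambda lam: cv_error(train, lam, inner_k, seed + 1 + fold_no))
        outer_scores.append(mae(fit_ridge(train, best), test))
    return statistics.mean(outer_scores), statistics.stdev(outer_scores)


mean_error, spread = nested_cv(make_data(200), lambdas=[0.1, 1.0, 10.0])
print(f"outer-fold MAE: {mean_error:.3f} (SD across folds {spread:.3f})")
```

The spread across outer folds gives a rough sense of the variability of the accuracy estimate; it is not a formal variance estimate.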

Specific Aims of the Project: There are two specific aims of this proposed research: 1) derive and internally validate a series of models for predicting Ulcerative Colitis disease activity (Total Mayo Score) in the presence of missing subscore data, and 2) identify combinations of features that are most informative for predicting the Total Mayo Score in the presence of different patterns of missing subscores, with an emphasis on a missing endoscopic subscore. Exploratory aims include deriving a new composite disease activity score.
[See attached for full text]
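
One way the second aim could be approached is sketched below: score every combination of candidate features on held-out data, then, among combinations whose error is close to the best, prefer the one that is cheapest to collect. The cost tiers follow the convenience ordering stated in the project background (patient- and physician-reported outcomes, then blood tests, then stool tests); the numeric costs, the tolerance, the feature names, the synthetic data, the linear model, and the single train/test split are all assumptions made to keep the example short.

```python
import itertools
import random
import statistics

# Hypothetical acquisition costs: reported outcomes < blood tests < stool tests.
COST = {"stool_frequency": 1, "rectal_bleeding": 1, "physician_global": 1,
        "crp": 2, "calprotectin": 3}


def subscore(rng, severity):
    return min(3, max(0, round(severity + rng.gauss(0, 0.6))))


def simulate(n, seed=0):
    """Synthetic (features, total score) rows; the endoscopic subscore is never a predictor."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        s = rng.uniform(0, 3)
        f = {"stool_frequency": subscore(rng, s), "rectal_bleeding": subscore(rng, s),
             "physician_global": subscore(rng, s),
             "crp": 5 * s + rng.gauss(0, 3), "calprotectin": 200 * s + rng.gauss(0, 150)}
        total = f["stool_frequency"] + f["rectal_bleeding"] + f["physician_global"] + subscore(rng, s)
        rows.append((f, total))
    return rows


def fit_linear(X, y):
    """Ordinary least squares via the normal equations and Gauss-Jordan elimination."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(p):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][p] for i in range(p)]  # intercept first, then one slope per feature


def heldout_mae(rows, subset, train_fraction=0.7):
    cut = int(len(rows) * train_fraction)
    train, test = rows[:cut], rows[cut:]
    coef = fit_linear([[f[k] for k in subset] for f, _ in train], [t for _, t in train])
    return statistics.mean(
        abs(coef[0] + sum(c * f[k] for c, k in zip(coef[1:], subset)) - t) for f, t in test)


def select_features(rows, tolerance=0.05):
    """Cheapest subset whose held-out error is within `tolerance` of the best."""
    scored = [(heldout_mae(rows, subset), subset)
              for r in range(1, len(COST) + 1)
              for subset in itertools.combinations(COST, r)]
    best = min(err for err, _ in scored)
    eligible = [(sum(COST[k] for k in s), err, s) for err, s in scored if err <= best + tolerance]
    cost, err, subset = min(eligible)
    return subset, err, cost


subset, err, cost = select_features(simulate(300))
print(f"selected {subset}: held-out MAE {err:.3f}, acquisition cost {cost}")
```

Exhaustive search is feasible here only because there are five candidates; with the larger variable set described in the proposal, a greedy or model-based selection method would be needed instead.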

Study Design:

What is the purpose of the analysis being proposed? Please select all that apply.:

Software Used:

Data Source and Inclusion/Exclusion Criteria to be used to define the patient sample for your study: We are requesting individual participant-level data from all completed RCTs of Ulcerative Colitis in adults based on a search of clinicaltrials.gov. This decision was made in order to maximize the generalizability of these findings to that of routine clinical practice.

Primary and Secondary Outcome Measure(s) and how they will be categorized/defined for your study: For the primary analysis, the outcome element is the Total Mayo Score, an ordinal variable on a 0-12 scale. This variable has been used to define the primary outcome (typically, a binarization of this variable) of all included trials.

Main Predictor/Independent Variable and how it will be categorized/defined for your study: The main independent variables will be different combinations of Mayo subscores. Each of these is an ordinal variable on a 0-3 scale. Most of the derived models will exclude the Mayo endoscopic subscore as a predictor, as this variable is least available in real-world data due to its cost and inconvenience.
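
The outcome and predictors described above can be represented with a small data structure. The sketch below assumes the standard four-component Mayo Score (stool frequency, rectal bleeding, physician's global assessment, endoscopic findings), each scored 0-3 and summing to the 0-12 total. The class and field names are hypothetical, and the binarization cutoff is left as a parameter because the trial-specific thresholds are not given in this request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MayoAssessment:
    """One disease-activity assessment; None marks a missing subscore."""
    stool_frequency: Optional[int]
    rectal_bleeding: Optional[int]
    physician_global: Optional[int]
    endoscopy: Optional[int]

    def __post_init__(self):
        # Each observed subscore must be an integer on the 0-3 ordinal scale.
        for name, value in vars(self).items():
            if value is not None and not (isinstance(value, int) and 0 <= value <= 3):
                raise ValueError(f"{name} must be an integer from 0 to 3, got {value!r}")

    def total(self) -> Optional[int]:
        """Total Mayo Score (0-12), or None if any subscore is missing."""
        values = list(vars(self).values())
        return None if None in values else sum(values)

    def partial(self) -> Optional[int]:
        """Sum of the three non-endoscopic subscores (0-9), or None if any is missing."""
        values = [self.stool_frequency, self.rectal_bleeding, self.physician_global]
        return None if None in values else sum(values)


def binarize(total: int, threshold: int) -> bool:
    """True when the total is at or below `threshold` (cutoff supplied by the caller)."""
    return total <= threshold


complete = MayoAssessment(1, 0, 1, 2)
no_endoscopy = MayoAssessment(1, 0, 1, None)
print(complete.total(), no_endoscopy.total(), no_endoscopy.partial())
print(binarize(complete.total(), threshold=2))  # threshold chosen only for illustration
```

A record such as `no_endoscopy` is the typical real-world case the proposal targets: the total is undefined, and a missing-data model would be asked to predict it from the observed components and auxiliary variables.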

Other Variables of Interest that will be used in your analysis and how they will be categorized/defined for your study: Other variables of interest fall into the following 4 categories: 1) demographic variables (gender, age, race, ethnicity), 2) disease characteristics (disease duration, disease location, current steroid use, assignment to an active arm or placebo, prior treatment failure, presence of extraintestinal manifestations where available, history of other autoimmune diagnoses where available), 3) biochemistries (hemoglobin, white blood cell count, albumin, C-reactive protein, erythrocyte sedimentation rate, fecal calprotectin), and 4) other patient-reported outcomes data as available (e.g. IBDQ, SF-36). These variables will each be categorized in their native form (binary, categorical, continuous) for the purposes of modeling. Other variables that we will assess during modeling include 1) an indicator variable for trials that use central reading of endoscopy vs not, 2) an indicator variable for the trial of origin corresponding to each included data point, and 3) a variable corresponding to the patient identifier (for models that allow for multiple observations per subject).

Statistical Analysis Plan: See attachment

Narrative Summary: Real-world evidence is an emerging research area that proposes to use data from non-experimental settings, such as routine clinical care, to guide better decision making. This field has received growing interest in recent years for a variety of reasons, including the realization that randomized controlled trials are too expensive and infeasible to do for every important clinical question. Despite the promise of this field, its progress has been compromised by several major limitations, one of which is the problem of missing data.
[This is a request for data access via Vivli. See attachment for full text.]

Project Timeline: Start date: 4/2021
Completion date: 1/2022
Manuscript completion date: 3/2022
Results posted to YODA project: 4/2022

Dissemination Plan: This work will be presented at national Gastroenterology meetings and will be submitted to journals of interest both to the IBD and Gastroenterology community as well as the general clinical research community: JAMA network journals, BMJ, Gastroenterology, American Journal of Gastroenterology, Inflammatory Bowel Diseases

Bibliography:

1. Rudrapatna VA, Butte AJ. Opportunities and challenges in using real-world data for health care. J Clin Invest. 2020;130(2):565–574. doi:10.1172/JCI129197
2. van Buuren S. Flexible Imputation of Missing Data. 2nd ed. Boca Raton, FL: Chapman & Hall/CRC; 2018.