2025-0684 - The YODA Project

Principal Investigator: Mark van de Wiel, PhD, Amsterdam UMC, North-Holland, The Netherlands

Key Personnel: Yusufhan Balci, MS, Amsterdam UMC (requires data access)

Data Partner: Johnson & Johnson

Product: Invokana (canagliflozin)

Number of Trials: 2

Year of Data Access: 2025

External Grants or Funds: No external grants or funds are being used to support this research.

Narrative Summary: In medicine, a treatment may work well on average without having the same effect on every patient. This project will use advanced statistical methods to re-analyze data from the major CANVAS Program clinical trials. We propose a new method for causal inference that combines a flexible machine learner with an interpretable linear model. Our goal is to identify the patient characteristics, such as age, kidney function, or other health markers, that are associated with the greatest benefit from canagliflozin. By understanding which patients benefit most, this research could help doctors make more personalized treatment decisions, leading to better health outcomes and more effective use of medicines.

Project Timeline:
Month 1: Data acquisition, data inspection, and preparation of the final analytical dataset.
Months 2-3: Primary statistical analysis (fitting the semiparametric Bayesian model and performing convergence diagnostics) and secondary analyses, including Shapley value calculations and characterization of patient subgroups.
Months 4-5: Interpretation of results and drafting of the primary manuscript.
Month 5: Submission of the manuscript to a peer-reviewed journal.
Month 6: Preparation of abstracts for scientific conferences and submission of a final report to the YODA Project.

Dissemination Plan: We plan to publish the results in a leading peer-reviewed medical or statistical journal and to present the findings at national and international scientific conferences.

Project Documents: YODA Project Protocol (2025-10-02); YODA Project Due Diligence Assessment; YODA Project Review

General Information

How did you learn about the YODA Project?: Scientific Publication

Conflict of Interest

Request Clinical Trials

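Associated Trial(s):
NCT01032629 - A Randomized, Multicenter, Double-Blind, Parallel, Placebo-Controlled Study of the Effects of JNJ-28431754 on Cardiovascular Outcomes in Adult Subjects With Type 2 Diabetes Mellitus
NCT01989754 - A Randomized, Multicenter, Double-Blind, Parallel, Placebo-Controlled Study of the Effects of Canagliflozin on Renal Endpoints in Adult Subjects With Type 2 Diabetes Mellitus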

What type of data are you looking for?: Individual Participant-Level Data, which includes Full CSR and all supporting documentation

Data Request Status

Status: Ongoing

Research Proposal

Project Title: A Semiparametric Bayesian Analysis of Heterogeneous Treatment Effects of Canagliflozin on Glycemic Control and Renal Function in the CANVAS Program

Scientific Abstract: Background
Average treatment effects (ATE) from randomized controlled trials (RCTs) can mask substantial treatment effect heterogeneity (TEH), in which patient subgroups respond differently to treatment. The CANVAS Program showed that canagliflozin reduces major adverse cardiovascular events. Methods that go beyond the average effect are needed to identify which patients benefit most.

Objective
To apply and develop a semiparametric Bayesian model to estimate the Conditional Average Treatment Effect (CATE) of canagliflozin and identify baseline characteristics that modify treatment response.

Study Design
Secondary analysis of individual participant data from the integrated CANVAS and CANVAS-R randomized trials.

Participants
Participants from the integrated CANVAS and CANVAS-R trials. We will exclude those with missing primary outcome or key baseline covariate data.

Main Outcome Measures
Primary outcome: Change in Hemoglobin A1c (HbA1c) from baseline to 52 weeks. Secondary outcome: Annual change in estimated glomerular filtration rate (eGFR).

Statistical Analysis
A semiparametric Bayesian model will be used. Prognostic effects will be modeled non-parametrically with Bayesian Additive Regression Trees (BART), including a propensity score for double-robustness. The Conditional Average Treatment Effect (CATE) will be modeled with a parametric linear model using shrinkage priors (e.g., Horseshoe). This hybrid structure maintains interpretability. Shapley values derived from posterior samples will quantify individual variable contributions to the CATE.

Brief Project Background and Statement of Project Significance: The paradigm of evidence-based medicine is shifting from a focus on the average patient to the individual. While RCTs remain the gold standard for establishing treatment efficacy, their focus on the ATE can obscure important variation in how individual patients respond to therapy. A drug that benefits the patient population on average may nevertheless harm a subgroup of patients.

The CANVAS Program, a landmark pair of trials, established that canagliflozin reduces the risk of major adverse cardiovascular events and has potential renoprotective effects in patients with type 2 diabetes at high cardiovascular risk. Understanding the heterogeneity of this effect is essential for advancing precision medicine and ensuring that therapies are targeted to those most likely to benefit.

Traditional approaches to investigating TEH, such as one-variable-at-a-time subgroup analyses, are often statistically fragile: they are typically underpowered and prone to false-positive findings from multiple testing. This project proposes a semiparametric Bayesian modeling approach that combines the strengths of non-parametric and parametric methods. We will model the complex prognostic effects of baseline covariates using Bayesian Additive Regression Trees (BART), while modeling the CATE with an interpretable linear model that includes interaction terms. This combination can yield actionable insights for personalizing treatment decisions. Furthermore, this model structure allows Shapley values to be computed efficiently from the posterior samples, explaining both global effect modification and individual-level predictions without refitting the model. This research could contribute to more personalized treatment strategies and, ultimately, improved patient outcomes.

Specific Aims of the Project: To apply a semiparametric Bayesian model to individual participant data from the CANVAS Program to estimate the Conditional Average Treatment Effect (CATE) of canagliflozin on glycemic control (change in HbA1c) and renal function progression (change in eGFR). To identify key baseline patient characteristics and their interactions that are important modifiers of the treatment effect. To characterize the profiles of patient subgroups predicted to derive the largest and smallest benefit from the intervention by using Shapley values to interpret the CATE model.

Study Design: Methodological research

What is the purpose of the analysis being proposed? Please select all that apply.: Develop or refine statistical methods

Software Used: R, RStudio

Data Source and Inclusion/Exclusion Criteria to be used to define the patient sample for your study: The data source will be the integrated individual participant-level dataset from the CANagliflozin cardioVascular Assessment Study (CANVAS; NCT01032629) and the CANVAS-Renal (CANVAS-R; NCT01989754) trials.

Exclusion Criteria: Key trial exclusion criteria included a history of diabetic ketoacidosis or type 1 diabetes. For this analysis, we will additionally exclude participants with missing data for the primary outcome measure at the 52-week follow-up visit and participants with missing baseline values for key covariates.
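
The complete-case rule above can be sketched as follows. This assumes missing values are coded as NaN in numeric arrays. The function name `complete_case_mask` is hypothetical, and categorical covariates would need their own missingness check.

```python
import numpy as np

def complete_case_mask(hba1c_week52, *baseline_covariates):
    """True for participants with an observed week-52 HbA1c and no missing
    value in any of the supplied numeric baseline covariates (NaN = missing)."""
    mask = ~np.isnan(np.asarray(hba1c_week52, dtype=float))
    for cov in baseline_covariates:
        mask &= ~np.isnan(np.asarray(cov, dtype=float))
    return mask
```

Reporting `(~mask).sum()` next to the results would document how many participants the rule removes.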

Primary and Secondary Outcome Measure(s) and how they will be categorized/defined for your study: Primary Outcome: Our main outcome of interest is the change in Hemoglobin A1c (HbA1c) from baseline to 52 weeks, analyzed as a continuous variable.
Secondary Outcome: A key secondary outcome is the annual rate of change in estimated glomerular filtration rate (eGFR), analyzed as a continuous variable.
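
The proposal does not specify how the annual eGFR change will be derived. One common choice, assumed here, is a per-participant ordinary least-squares slope of eGFR on time in years. The HbA1c outcome is a simple difference. Both function names are hypothetical.

```python
import numpy as np

def hba1c_change(baseline, week52):
    """Change in HbA1c from baseline to week 52 (negative values = reduction)."""
    return np.asarray(week52, dtype=float) - np.asarray(baseline, dtype=float)

def annual_egfr_slope(years, egfr):
    """OLS slope of eGFR on visit time (in years) for a single participant."""
    t = np.asarray(years, dtype=float)
    y = np.asarray(egfr, dtype=float)
    if t.size < 2 or t.max() == t.min():
        return np.nan  # slope undefined without two distinct visit times
    slope, _intercept = np.polyfit(t, y, deg=1)  # coefficients, highest power first
    return slope
```

With time in years, the slope is in eGFR units per year. When participants have few visits, a linear mixed model that pools information across participants is a common alternative to per-participant OLS.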

Main Predictor/Independent Variable and how it will be categorized/defined for your study: The main independent variable is the randomized treatment assignment to canagliflozin versus placebo. In CANVAS, participants were randomized 1:1:1 to placebo, canagliflozin 100 mg, or canagliflozin 300 mg. In CANVAS-R, participants were randomized 1:1 to placebo or canagliflozin 100 mg (with an option to uptitrate to 300 mg). For this analysis, the variable will be categorized as a binary predictor (any canagliflozin dose vs. placebo).
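
Pooling the two trials means the probability of receiving any canagliflozin dose differs by trial. CANVAS randomized 1:1:1, so 2 of 3 arms are active. CANVAS-R randomized 1:1, so 1 of 2 arms is active. The sketch below codes the binary treatment and the design-implied propensity. The arm labels and function names are hypothetical. The analysis plan uses a fitted π_hat(X); this design value is only a reference point, suggesting that trial membership belongs in X.

```python
import numpy as np

# P(any canagliflozin dose) implied by each trial's randomization ratio:
#   CANVAS   1:1:1 placebo / 100 mg / 300 mg -> 2 of 3 arms active
#   CANVAS-R 1:1   placebo / canagliflozin   -> 1 of 2 arms active
DESIGN_PROPENSITY = {"CANVAS": 2 / 3, "CANVAS-R": 1 / 2}

def binary_treatment(arms):
    """1 for any canagliflozin dose, 0 for placebo (arm labels are hypothetical)."""
    return np.array([0 if arm == "placebo" else 1 for arm in arms])

def design_propensity(trials):
    """Randomization probability of T = 1 given trial membership."""
    return np.array([DESIGN_PROPENSITY[trial] for trial in trials])
```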

Other Variables of Interest that will be used in your analysis and how they will be categorized/defined for your study: We will evaluate a set of pre-specified baseline characteristics as potential modifiers of the treatment effect, including but not limited to:
Age (continuous)
Sex (binary)
Race (categorical)
Body Mass Index (BMI) (continuous)
Duration of diabetes (continuous)
History of heart failure (binary)
History of atherosclerotic cardiovascular disease (binary)
Baseline systolic blood pressure (continuous)
Baseline use of concomitant medications (e.g., insulin, metformin, RAAS inhibitors) (binary)
Pulse (continuous)
Creatinine (continuous)
SGLT2 inhibitor group (categorical)
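Before model fitting, these covariates have to be turned into numeric inputs. The following is a minimal illustrative Python sketch under two assumptions that are not part of the stated plan. First, continuous covariates are standardized, so that a common shrinkage scale is comparable across coefficients. Second, categorical covariates such as race are dummy-coded against a reference level. The function names and category labels are hypothetical.

```python
import statistics


def standardize(values: list) -> list:
    """Center and scale a continuous covariate to mean 0 and (population) SD 1."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def dummy_code(values: list, reference: str):
    """Dummy-code a categorical covariate, dropping the reference level.

    Returns (levels, rows), where rows[i][k] == 1 iff values[i] == levels[k].
    """
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Toy inputs; category labels are placeholders.
age_z = standardize([55.0, 62.0, 69.0])
levels, race_rows = dummy_code(["White", "Asian", "Black", "White"], reference="White")
print([round(z, 3) for z in age_z])  # [-1.225, 0.0, 1.225]
print(levels, race_rows)             # ['Asian', 'Black'] [[0, 0], [1, 0], [0, 1], [0, 0]]
```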

Statistical Analysis Plan: The analysis will be conducted within the potential outcomes framework. We will fit a semiparametric model for the conditional expectation of the outcome Y given a vector of baseline covariates X and a binary treatment indicator T. The model is specified as:
y_i = μ(x_i, π_hat(x_i)) + T_i * τ(x_i) + ε_i
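To make the roles of μ and τ in this equation concrete, the following toy Python simulation draws data from a model of this form. It uses one covariate, a made-up nonlinear μ, a made-up linear τ, and 1:1 randomization. Every functional form is hypothetical. With a constant randomization probability, the propensity input to μ is constant and is omitted. The simulation illustrates that under randomization the difference in arm means recovers the average of τ(x), while τ(x) itself varies from person to person.

```python
import math
import random
import statistics


def simulate(n: int, seed: int = 2025) -> list:
    """Toy draws (x, t, y, tau) from y = mu(x) + t * tau(x) + eps with randomized t.

    Every functional form here is made up for illustration. Because t is randomized
    with a constant probability, the propensity term in mu is a constant and is omitted.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                    # one standardized covariate
        t = 1 if rng.random() < 0.5 else 0         # 1:1 randomization
        mu = math.sin(2.0 * x) + 0.5 * x ** 2      # nonlinear prognostic function
        tau = -0.5 + 0.3 * x                       # linear CATE
        y = mu + t * tau + rng.gauss(0.0, 1.0)     # eps ~ N(0, 1)
        draws.append((x, t, y, tau))
    return draws


data = simulate(20_000)
avg_tau = statistics.fmean(d[3] for d in data)  # sample average of tau(x_i)
diff_in_means = (statistics.fmean(d[2] for d in data if d[1] == 1)
                 - statistics.fmean(d[2] for d in data if d[1] == 0))
print(f"average CATE {avg_tau:.3f}, difference in means {diff_in_means:.3f}")
```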
The prognostic function, μ(X, π_hat(X)), will be modeled nonparametrically using Bayesian Additive Regression Trees (BART; Chipman et al., 2010). Following Hahn et al. (2020), this component includes the estimated propensity score, π_hat(X), as an additional input. This mitigates regularization-induced confounding (Hahn et al., 2018) and adds robustness to misspecification of the prognostic function.
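In the pooled data, both trials are randomized but with different allocation ratios, so the treatment probability varies with trial membership. The following minimal Python sketch shows one simple way to form π_hat in this setting. It assumes trial membership is the only determinant of assignment probability, a simplification that ignores any stratification. Under that assumption, π_hat can be estimated as the within-trial treated share and appended as an extra input to the prognostic component only. A logistic regression on baseline covariates is the more general alternative. The names and toy data are hypothetical.

```python
from collections import defaultdict


def fit_propensity_by_trial(trials: list, t: list) -> dict:
    """pi_hat = observed share of participants with T = 1 within each trial."""
    n = defaultdict(int)
    n_treated = defaultdict(int)
    for trial, ti in zip(trials, t):
        n[trial] += 1
        n_treated[trial] += ti
    return {trial: n_treated[trial] / n[trial] for trial in n}


def augment_with_propensity(X: list, trials: list, pi_hat: dict) -> list:
    """Append pi_hat as an extra input column for the prognostic (BART) component."""
    return [row + [pi_hat[trial]] for row, trial in zip(X, trials)]


# Made-up toy data: two trials with different allocation ratios.
trials = ["CANVAS", "CANVAS", "CANVAS", "CANVAS-R", "CANVAS-R"]
t = [1, 1, 0, 1, 0]
X = [[0.2], [-1.1], [0.7], [1.4], [-0.3]]
pi_hat = fit_propensity_by_trial(trials, t)
X_mu = augment_with_propensity(X, trials, pi_hat)
print(pi_hat)   # {'CANVAS': 0.6666666666666666, 'CANVAS-R': 0.5}
print(X_mu[3])  # [1.4, 0.5]
```

The CATE component would still use the original covariates `X`. Only the prognostic inputs `X_mu` carry the extra column.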
The conditional average treatment effect (CATE), τ(x_i), will be modeled as a parametric linear function that includes both main effects of covariates and pre-specified interaction terms. To regularize this linear CATE component, we will use shrinkage priors such as the horseshoe prior (Carvalho et al., 2010) or a hierarchical linked shrinkage prior (van de Wiel et al., 2024). The model will be fit with a Markov chain Monte Carlo (MCMC) algorithm to obtain the full posterior distribution of the coefficients.
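The structure of the horseshoe prior on the linear CATE coefficients can be illustrated by drawing from it. The following Python sketch uses the inverse-gamma auxiliary-variable representation of the half-Cauchy distribution from Makalic and Schmidt (2015). It only samples from the prior to show its structure. It is neither the posterior sampler nor the implementation in stochtree. The global scale, usually written τ in the horseshoe literature, is renamed `g` to avoid a clash with the CATE τ(x). All function names are hypothetical.

```python
import math
import random
import statistics


def inv_gamma(rng: random.Random, shape: float, scale: float) -> float:
    """Inverse-gamma(shape, scale) draw as 1 / Gamma(shape, rate=scale).

    random.gammavariate(alpha, beta) takes beta as a *scale*, so rate=scale means beta=1/scale.
    """
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)


def half_cauchy(rng: random.Random) -> float:
    """C+(0, 1) draw via lambda^2 | nu ~ IG(1/2, 1/nu), nu ~ IG(1/2, 1)."""
    nu = inv_gamma(rng, 0.5, 1.0)
    return math.sqrt(inv_gamma(rng, 0.5, 1.0 / nu))


def horseshoe_prior_draw(p: int, rng: random.Random) -> list:
    """One prior draw of p CATE coefficients: beta_j ~ N(0, (lambda_j * g)^2).

    lambda_j (local scale) and g (global scale) are each C+(0, 1).
    """
    g = half_cauchy(rng)
    return [rng.gauss(0.0, half_cauchy(rng) * g) for _ in range(p)]


rng = random.Random(11)
lams = [half_cauchy(rng) for _ in range(50_000)]
print(round(statistics.median(lams), 2))  # close to 1, the median of C+(0, 1)
print(horseshoe_prior_draw(5, rng))
```

The heavy-tailed local scales let a few large interaction effects escape shrinkage, while the global scale pulls most coefficients toward zero.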
To interpret the model, we will use Shapley values to attribute each individual's predicted CATE to their baseline features. Because the CATE component of the model is linear, the Shapley contributions have a closed-form expression and can be computed efficiently from the posterior samples without refitting the model. We will use these contributions to build descriptive profiles of "hyper-responders" and "hypo-responders" to guide clinical intuition.
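For the main-effects part of a linear CATE, the closed form is short enough to show directly. The following minimal illustrative Python sketch assumes τ(x) = b0 + Σ b_j x_j with main effects only. It also assumes a baseline equal to each covariate's sample mean, which is the interventional Shapley definition with features treated as independent. Under these assumptions, φ_j = b_j (x_ij − mean_j). Pre-specified interaction terms also admit closed forms but are omitted here. Because φ is linear in the coefficients, applying the formula draw by draw yields a posterior distribution for each contribution. The function names and toy inputs are hypothetical.

```python
import statistics


def shapley_draws(beta_draws: list, X: list, i: int) -> list:
    """Per-draw Shapley values for individual i under a main-effects linear CATE.

    Model: tau(x) = b0 + sum_j b_j * x_j. With a baseline equal to the sample mean of
    each covariate (interventional Shapley, features treated as independent), the
    closed form is phi_j = b_j * (x_ij - mean_j). Each draw is [b0, b1, ..., bp].
    """
    p = len(X[0])
    means = [statistics.fmean(row[j] for row in X) for j in range(p)]
    return [[draw[j + 1] * (X[i][j] - means[j]) for j in range(p)] for draw in beta_draws]


def posterior_mean(vectors: list) -> list:
    """Elementwise mean over posterior draws."""
    return [statistics.fmean(col) for col in zip(*vectors)]


# Made-up toy inputs: 2 posterior draws of (b0, b1, b2) and 3 individuals with 2 covariates.
beta_draws = [[0.1, -0.3, 0.05], [0.2, -0.1, 0.15]]
X = [[1.0, 2.0], [0.0, 4.0], [2.0, 0.0]]
phi_1 = shapley_draws(beta_draws, X, i=1)
print([round(v, 6) for v in posterior_mean(phi_1)])  # [0.2, 0.2]
```

By the efficiency property, in each draw the contributions for individual i sum to τ(x_i) minus the sample-average CATE.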

All analyses will be conducted in the R statistical programming environment. The core modeling will be performed using the stochtree package for R.

Narrative Summary: In medicine, a treatment may work well on average without having the same effect on every patient. This project will use advanced statistical methods to re-analyze data from the major CANVAS Program clinical trials. We propose a new method for causal inference that combines a flexible machine learning model with an interpretable linear model. Our goal is to identify the patient characteristics (such as age, kidney function, or other health markers) that predict who is most likely to benefit from the medication canagliflozin. Understanding which patients benefit most could help doctors make more personalized treatment decisions, leading to better health outcomes and more effective use of medicines.

Project Timeline: Month 1: Data acquisition, data inspection, and preparation of the final analytical dataset.
Months 2-3: Primary statistical analysis (fitting the semiparametric Bayesian model and performing convergence diagnostics), followed by secondary analyses, including Shapley value calculations and patient subgroup characterization.
Months 4-5: Interpretation of results and drafting of the primary manuscript.
Month 5: Submission of the manuscript to a peer-reviewed journal.
Month 6: Preparation of abstracts for scientific conferences and submission of a final report to the YODA Project.
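The MCMC convergence diagnostics planned for Months 2-3 could include the split-R-hat statistic, computed for each parameter across several chains. The following minimal illustrative Python sketch follows the split-R-hat of Gelman et al. (Bayesian Data Analysis, 3rd ed.). Established R implementations exist, for example in the coda and posterior packages, and would be the natural choice in practice. The function name and toy chains are hypothetical.

```python
import random
import statistics


def split_rhat(chains: list) -> float:
    """Split-R-hat for one scalar parameter (Gelman et al., Bayesian Data Analysis, 3rd ed.).

    Each chain is cut into two halves (a final odd draw is dropped), then the
    between-half variance B and mean within-half variance W are combined.
    """
    halves = []
    for chain in chains:
        n_half = len(chain) // 2
        halves += [chain[:n_half], chain[n_half:2 * n_half]]
    n = len(halves[0])
    W = statistics.fmean(statistics.variance(h) for h in halves)
    B = n * statistics.variance([statistics.fmean(h) for h in halves])
    var_plus = (n - 1) / n * W + B / n
    return (var_plus / W) ** 0.5


rng = random.Random(42)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
stuck = [[rng.gauss(mu, 1.0) for _ in range(1000)] for mu in (0.0, 0.0, 3.0, 3.0)]
print(round(split_rhat(mixed), 3))  # near 1.00: chains agree
print(round(split_rhat(stuck), 3))  # well above 1.01: chains disagree
```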

Dissemination Plan: The results of this research are planned for publication in a leading, peer-reviewed medical or statistical journal. Findings will also be presented at national and international scientific conferences.

Bibliography:

Athey, S., & Imbens, G. (2016). Recursive partitioning for heterogeneous causal effects. Proceedings of the National Academy of Sciences, 113(27), 7353-7360.
Baiardi, A., & Naghi, A. A. (2024). The value added of machine learning to causal inference: Evidence from revisited studies. The Econometrics Journal, 27(2), 213-234.
Carvalho, C. M., Polson, N. G., & Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika, 97(2), 465-480.
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., & Newey, W. (2017). Double/debiased/Neyman machine learning of treatment effects. American Economic Review, 107(5), 261-265.
Chipman, H. A., George, E. I., & McCulloch, R. E. (2010). BART: Bayesian additive regression trees. The Annals of Applied Statistics, 4(1), 266-298.
Hahn, P. R., Carvalho, C. M., Puelz, D., & He, J. (2018). Regularization and confounding in linear regression for treatment effect estimation.
Hahn, P. R., Murray, J. S., & Carvalho, C. M. (2020). Bayesian regression tree models for causal inference: Regularization, confounding, and heterogeneous effects. Bayesian Analysis, 15(3), 965-1056.
Li, F., Ding, P., & Mealli, F. (2023). Bayesian causal inference: A critical review. Philosophical Transactions of the Royal Society A, 381(2247), 20220153.
Linero, A. R. (2024). In nonparametric and high-dimensional models, Bayesian ignorability is an informative prior. Journal of the American Statistical Association, 119(548), 2785-2798.
Makalic, E., & Schmidt, D. F. (2015). A simple sampler for the horseshoe estimator. IEEE Signal Processing Letters, 23(1), 179-182.
Nie, X., & Wager, S. (2021). Quasi-oracle estimation of heterogeneous treatment effects. Biometrika, 108(2), 299-319.
Schuler, M. S., & Rose, S. (2017). Targeted maximum likelihood estimation for causal inference in observational studies. American Journal of Epidemiology, 185(1), 65-73.
Tian, L., Alizadeh, A. A., Gentles, A. J., & Tibshirani, R. (2014). A simple method for estimating interactions between a treatment and a large number of covariates. Journal of the American Statistical Association, 109(508), 1517-1532.
van de Wiel, M. A., Amestoy, M., & Hoogland, J. (2024). Linked shrinkage to improve estimation of interaction effects in regression models. Epidemiologic Methods, 13(1), 20230039.
van der Laan, M. J., & Rubin, D. (2006). Targeted maximum likelihood learning. The International Journal of Biostatistics, 2(1).
Vansteelandt, S., & Dukes, O. (2022). Assumption-lean inference for generalised linear model parameters. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(3), 657-685.
Wager, S., & Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523), 1228-1242.