The overall aim of this study is to promote the use of appropriate statistical methods to inform personalization of medicine. My objective is to explore the relative performance of alternative methods for the estimation of heterogeneous treatment effects in contexts that commonly occur in the real world and to demonstrate their performance in clinical case studies. The research questions (RQ) are as follows:
RQ1: Which machine learning methods are appropriate for identifying heterogeneous treatment effects in randomized controlled trials (RCTs)?
RQ2: How sensitive are the various machine learning approaches to features commonly encountered in clinical settings, including (a) small samples, (b) clustering in the trial design, (c) large numbers of potential treatment moderators and (d) alternative outcome types (binary, count, survival time)?
RQ3: In the case studies I consider, how strong is the evidence for heterogeneous treatment effects, what drives this heterogeneity and is it of clinical relevance?
In the literature, a variety of methods have been used to estimate heterogeneous treatment effects (HTE); however, their relative performance in clinical contexts remains largely unknown. This project will use a combination of Monte Carlo simulations and a real-world application to address this knowledge gap, with a view to promoting the uptake of suitable methods for personalized medicine.
The objectives are to assess the performance of methods to estimate HTE and to evaluate the effect of canagliflozin compared with glimepiride on outcomes including:
• Glycemic control (HbA1c and fasting plasma glucose [FPG])
• Body weight, waist circumference, and BMI
• Incidence of hypoglycemia
• Systolic blood pressure (SBP) and diastolic blood pressure (DBP)
• Time to receiving rescue therapy or discontinuing due to need for rescue therapy
• Proportion of subjects receiving rescue therapy or discontinuing due to need for rescue therapy through Week 104
• Urinary glucose excretion (UGE)
This study is a randomized, double-blind, 3-arm, parallel-group, active-controlled, multicenter study. We will also simulate data based on the RCT data.
Participants: A total of 1,452 subjects randomized to glimepiride, canagliflozin 100 mg, or canagliflozin 300 mg in a 1:1:1 ratio.
Main Outcome Measure(s): Change in HbA1c from baseline to Week 52.
We will primarily use causal forest modelling and BART to identify heterogeneous treatment effects in the actual RCT data. We will also conduct a Monte Carlo simulation study to test the performance of the methods in a context where the true HTE is known.
A common approach to HTE analysis is to compare binary groups (such as male vs female), or to interact a treatment indicator with a range of covariates. However, such comparisons make strong assumptions regarding the role of other covariates and the form of effect modification. Few such subgroup effects are corroborated in subsequent studies (Wallach et al, 2016; Wallach et al, 2017). Kent et al (2018) suggest that many or even most statistically significant subgroup effects represent false discoveries, and highlight that flexible machine learning (ML) methods may be helpful in this context. I have conducted a preliminary scoping exercise to identify methods that may be useful, including regression trees (Su et al, 2009; Athey & Imbens, 2016), Random Forests (Wager & Athey, 2018; Athey, Tibshirani, & Wager, 2019), Causal Forests (Athey et al, 2018), the least absolute shrinkage and selection operator (Lasso) (Qian & Murphy, 2011; Tian et al, 2014; Chen et al, 2017), Support Vector Machines (Imai & Ratkovic, 2013), Boosting (Powers et al, 2018), Neural Networks (Johansson et al, 2016; Shalit et al, 2016; Schwab et al, 2018) and Bayesian Additive Regression Trees (BART) (Hill, 2011; Taddy et al, 2016). It is imperative that strong evidence-based foundations are developed to support clinicians in treatment decision making. This research will advance knowledge by: (1) identifying statistical approaches, particularly those using machine learning, that can reliably estimate HTEs; (2) exploring their performance in simulation studies designed to reflect real-world applications; (3) applying the best-performing methods to our case studies to identify patients who are most likely to benefit from targeted interventions.
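The concern about false-positive subgroup findings can be illustrated with a short calculation. The sketch below assumes that the subgroup tests are independent, that no true subgroup effect exists, and that each test uses a 5% significance level; these are simplifying assumptions for illustration and are not taken from the cited studies. The function name is hypothetical.

```python
def prob_any_false_positive(n_tests, alpha=0.05):
    """Chance of at least one 'significant' subgroup test when no true
    subgroup effect exists, assuming independent tests (an assumption)."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Illustrative numbers of subgroup tests (hypothetical values).
for k in (1, 5, 10, 20):
    print(k, round(prob_any_false_positive(k), 3))
```

Under these assumptions, testing ten subgroups gives roughly a 40% chance of at least one spurious finding, which is one reason data-driven methods with built-in regularization are of interest here.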
Inclusion and exclusion criteria are those used in the RCT and are listed below.
Inclusion criteria:
Patients must have a diagnosis of type 2 diabetes
Body mass index (BMI) should be between 22 and 45 kg/m2 at screening
Patients must be taking a stable dosage of metformin as monotherapy at screening
Patients must have an HbA1c of >=7% and <=9.5% at Week -2
Patients must have a fasting plasma glucose (FPG) <=270 mg/dL (15 mmol/L) at Week -2
Exclusion criteria:
Prior exposure to, or known contraindication or suspected hypersensitivity to, JNJ-28431754, glimepiride, or metformin
History of diabetic ketoacidosis or type 1 diabetes mellitus
History of pancreas or beta-cell transplantation
History of active proliferative diabetic retinopathy
History of hereditary glucose-galactose malabsorption or primary renal glucosuria
Renal disease requiring treatment with immunosuppressive therapy within the past 12 months before screening or a history of dialysis or renal transplant
Thiazolidinedione therapy within the 16 weeks before screening
For personalised medicine to improve population health, evidence is required on how the relative effectiveness and harms of alternative treatments, or treatment regimes (frequency, dosage or combinations of drugs), differ across individual patients. The effect of treatment for particular patients is likely to differ according to their baseline characteristics (such as age, gender, severity of disease) in addition to the treatment regime itself.
I will use flexible data-driven approaches, mainly coming from the Machine Learning literature, to improve the identification and estimation of heterogeneous treatment effects (HTE).
I started preparation for my project on 1/11/2020, and it is expected to be completed by September 2021. The project timeline is as follows:
Months 1-3 - Data preparation and design of simulations.
Months 4-11 - Analysis and conducting simulations.
Months 12-15 - Write-up of initial analysis.
Months 16-18 - Dissemination of results.
The manuscript will be drafted and submitted for publication by 1/1/2021. All manuscripts, abstracts, posters and presentations will be shared with the YODA Project at the time of submission.
My research will lead to at least 2 internationally peer-reviewed publications, targeted at leading journals in medical statistics, causal inference and health economics, including Statistics in Medicine, Journal of Causal Inference, Journal of Health Economics, Medical Care and Medical Decision Making, and, where relevant, clinical journals such as Diabetes and Diabetes Care. In line with the IRC’s Open Access policy and the National Principles for Open Access Policy Statement (2012), NUIG requires that authors of peer-reviewed articles and peer-reviewed conference papers deposit a copy in the University’s open access repository ‘ARAN’. This will enhance the use and impact of my PhD research. I plan to disseminate my research by presenting at leading national and international conferences such as the Irish Economic Association (IEA) conference, the ISPOR Annual International Meeting, the American Society of Health Economists conference, the European Health Economics Association Conference, the Society for Medical Decision Making (SMDM) conference, the International Health Economics Association (IHEA) conference, the European Causal Inference Meeting and the International Conference on Health Policy Statistics.
-Athey, S., Tibshirani, J., & Wager, S. (2019). Generalized random forests. The Annals of Statistics, 47(2), 1148–1178.
-Athey, S., & Imbens, G. W. (2016). Recursive partitioning for heterogeneous causal effects. Proceedings of the National Academy of Sciences, 113(27), 7353–7360.
-Chen, S., Tian, L., Cai, T., & Yu, M. (2017). A general statistical framework for subgroup identification and comparative treatment scoring. Biometrics, 73(4), 1199–1209.
-Hill, J. L. (2011). Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics, 20(1), 217–240.
-Imai, K., & Ratkovic, M. (2013). Estimating treatment effect heterogeneity in randomized program evaluation. Annals of Applied Statistics, 7(1), 443–470.
-Johansson, F., Shalit, U., & Sontag, D. (2016). Learning representations for counterfactual inference. In International Conference on Machine Learning (pp. 3020–3029).
-Knaus, M. C., Lechner, M., & Strittmatter, A. (2018). Machine learning estimation of heterogeneous causal effects: Empirical Monte Carlo evidence. Retrieved from http://arxiv.org/abs/1810.13237.
-Nie, X., & Wager, S. (2017). Quasi-oracle estimation of heterogeneous treatment effects. Retrieved from http://arxiv.org/abs/1712.04912.
-Powers, S., Qian, J., Jung, K., Schuler, A., Shah, N. H., Hastie, T., & Tibshirani, R. (2018). Some methods for heterogeneous treatment effect estimation in high dimensions. Statistics in Medicine, 37(11), 1767–1787.
-Qian, M., & Murphy, S. A. (2011). Performance guarantees for individualized treatment rules. Annals of Statistics, 39(2), 1180.
-Schwab, P., Linhardt, L., & Karlen, W. (2018). Perfect match: A simple method for learning representations for counterfactual inference with neural networks. Retrieved from http://arxiv.org/abs/1810.00656.
-Shalit, U., Johansson, F. D., & Sontag, D. (2016). Estimating individual treatment effect: Generalization bounds and algorithms. Retrieved from http://arxiv.org/abs/1606.03976.
-Su, X., Tsai, C.-L., Wang, H., Nickerson, D. M., & Li, B. (2009). Subgroup analysis via recursive partitioning. Journal of Machine Learning Research, 10(Feb), 141–158.
-Taddy, M., Gardner, M., Chen, L., & Draper, D. (2016). A nonparametric Bayesian analysis of heterogeneous treatment effects in digital experimentation. Journal of Business & Economic Statistics, 34(4), 661–672.
-Tian, L., Alizadeh, A. A., Gentles, A. J., & Tibshirani, R. (2014). A simple method for estimating interactions between a treatment and a large number of covariates. Journal of the American Statistical Association, 109(508), 1517–1532.
-Wallach, J. D., Sullivan, P. G., Trepanowski, J. F., Steyerberg, E. W., & Ioannidis, J. P. (2016). Sex based subgroup differences in randomized controlled trials: empirical evidence from Cochrane meta-analyses. BMJ, 355, i5826. doi:10.1136/bmj.i5826 pmid:27884869.
-Wallach, J. D., Sullivan, P. G., Trepanowski, J. F., Sainani, K. L., Steyerberg, E. W., & Ioannidis, J. P. (2017). Evaluation of evidence of statistical support and corroboration of subgroup claims in randomized clinical trials. JAMA Intern Med, 177, 554–560. doi:10.1001/jamainternmed.2016.9125 pmid:28192563.
-Kent, D. M., Steyerberg, E., & van Klaveren, D. (2018). Personalized evidence based medicine: predictive approaches to heterogeneous treatment effects. BMJ, 363, k4245.
The primary endpoint will be the change in HbA1c from baseline to week 52, with a non-inferiority margin of 0.3% for the comparison of each canagliflozin dose with glimepiride. If non-inferiority is shown, we will assess superiority on the basis of an upper bound of the 95% CI for the difference of each canagliflozin dose versus glimepiride of less than 0.0%. Analysis will be done in a modified intention-to-treat population, including all randomised patients who received at least one dose of study drug.
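The decision rule described above can be written out as a small function. This is a minimal sketch under stated assumptions: the difference is taken as canagliflozin minus glimepiride in HbA1c change (so lower is better), the CI is a normal-approximation interval, and strict inequalities are used at both thresholds. The function names and the example values are hypothetical and are not trial results.

```python
def ci_upper_bound(estimate, std_error, z=1.96):
    """Upper bound of a two-sided 95% CI (normal approximation; assumption)."""
    return estimate + z * std_error


def classify_comparison(ci_upper, margin=0.3):
    """Apply the non-inferiority and superiority rule to the upper CI bound.

    ci_upper: upper 95% CI bound for the difference in HbA1c change
              (canagliflozin minus glimepiride, percentage points).
    margin:   non-inferiority margin (0.3, as stated in the text).
    """
    if ci_upper < 0.0:
        return "superior"            # whole CI lies below zero
    if ci_upper < margin:
        return "non-inferior"        # CI excludes the margin only
    return "non-inferiority not shown"


# Hypothetical upper bounds, for illustration only.
for upper in (-0.05, 0.20, 0.40):
    print(upper, classify_comparison(upper))
```

Superiority is checked first in the code only for convenience; any upper bound below 0.0 is also below the 0.3 margin, so the hierarchical order in the text is preserved.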
The doses of canagliflozin (100 mg and 300 mg) were chosen on the basis of previously published findings from a dose-ranging study. Glimepiride treatment ranged from a starting dose of 1 mg to a maximum dose of 6 mg or 8 mg (on the basis of the maximum approved dose in the country of the investigational site).
The specified secondary efficacy endpoints are percentage change from baseline in body weight, and proportion of patients with documented hypoglycemic episodes, including biochemically documented episodes (concurrent finger-stick glucose or plasma glucose less than or equal to 3.9 mmol/L with or without symptoms) and severe episodes (those needing assistance of another individual or resulting in seizure or loss of consciousness). Additional endpoints include the proportion of patients achieving HbA1c less than either 7.0% or 6.5%; change in fasting plasma glucose and systolic and diastolic blood pressure; and percentage change in fasting plasma lipids, including HDL cholesterol, triglycerides, LDL cholesterol, non-HDL cholesterol, and ratio of LDL cholesterol to HDL cholesterol.
To address RQ1 and RQ2, we will design a Monte Carlo simulation study, based on the observed correlations in the actual trial data, to assess the relative performance of the statistical methods (Causal Forests, BART and other methods identified). For each simulated patient, we will simulate potential outcomes under control (Y0) based on the control arm of the trial data, and potential outcomes under treatment (Y1) based on the treatment arm of the trial data. Hence the true HTE for each individual can be defined as (Y1-Y0) and is known by construction, allowing measures of estimation bias and precision to be calculated for each estimation method. We will consider a range of possible data generating model specifications for the potential outcomes (and hence for the HTE), ranging from no heterogeneity, through heterogeneity due to a single covariate, to complex patterns of effect modification. We will simulate scenarios under (a) a range of sample sizes (100, 500, 1000 and 5000 observations), (b) different trial designs (clustered/not clustered), (c) various numbers of potential treatment moderators/covariates and (d) alternative outcome types (binary, continuous, count, survival time).
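The potential-outcomes construction described above can be sketched as follows. This is an illustrative toy data generating process, not the one that will be fitted to the trial data: the single standardized covariate, all coefficients, the noise scale and the function name are hypothetical assumptions. Setting effect_slope to 0 gives the no-heterogeneity scenario; a non-zero value gives heterogeneity due to a single covariate.

```python
import random


def simulate_trial(n, effect_slope=0.0, base_effect=-0.5, seed=0):
    """Simulate n patients with known potential outcomes (toy example)."""
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)               # hypothetical baseline covariate
        noise = rng.gauss(0.0, 0.5)           # hypothetical outcome noise
        y0 = -0.3 + 0.2 * x + noise           # potential outcome under control
        tau = base_effect + effect_slope * x  # true individual effect
        y1 = y0 + tau                         # potential outcome under treatment
        treated = rng.random() < 0.5          # 1:1 randomization (assumption)
        patients.append({
            "x": x, "y0": y0, "y1": y1, "tau": tau,
            "treated": treated,
            "y": y1 if treated else y0,       # only one outcome is observed
        })
    return patients


# Sample sizes taken from the text; the slope value is hypothetical.
for n in (100, 500, 1000, 5000):
    data = simulate_trial(n, effect_slope=0.3, seed=1)
    true_ate = sum(p["tau"] for p in data) / n
    print(n, round(true_ate, 3))
```

Because both potential outcomes are stored, any estimator applied to the observed columns (x, treated, y) can be scored against the true tau for every simulated patient.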
Methods will be compared in terms of their percentage bias and root mean squared error (RMSE) for individual effects and aggregated subgroup effects, as well as for the overall average treatment effect.
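The two performance measures could be computed along the following lines. The text does not give formulas, so this sketch assumes one common definition: percentage bias is the mean estimation error expressed as a percentage of the mean true effect, and RMSE is the square root of the mean squared error. Function names and example values are hypothetical.

```python
import math


def percentage_bias(estimates, truths):
    """Mean error as a percentage of the mean true effect (assumed definition).

    Undefined when the mean true effect is zero.
    """
    mean_true = sum(truths) / len(truths)
    mean_error = sum(e - t for e, t in zip(estimates, truths)) / len(truths)
    return 100.0 * mean_error / mean_true


def rmse(estimates, truths):
    """Root mean squared error between estimated and true effects."""
    squared = [(e - t) ** 2 for e, t in zip(estimates, truths)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical true and estimated individual effects.
true_effects = [1.0, 2.0, 3.0]
estimated_effects = [1.5, 2.5, 3.5]
print(percentage_bias(estimated_effects, true_effects))
print(rmse(estimated_effects, true_effects))
```

The same functions apply to subgroup effects by passing subgroup-level averages instead of individual values. A relative measure such as percentage bias breaks down in scenarios where the true average effect is zero, so absolute bias may be needed there.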
To address RQ3, we will apply each of the chosen algorithms (Causal Forest, BART and any other promising methods identified during the study) to the actual dataset to estimate heterogeneous treatment effects. The outcome will be the change in HbA1c from baseline to Week 52; inclusion and exclusion criteria will be as described above. The models will include the following baseline covariates: gender, age, race, HbA1c (%), FPG (mmol/L), body weight (kg), body mass index (kg/m2), duration of type 2 diabetes (years), and whether the patient entered an antihyperglycaemic drug adjustment period. Since the original study is likely to be underpowered to detect subgroup effects at this level, we consider the results to be hypothesis generating, rather than a means to conclusively identify subgroups that clinically benefit. A change in HbA1c of 0.5% (5.5 mmol/mol) will be considered clinically meaningful for this study.
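The two units quoted for the clinically meaningful threshold can be checked with the NGSP/IFCC master equation, NGSP (%) = 0.09148 x IFCC (mmol/mol) + 2.152. The sketch below is only a unit-conversion check; the function names are hypothetical.

```python
SLOPE = 0.09148      # NGSP % per IFCC mmol/mol (master equation)
INTERCEPT = 2.152    # NGSP % intercept (master equation)


def hba1c_level_to_mmol_mol(percent):
    """Convert an HbA1c level from NGSP % to IFCC mmol/mol."""
    return (percent - INTERCEPT) / SLOPE


def hba1c_change_to_mmol_mol(change_percent):
    """Convert a change in HbA1c; the intercept cancels for differences."""
    return change_percent / SLOPE


print(round(hba1c_change_to_mmol_mol(0.5), 1))  # threshold used in the text
print(round(hba1c_level_to_mmol_mol(7.0)))      # lower inclusion limit
print(round(hba1c_level_to_mmol_mol(9.5)))      # upper inclusion limit
```

Because the intercept cancels, the same factor applies to any difference in HbA1c, including estimated treatment effects.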