To generate multivariate statistical models based on clinical data to predict treatment response to paliperidone in patients with schizophrenia on the level of individual subjects.
Only a subset of patients with schizophrenia respond sufficiently to the currently available pharmacological interventions. There is no clinical marker to predict treatment response in individual patients. This inefficacy leads to prolonged illness episodes and suffering in affected patients, repeated hospitalizations and a substantial socio-economic burden.
To use clinical data from existing clinical trials to generate multivariate models for the prediction of treatment response to paliperidone in patients with schizophrenia.
Clinical and demographic variables of individual patients will be used to generate prediction models. We will employ a cross-validation scheme (leave-one-trial-out cross-validation) to estimate the generalizability of the generated models.
All participants of the requested trials will be included in the analysis.
Main Outcome Measure(s):
The main outcome measure of our analysis will be the balanced accuracy ([sensitivity + specificity]/2) of the generated multivariate statistical models when predicting early responders vs. early non-responders (week 6-7) and late responders vs. late non-responders (week 12-13). Response to paliperidone will be defined as a reduction on the PANSS total of >20% relative to baseline.
All continuous variables will be mean centered and scaled. A wrapper method will be used to generate models based on a minimal set of highly predictive variables.
The clinical response to pharmacological interventions in patients with schizophrenia is highly heterogeneous. Despite significant scientific efforts, there are currently no reliable biomarkers to predict treatment response. As a consequence, patients frequently undergo multiple unsuccessful pharmacological interventions until they respond sufficiently. This leads to prolonged illness episodes and suffering in affected patients, repeated hospitalizations and a significant socio-economic burden. In the absence of more effective treatment options, it has been suggested that stratification prior to treatment initiation could be a valuable strategy to improve treatment outcome in psychiatric patients (Chekroud & Krystal 2016, BMJ). In their recent work, Chekroud et al. (2016, Lancet Psychiatry) demonstrated that a multivariate statistical model based merely on clinical and demographic data could be used to predict treatment response in patients with major depression. In this study, we plan to apply this approach to existing data from pharmacological trials in patients with schizophrenia. Multivariate pattern analysis (MVPA) will be employed to generate statistical models to predict treatment response to paliperidone. The analysis will identify a minimal subset of highly predictive variables. These could be used to create an MVPA-based clinical questionnaire for use in the clinical setting to assess patients and acquire the data necessary for response prediction. In this way, our work could contribute significantly to the efficacy of existing pharmacological treatments in psychiatry. If applied in the clinical setting, patients could be assessed prior to treatment initiation and their treatment response could be predicted using the generated models. Treatment with paliperidone could then be restricted to those patients predicted to respond sufficiently, while other patients could be treated with other compounds.
Multivariate pattern analysis (MVPA) is a new statistical method that allows the identification of predictive patterns in high-dimensional data sets. In comparison with traditional univariate statistical approaches, MVPA is particularly suited to a large number of variables. The approach allows prediction with high accuracy on the level of individual subjects. In this way, models can be applied to answer practical questions in the clinical context, such as the prediction of the disease course or the potential differential diagnosis of an individual patient. Thus, the aim of the present project is to apply MVPA to identify patterns in clinical data to predict treatment response to paliperidone in patients with schizophrenia.
In addition, we will apply trajectory-based statistical methods to these data. Trajectory-based models (e.g. latent class models and growth mixture models) capture heterogeneity in the development of clinical outcomes during an intervention, and this more sensitive approach can result in trial outcomes that differ from traditional endpoint measures.
The response to existing pharmacological interventions in patients with schizophrenia is highly heterogeneous. Overall, it is estimated that between 20 and 30% of patients do not respond sufficiently to medication. Currently, clinicians have no way of predicting whether a specific medication will work for a specific patient. Consequently, patients frequently undergo multiple unsuccessful treatments. This is associated with a prolonged duration of illness episodes, repeated hospitalizations and increased financial and medical costs. We plan to develop tools that can help clinicians choose better treatments, and ultimately help patients get better faster.
We plan to initiate the project on 1 October 2016. In the initial stage of the project, data will be merged and processed to prepare them for analysis. On 1 November 2016 we will begin the analysis of the prepared data, and we expect results by 1 January 2017. In the following period the draft will be written and prepared for publication. We expect the first submission of the manuscript on 1 March 2017.
We plan to submit the finalized manuscript to one of the leading, peer-reviewed journals in the field of schizophrenia and psychiatry (e.g. Molecular Psychiatry, Biological Psychiatry, American Journal of Psychiatry, Schizophrenia Bulletin, JAMA Psychiatry, Lancet Psychiatry). Furthermore, we plan to create a clinical questionnaire to specifically provide the required information to predict treatment response based on our multivariate models, and make this available to clinicians online. This will allow this research to reach its maximum potential for improving patient care.
The main outcome measure of our analysis will be the balanced accuracy ([sensitivity + specificity]/2) of the generated multivariate statistical models when predicting early responders vs. non-responders (week 6-7) and late responders vs. non-responders (week 12-13) (http://www.ncbi.nlm.nih.gov/pubmed/14662555). Response to paliperidone will be defined as a reduction on the PANSS total of >20% relative to baseline. For trajectory-based approaches, we will use trajectories of PANSS-based outcome measurements over time.
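The response definition and the balanced accuracy described above can be illustrated with a minimal sketch. The function names and all scores below are hypothetical, and the sketch assumes the percentage reduction is computed on the raw PANSS total without any correction for the scale minimum; the proposal does not specify this detail.

```python
def is_responder(baseline_total, followup_total, threshold=0.20):
    """Return True if the PANSS total fell by more than `threshold` of baseline."""
    reduction = (baseline_total - followup_total) / baseline_total
    return reduction > threshold


def balanced_accuracy(y_true, y_pred):
    """Compute (sensitivity + specificity) / 2 for boolean labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    n_pos = sum(1 for t in y_true if t)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both responders and non-responders must be present")
    sensitivity = tp / n_pos
    specificity = tn / n_neg
    return (sensitivity + specificity) / 2


# Hypothetical (baseline, follow-up) PANSS totals for four patients.
scores = [(100, 70), (90, 80), (80, 60), (95, 90)]
observed = [is_responder(b, f) for b, f in scores]

# Hypothetical model predictions for the same four patients.
predicted = [True, True, False, False]

print(observed)
print(balanced_accuracy(observed, predicted))
```

Unlike raw accuracy, this measure weights responders and non-responders equally, so a model cannot score well merely by predicting the more frequent class.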
In the current project, we plan to employ a data-driven approach. All variables shared between the available clinical trials will be entered in our analysis. Using cross-validation and penalized statistical methods, we will aim to avoid overfitting and generate reliable estimates of the generalizability of the models' predictions.
We will test the specificity of our results to paliperidone by applying the generated models to predict early and late treatment response (reduction in PANSS total score >20% relative to baseline) in patients treated with quetiapine, olanzapine, aripiprazole and placebo.
We will employ a multivariate approach to generate statistical models to predict treatment response to paliperidone. The main outcome measures will be the binary variables early treatment response (week 6-7) and late treatment response (week 12-13), where treatment response will be defined as a reduction in the PANSS total score of >20% with respect to baseline. From all trials, only participants with available data for at least one of the main outcome measures will be included. Based on those participants, we will define the maximum set of clinical and demographic variables that is shared between all trials. Prior to analysis, all continuous variables will be mean centered and scaled. In order to improve the predictive power of the models, variables with low predictive value as well as highly correlated variables will be removed from the analysis to retain a subset of n=25 highly predictive variables (Chekroud et al., 2016). This will be accomplished by an elastic net regression, which includes a linear combination of an L1- and an L2-regularization term. Subsequently, we will fit a gradient-boosting machine to classify patients into responders and non-responders. Most importantly, all data processing steps will be embedded in a nested cross-validation scheme. At the outer level of the cross-validation, we will successively hold out one entire trial and use the remaining data to generate the prediction models. Then the models will be applied to the held-out data to estimate the generalizability of the model. This procedure will be repeated until each trial has been held out once. In the inner cross-validation loop, we will employ a 10-fold cross-validation with 10 permutations to optimize the multivariate prediction models. The main outcome measure of the prediction models will be the balanced accuracy.
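The structure of the nested cross-validation scheme can be sketched as follows. This is a minimal illustration of the data splitting only, not of the elastic net or gradient-boosting steps. All names, trial labels and values are hypothetical, and the sketch assumes that "10 permutations" means the 10-fold split is repeated ten times with a fresh shuffle, and that scaling parameters are estimated on training data only.

```python
import random
from statistics import mean, stdev


def leave_one_trial_out(trial_ids):
    """Yield (held_out_trial, train_idx, test_idx) once per distinct trial."""
    for held_out in sorted(set(trial_ids)):
        train_idx = [i for i, t in enumerate(trial_ids) if t != held_out]
        test_idx = [i for i, t in enumerate(trial_ids) if t == held_out]
        yield held_out, train_idx, test_idx


def repeated_k_fold(indices, k=10, repeats=10, seed=0):
    """Yield (inner_train, inner_val) index lists: k folds, reshuffled `repeats` times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)
        folds = [shuffled[j::k] for j in range(k)]
        for j in range(k):
            inner_val = folds[j]
            inner_train = [i for m, fold in enumerate(folds) if m != j for i in fold]
            yield inner_train, inner_val


def standardize(train_values, other_values):
    """Center and scale both lists using the training mean and SD only."""
    mu, sd = mean(train_values), stdev(train_values)
    return ([(x - mu) / sd for x in train_values],
            [(x - mu) / sd for x in other_values])


# Hypothetical data: three trials and one continuous baseline variable.
rng = random.Random(1)
trial_ids = ["A"] * 30 + ["B"] * 40 + ["C"] * 50
baseline_panss = [rng.randint(60, 120) for _ in trial_ids]

summary = []
for held_out, outer_train, outer_test in leave_one_trial_out(trial_ids):
    # Scaling parameters come from the training trials, never the held-out trial.
    train_x, test_x = standardize(
        [baseline_panss[i] for i in outer_train],
        [baseline_panss[i] for i in outer_test],
    )
    # Variable selection and model tuning would use only these inner splits;
    # in a full pipeline, scaling would also be refit within each inner fold.
    inner_splits = list(repeated_k_fold(outer_train, k=10, repeats=10))
    summary.append((held_out, len(outer_train), len(outer_test), len(inner_splits)))

print(summary)
```

Keeping every preprocessing step inside the training portion of each split means that the held-out trial contributes no information to the model before it is scored.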
In order to test the specificity of our results to paliperidone, the models will be applied also to participants receiving other drugs (risperidone, quetiapine, olanzapine, aripiprazole) or placebo. In this way, we will test whether the identified predictive patterns relate to a specific paliperidone response, a general trait to respond to any medication or a general subtype of patients with a spontaneous remission.
For trajectory-based approaches, we will explore whether similar or different trajectory classes exist for patients who take active treatments or placebo, and test whether there are clinical predictors of trajectory class membership.
Sample Size Addendum:
Our thoughts on sample size are driven primarily by data availability. As you know, machine learning approaches depend critically on learning from large samples. Traditional power analyses do not apply in the same way as they would, say, for a clinical trial, because algorithms will almost always benefit from additional subjects. In addition, until we have access to the data and can ascertain the level of missingness in baseline data, we will not know exactly how many patients we will be able to use across all of the studies. We will use multivariate approaches with dimensionality reduction built in (e.g. decision trees, or "penalized" approaches such as lasso or ridge regression), which greatly reduces concerns about our ability to include a large number of predictor variables in our models.
Finally, to address the issue of whether the lack of efficacy is due to non-adherence to study medications, we can conduct sensitivity analyses by restricting our analyses to a sample of patients who self-reported adhering to medications. Despite these provisions, it is difficult to differentiate amongst reasons for efficacy (for example, whether the individual patient would have got better anyway, whether the patient had a placebo response, or whether it was due to the active compound). Our primary goal is to develop easy-to-use tools that predict treatment outcome for a given intervention, rather than to try to specifically predict the reason for that outcome.