2025-0616

General Information

How did you learn about the YODA Project?: Internet Search

Conflict of Interest

Request Clinical Trials

Associated Trial(s):
  1. NCT03517722 - A Multicenter, Randomized, Double-blind, Placebo-controlled, Parallel-group Study of Ustekinumab in Subjects With Active Systemic Lupus Erythematosus
What type of data are you looking for?: Individual Participant-Level Data, which includes Full CSR and all supporting documentation

Data Request Status

Status: Ongoing

Research Proposal

Project Title: Multimodal Deep Learning Approaches for Predicting Response and Outcomes in Systemic Lupus Erythematosus: Application of Perceiver IO

Scientific Abstract:
Background
Systemic lupus erythematosus (SLE) is a heterogeneous autoimmune disease with variable manifestations and unpredictable course. Tools to predict flare, severe organ involvement, or patient-reported outcomes (PROs) remain limited. Conventional models struggle to integrate multimodal trial data. Transformer-based architectures such as Perceiver IO can compress heterogeneous inputs into a shared latent space and enable flexible, task-specific predictions.

Objective
To apply Perceiver IO to integrate multimodal data from a prior SLE trial and predict clinically relevant outcomes.

Study Design
Retrospective modelling study using anonymised participant-level trial data. Clinical, laboratory, biomarker, and genetic variables (where available) will be harmonised and used to train and validate the model.

Participants
All trial participants with available clinical and laboratory data will be included. Missingness will be handled with model-based approaches and imputation sensitivity analyses.

Primary and Secondary Outcomes
Primary: Disease flare (per parent trial definition, e.g. BILAG A/B or SLEDAI increase).
Secondary: Severe organ involvement (renal/CNS) and PROs (fatigue, pain, quality of life).

Statistical Analysis
Perceiver IO will generate latent representations from multimodal inputs. Query vectors will be trained for each outcome. Model performance will be assessed using AUROC, calibration plots, and Brier scores, with comparisons to conventional methods. Feature attribution (e.g. SHAP) will identify drivers of predictions.
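The performance metrics named above can be illustrated with a minimal sketch. It assumes binary outcome labels (1 = event) and predicted probabilities; the function names and the toy numbers are hypothetical and are not taken from the trial or from the applicants' code.

```python
from typing import List, Sequence, Tuple


def auroc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUROC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_bins(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Points for a calibration plot: (mean predicted, observed rate, count).

    Uses equal-width probability bins and skips empty bins.
    """
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))
    points = []
    for b in bins:
        if b:
            mean_pred = sum(p for _, p in b) / len(b)
            observed = sum(y for y, _ in b) / len(b)
            points.append((mean_pred, observed, len(b)))
    return points


# Toy example with invented values (not trial data).
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, probs), brier_score(labels, probs))
print(calibration_bins(labels, probs, n_bins=2))
```

The pairwise AUROC loop is quadratic in the number of participants, which is acceptable for a single trial; a rank-based implementation would be the usual choice for larger datasets.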

Brief Project Background and Statement of Project Significance: Systemic lupus erythematosus (SLE) is a heterogeneous autoimmune disease with unpredictable flares, risk of severe organ involvement, and major impact on quality of life. Despite decades of research, clinicians still lack robust tools to predict which patients will flare, develop renal or CNS disease, or experience worsening fatigue and pain. Conventional statistical approaches have identified some risk factors, but they are limited by the disease's complexity and the breadth of data now collected in modern clinical trials.

Recent advances in deep learning allow integration of multimodal datasets (including clinical features, laboratory values, biomarkers, and genetic data) into unified latent representations that capture cross-modal relationships. Perceiver IO, a transformer-based architecture, is well suited to this task, enabling compression of diverse inputs and query-based prediction of multiple outcomes from the same model.

In this project, we will apply Perceiver IO to data from the ustekinumab SLE trial to predict: (1) disease flare, (2) severe organ involvement, and (3) patient-reported outcomes. We will use Shapley additive explanations (SHAP) to identify the variables most strongly driving predictions, providing interpretable insights into disease mechanisms and prognosis. This work will advance scientific understanding of lupus heterogeneity, inform strategies for personalised care, and contribute generalizable methods relevant to other autoimmune diseases.

Specific Aims of the Project: Systemic lupus erythematosus (SLE) is a heterogeneous autoimmune disease with unpredictable flares, risk of severe organ involvement, and substantial impact on quality of life. Current approaches cannot reliably predict which patients will flare, develop renal or neurological disease, or experience the greatest burden of fatigue and pain. The overall objective of this project is to apply advanced deep learning to individual-level data from the ustekinumab SLE trial to create predictive models of disease course and identify the factors driving prognosis. We will implement a Perceiver IO architecture, a transformer-derived framework designed to handle multimodal data, to integrate demographics, clinical manifestations, laboratory results, biomarkers, and, where available, genetic information into a unified latent representation. From this representation, we will develop models that predict key outcomes including disease flare, severe end-organ involvement, and patient-reported outcomes such as fatigue and quality of life. Model performance will be evaluated using cross-validation and compared with conventional analytic methods. To ensure interpretability, we will apply Shapley additive explanations (SHAP) to quantify the relative contribution of each variable to predictions. This work will establish the feasibility of multimodal deep learning in SLE, generate new insights into disease heterogeneity, and provide a foundation for predictive tools to support personalised patient care.
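As an illustration of what a Shapley attribution measures, the sketch below computes exact Shapley values by enumerating every feature coalition for a small toy risk function. This is not the SHAP library and not the applicants' pipeline: the function `toy_risk`, its coefficients, and the all-zero baseline are invented for the example, and exact enumeration is only feasible for a handful of features (practical tools rely on approximations).

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley split of predict(x) - predict(baseline) across features."""
    n = len(x)
    features = list(range(n))

    def value(coalition: frozenset) -> float:
        # Features inside the coalition take the patient's value;
        # all others are held at the baseline value.
        mixed = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(v: Sequence[float]) -> float:
    """Hypothetical risk score with one interaction term (features 0 and 2)."""
    return 0.2 * v[0] + 0.5 * v[1] + 0.3 * v[0] * v[2]


contributions = shapley_values(toy_risk, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print(contributions)
```

The contributions sum to the difference between the prediction for the patient and the prediction at the baseline, and the interaction term is shared equally between the two features involved.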

Study Design: Individual trial analysis

What is the purpose of the analysis being proposed? Please select all that apply.: Research on clinical prediction or risk prediction

Software Used: Python, R, RStudio, Open Office

Data Source and Inclusion/Exclusion Criteria to be used to define the patient sample for your study: The data source for this project will be the individual participant-level dataset from the ustekinumab trial in systemic lupus erythematosus (SLE), accessed through the YODA Project. This trial enrolled adult patients with active, autoantibody-positive SLE despite background therapy, and includes demographics, clinical features, laboratory measures, biomarkers, and patient-reported outcomes.

All enrolled participants will be included in the analysis. No additional exclusion criteria will be applied beyond those used in the original trial protocol. Patients with partially missing data will be retained, with missingness handled through internal model strategies or imputation. This inclusive approach is essential for developing generalizable predictive models that reflect the full heterogeneity of lupus.

No other datasets outside of the YODA Project will be used, and no pooling or aggregation of data across sources is planned. All analyses will be performed within the secure YODA research environment.

Primary and Secondary Outcome Measure(s) and how they will be categorized/defined for your study: The primary outcomes for this project are threefold. First, disease flare, which will be defined using increases in validated disease activity measures available within the dataset such as BILAG or SLEDAI, in line with the definitions used in the trial. Second, severe end-organ involvement, captured through the presence or new occurrence of renal, central nervous system, or other major organ manifestations as adjudicated in the trial dataset. Third, patient-reported outcomes, focusing on fatigue, pain, and health-related quality of life, measured using instruments such as FACIT-Fatigue or SF-36 where available. These outcomes reflect the domains of greatest clinical and patient importance and align directly with the objectives of the modelling framework.

Secondary outcomes will include attainment of low disease activity or remission states where derivable, withdrawal from the study or use of rescue therapy as markers of inadequate disease control, and changes in key laboratory measures including proteinuria, complement levels, and anti-dsDNA titres. Outcomes will be treated as binary, ordinal, or continuous variables as appropriate to preserve information while enabling prediction tasks.

No changes will be made to the outcome definitions as prespecified in the original trial. The analysis will use the outcomes as collected and categorised, ensuring consistency with the trial dataset while allowing the deep learning model to identify latent patterns predictive of these endpoints.

Main Predictor/Independent Variable and how it will be categorized/defined for your study: This project does not rely on a single predictor but on an integrated multimodal representation of each patient. Clinical features (baseline disease activity, organ involvement, corticosteroid use, concomitant therapy), laboratory measures (complement levels, anti-dsDNA, renal function, haematology), biomarkers, and, where available, genetic data will all be included. These variables will be harmonised and scaled, then compressed into a unified latent space using the Perceiver IO architecture. Each input will keep its native data type (continuous laboratory values, categorical clinical variables, binary indicators), with missing data addressed through internal model handling or imputation. The resulting latent embedding will function as the independent variable tested against the study's primary outcomes (disease flare, severe organ involvement, and patient-reported outcomes) and its secondary endpoints. This approach allows the model to capture complex interactions across modalities rather than relying on any single feature.
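
As an illustration of what "internal model handling" of missing data could look like in an attention-based encoder, the sketch below lets a single latent vector attend over per-variable input tokens while masking out unobserved ones. It is a simplified, assumption-laden example and not the project's implementation: the function name `masked_cross_attention`, the toy vectors, and the use of one latent and one head without learned projections are all hypothetical simplifications of the cross-attention step used in Perceiver-style models.

```python
import math

def masked_cross_attention(latent_query, keys, values, observed):
    """Single-head cross-attention from one latent vector to input tokens.

    Tokens whose `observed` flag is False are excluded from the softmax,
    so missing measurements contribute nothing to the latent update.
    """
    if not any(observed):
        raise ValueError("at least one input must be observed")
    scale = 1.0 / math.sqrt(len(latent_query))
    scores = [
        scale * sum(q * k for q, k in zip(latent_query, key)) if seen else -math.inf
        for key, seen in zip(keys, observed)
    ]
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]  # exp(-inf) evaluates to 0.0
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return output, weights

# Toy inputs (hypothetical): three variable tokens, the third one missing.
latent_query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
values = [[2.0, 0.0], [0.0, 4.0], [100.0, 100.0]]
observed = [True, True, False]

output, weights = masked_cross_attention(latent_query, keys, values, observed)
print(weights)  # the weight on the missing token is exactly 0.0
```

Masking keeps every participant in the analysis without inventing values for unmeasured variables; imputation, the alternative named above, would instead fill the gap before encoding.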

Other Variables of Interest that will be used in your analysis and how they will be categorized/defined for your study: As above

Statistical Analysis Plan: We will begin with descriptive analyses to characterise the trial population. Continuous variables such as age and laboratory values will be summarised with means, medians, and standard deviations, while categorical variables such as sex, ethnicity, and organ involvement will be presented as counts and percentages. Baseline disease activity and patient-reported outcomes will be described overall and across key subgroups.
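
The descriptive step above can be sketched with the Python standard library alone. The helper names `summarise_continuous` and `summarise_categorical` and the toy values are hypothetical and are not drawn from the trial data; `None` is assumed here to mark a missing value.

```python
from collections import Counter
from statistics import mean, median, stdev

def summarise_continuous(values):
    """Mean, median and sample SD over observed (non-None) values."""
    seen = [v for v in values if v is not None]
    return {
        "n": len(seen),
        "missing": len(values) - len(seen),
        "mean": mean(seen),
        "median": median(seen),
        "sd": stdev(seen),
    }

def summarise_categorical(values):
    """Count and percentage for each category, most frequent first."""
    counts = Counter(values)
    total = len(values)
    return {k: (c, 100.0 * c / total) for k, c in counts.most_common()}

# Toy data (hypothetical), with one missing age.
toy_ages = [34, 41, None, 29, 52]
toy_sex = ["F", "F", "M", "F"]

age_summary = summarise_continuous(toy_ages)
sex_summary = summarise_categorical(toy_sex)
print(age_summary)
print(sex_summary)
```

Reporting the number of missing values next to each summary keeps the extent of missingness visible before any imputation is applied.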

Bivariate analyses will assess associations between predictors and outcomes using t-tests, chi-square tests, and correlations as appropriate. For time-to-event outcomes such as flare, Kaplan-Meier methods and log-rank tests will be applied. Conventional multivariable models, including logistic regression and Cox regression, will serve as benchmarks and provide effect estimates for comparison.
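
For readers less familiar with the time-to-event step, the following is a minimal sketch of the Kaplan-Meier product-limit estimator in plain Python. The function name `kaplan_meier` and the toy follow-up times are hypothetical; in practice an established survival analysis package would likely be used instead.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up time for each participant
    events : 1 if the event (e.g. flare) was observed, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        n_leaving = sum(1 for ti in times if ti == t)
        if n_events > 0:
            # Product-limit update: multiply by the fraction surviving time t.
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Everyone with time t (event or censored) leaves the risk set.
        at_risk -= n_leaving
    return curve

# Toy data (hypothetical): weeks to first flare, 0 = censored.
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]

km_curve = kaplan_meier(toy_times, toy_events)
print(km_curve)
```

Censored participants reduce the number at risk without lowering the survival estimate, which is how the method uses partial follow-up from participants who withdraw.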

The primary analytic approach will use the Perceiver IO deep learning architecture to integrate multimodal inputs (clinical, laboratory, biomarker, and, where available, genetic data) into a shared latent space. Query-based outputs will be trained to predict flare, severe organ involvement, and patient-reported outcomes, with low disease activity, withdrawal, and rescue therapy as secondary prediction targets. Performance will be estimated by five-fold cross-validation and reported as AUROC, precision-recall, calibration, and accuracy metrics.
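
The evaluation scheme described above can be illustrated with two small helpers: one that partitions participants into five folds and one that computes AUROC as the probability that a randomly chosen positive case is scored above a randomly chosen negative case. Both `k_fold_indices` and `auroc` are hypothetical names written for this sketch. Model training, calibration, and precision-recall are left out, and the split is a plain shuffle rather than a stratified one.

```python
import random

def auroc(labels, scores):
    """Pairwise AUROC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle indices once, then yield (train, test) index lists for each fold."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test

# Toy check (hypothetical values): three of four positive/negative pairs are ordered correctly.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auroc(toy_labels, toy_scores)
print(toy_auc)

splits = list(k_fold_indices(n_samples=23))
print([len(test) for _, test in splits])
```

Each participant appears in exactly one held-out fold, so every prediction used for the performance estimate comes from a model that did not see that participant during training.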

To ensure interpretability, Shapley additive explanations (SHAP) will quantify the relative contribution of each variable to predictions. Missing data will be addressed through imputation or model-internal handling. Results will be reported as aggregate summaries, model performance, and feature attribution, consistent with YODA data use policies.
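
To make the attribution idea concrete, the sketch below computes exact Shapley values for a tiny toy model by enumerating every feature coalition. This is not the SHAP library, which relies on approximations to scale to real models. The function `shapley_values`, the model `toy_risk_model`, and the all-zero baseline are hypothetical choices made for illustration only.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, instance, baseline):
    """Exact Shapley attribution of predict(instance) - predict(baseline).

    Features outside a coalition are set to their baseline value.
    Cost grows as 2**n_features, so this is only for tiny illustrations.
    """
    n = len(instance)

    def value(coalition):
        mixed = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_risk_model(x):
    # Hypothetical model: one additive term plus one two-way interaction.
    return 2.0 * x[0] + x[1] * x[2]

phi = shapley_values(toy_risk_model, instance=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print(phi)
```

In this toy case the additive feature receives its full effect, the interaction is shared equally between the two interacting features, and the attributions sum to the difference between the prediction and the baseline prediction.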

Narrative Summary: Systemic lupus erythematosus (lupus) is an autoimmune disease that causes unpredictable "flares" and can damage vital organs such as the kidneys or brain. Many patients also struggle with fatigue, pain, and reduced quality of life. At present, doctors cannot reliably predict when flares will occur, who is at risk of severe organ involvement, or which patients will be most affected. This project will use information from a previous lupus clinical trial. We will apply a modern artificial intelligence approach, called deep learning, that can combine different types of data including clinical features, blood tests, biomarkers, and, if available, genetic data. The model will aim to predict three key outcomes: disease flares, severe organ involvement, and patient-reported outcomes such as fatigue and wellbeing.

Project Timeline: The project is expected to begin within one month of data access being granted. During the first two months, we will complete data familiarisation, harmonisation of variables, and descriptive analyses to characterise the cohort. Months three to six will focus on development of the modelling framework, including implementation of the Perceiver IO architecture, cross-validation, and baseline benchmarking against conventional regression and random forest models. Months seven to nine will involve detailed interpretability analyses, including application of Shapley additive explanations (SHAP) to identify key drivers of outcomes. By the end of month nine, the core analytic work will be complete.

In months ten and eleven, results will be consolidated, visualisations and tables prepared, and the first manuscript drafted. Submission to a peer-reviewed journal is anticipated by the end of month eleven. Month twelve will allow for revisions, preparation of abstracts for conference submission, and reporting of results back to the YODA Project. If required, we will request a short extension to accommodate peer review timelines, but the analytic plan is designed to be completed within the initial 12-month access period.

Dissemination Plan: The main output will be a peer-reviewed manuscript describing the deep learning framework, model performance, and insights into predictors of flare, organ involvement, and patient-reported outcomes. Likely target journals include Annals of the Rheumatic Diseases, Arthritis & Rheumatology, or Lupus Science & Medicine. A methods-focused manuscript may also be submitted to a digital medicine journal to highlight the application of Perceiver IO in autoimmune disease. Findings will be presented at major rheumatology meetings (EULAR, ACR) and, where relevant, at computational medicine conferences. Patient-focused results, especially those concerning fatigue and quality of life, will be shared in plain language with lupus foundations and advocacy groups. All dissemination will comply with YODA policies, reporting only aggregated results and model explanations.

Bibliography: