2020-4454

Research Proposal

Project Title: 
Inferential reproducibility of therapeutic research: a registered report for a cross-sectional study of RCTs available on major data-sharing platforms
Scientific Abstract: 

Background: RCTs are of major importance in providing information about health practices and policies. Ideally, the general public and scientists feel more confident when their methods and results can be reproduced.
Objective: To explore inferential reproducibility (i.e. whether Individual Patient Data (IPD) are available and qualitatively similar conclusions can be drawn from a re-analysis of the original trials) for RCTs for which data are available on major data-sharing platforms.
Study Design: cross-sectional
Participants: This study will include RCTs identified on 4 repositories (CSDR, Vivli, Project Data Sphere and YODA). Eligible RCTs will be phase III studies in the field of therapeutics.
Main Outcome: proportion of reproducible trials
Statistical Analysis: A random sample of 62 of these studies will be drawn, ensuring a precision of ±7.5% for the estimate of our primary outcome, i.e. the proportion of studies whose conclusions are reproduced (we hypothesize that more than 90% of studies can be reproduced). One researcher will then retrieve the IPD for these studies, along with the other documents necessary for re-analysis, by contacting the platforms and the study sponsors. For each study he will prepare a dossier containing the IPD, the protocol and information on the conduct of the study. A second researcher with no access to study reports (including publications) or analytical code will use the dossier to run an independent re-analysis of each trial. Results of the re-analyses will be reported in terms of conclusions, p-values, effect sizes and changes from the initial protocol in each study.
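The sample size of 62 is consistent with the standard normal-approximation formula for estimating a proportion with a given confidence-interval half-width. The following sketch (a plain illustration, not the protocol's actual calculation code) reproduces the figure under the stated assumptions of p = 0.90 and a margin of ±7.5%:

```python
import math

def sample_size_for_proportion(p, margin, z=1.96):
    """Normal-approximation sample size for estimating a proportion p
    so that the 95% CI half-width does not exceed `margin`."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

# Hypothesized proportion of reproducible studies: 90%, precision ±7.5%
n = sample_size_for_proportion(p=0.90, margin=0.075)
print(n)  # → 62
```

Note that a more conservative assumption of p = 0.50 would require a substantially larger sample, so the calculation depends on the hypothesized 90% reproducibility rate.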

Brief Project Background and Statement of Project Significance: 

Performing re-analyses that successfully reproduce results, methods or conclusions helps build confidence in scientific evidence. However, there are growing concerns about findings that cannot be reproduced. As a result, scientists have launched several reproducibility initiatives. For example, in psychology, a collaborative team of authors volunteered to re-run studies in order to obtain the same results with the methods used by the original researchers (result reproducibility); however, the results were reproducible for only 39/100 experiments in this study. The same concerns have been expressed about biomedical research.

The existence and extent of reproducibility problems in the field of biomedical research are still unknown, and few studies have attempted to answer the question. Results of a scoping review that we conducted to explore the impact of data-sharing initiatives showed that studies whose purpose is data reuse are rarely intended for re-analysis. A recent investigation of data for 1622 new drugs submitted to China's Food and Drug Administration (CFDA) for registration concluded that 1308 (81%) of the applications should be withdrawn because they contained fabricated, flawed, or inadequate data from the clinical trials. In addition, most re-analyses have been conducted on selected, highly controversial studies (e.g. Study 329, a well-known study of paroxetine in adolescent depression that presented the drug as safe and effective, whereas the re-analysis demonstrated a lack of efficacy, along with serious safety issues).
An empirical analysis suggests that only a small number of re-analyses of RCTs have been published to date; of these, only a minority were conducted by entirely independent authors, illustrating the limited verification of findings and the lack of available data or metadata.

Furthermore, attempts to reproduce medical studies (pre-clinical or clinical) are often costly and difficult to perform. This is especially true for RCTs, even though these studies are expensive and typically of major importance in informing health practices and policies. It is nonetheless possible, as a first approach, to explore whether qualitatively similar results and conclusions can be drawn from an independent re-analysis of RCT data (inferential reproducibility). Such an independent re-analysis estimates the reproducibility of the inference (conclusions): the objective is not necessarily to reproduce the same analytical methods (reproducibility of the methods) or the same numerical results (reproducibility of the results), but to determine whether, using only the original study protocol and data, a clinically meaningful equivalence can be found between the re-analysis and the original analysis.

Specific Aims of the Project: 

In the context of the growing interest in data-sharing in medicine, this project aims 1/ to perform systematic re-analyses of Randomised Controlled Trials (RCTs) in order to assess their reproducibility and 2/ to develop a tool to identify transparent and reproducible scientific practices.
We will critically appraise all initiatives for data-sharing in medicine and identify all platforms that enable RCT data to be shared. A random sample of 62 RCTs will be selected and re-analysed. The results of these re-analyses will be compared with the results of the original analyses. The same method will be applied to all new drugs submitted to the European Medicines Agency, in order to replicate results and provide practical information for drug regulation. The impact of certain contextual factors on reproducibility will be explored. A scale for scoring the sharing and reproducibility of useful data will be developed.

What is the purpose of the analysis being proposed? Please select all that apply.: 
Confirm or validate previously conducted research on treatment effectiveness
Software Used: 
RStudio
Data Source and Inclusion/Exclusion Criteria to be used to define the patient sample for your study: 

This cross-sectional study will include RCTs identified on the selected platforms, registered before randomization of the first participant in a WHO-approved registry such as clinicaltrials.gov, and sharing their anonymized individual participant data (IPD). Eligible RCTs will 1/ be phase III randomized studies (including phase II/III), 2/ include cluster, parallel and cross-over designs, 3/ include non-inferiority (and equivalence) designs as well as superiority designs, 4/ make no distinction in terms of patients, intervention, comparator or outcome, and 5/ be conducted in the field of therapeutics (i.e. with the objective of developing, evaluating or testing a drug, a medical device or equipment, a "talking therapy" or a combination therapy).
Furthermore, studies with no identified primary outcome will not be included but will be listed as non-evaluable studies. Qualitative studies will not be included.

Main Outcome Measure and how it will be categorized/defined for your study: 

As a primary outcome, we will report the proportion of trials in our cohort that are reproducible on the basis of the conditions and categories described in the methods. For non-replications, the discrepancies that arise will be described quantitatively in terms of differences in p-value and effect size.

Main Predictor/Independent Variable and how it will be categorized/defined for your study: 

All results of these analyses will be reported in terms of 1/ binary conclusion (positive or negative), 2/ p-value, 3/ effect size (and details about the outcome), and 4/ changes compared to the initial protocol. These results will be compared with the results of the analyses reported in the study reports and, if available, the publication reporting the primary results of the completed trial. Because interpreting an RCT involves clinical expertise and cannot be reduced to solely quantitative factors, an in-depth discussion between two researchers not involved in the re-analysis (MS and FN), based on both quantitative and qualitative (clinical judgment) factors, will enable a decision on whether the changes in results described quantitatively could materialize into a change in conclusions. If these two reviewers judge that the conclusions are the same, the study will be considered as inferentially reproduced.

Other Variables of Interest that will be used in your analysis and how they will be categorized/defined for your study: 

As a secondary outcome, we will qualitatively describe key issues that arose in the course of data requests and reanalysis, including any application to investigators for further information or discussion on non-replication.

Statistical Analysis Plan: 

All results of these analyses will be reported in terms of 1/ binary conclusion (positive or negative), 2/ p-value, 3/ effect size (and details about the outcome), and 4/ changes compared to the initial protocol. These results will be compared with the results of the analyses reported in the study reports and, if available, the publication reporting the primary results of the completed trial. Because interpreting an RCT involves clinical expertise and cannot be reduced to solely quantitative factors, an in-depth discussion between two researchers not involved in the re-analysis (MS and FN), based on both quantitative and qualitative (clinical judgment) factors, will enable a decision on whether the changes in results described quantitatively could materialize into a change in conclusions. If these two reviewers judge that the conclusions are the same, the study will be considered as inferentially reproduced.
If these two researchers judge that the conclusions are not the same, then the researcher in charge of the analysis (JG) will be given the statistical analysis plan of the study and will be asked to list the differences in terms of analysis. If the analytical code is provided by the data-sharing platform, it will also be compared with the re-analysis code. If the researcher finds a discrepancy between the study data analysis plan and her own analysis plan, or between the study statistical code and her own statistical code, she will then correct this discrepancy in her analysis if justified (e.g. analysis population, use of covariates). An in-depth discussion between two researchers not involved in the re-analysis (MS and FN) will enable a decision to be made on whether the changes in results described quantitatively could materialize into a change in conclusions and whether the differences in terms of analytical plan are understandable and acceptable. If these two researchers judge that the conclusions are the same, the study will be considered as inferentially reproduced with verification.
If these two researchers judge that the conclusions are not the same, or that the change in the analytical plan is neither justified nor desirable, then a senior statistician will perform his own re-analysis using the dossier initially prepared by the first researcher (a researcher not involved in the re-analyses). A final in-depth discussion between the two researchers not involved in the re-analysis (MS and FN), based on the senior statistician's re-analysis, will then enable a decision on whether the changes in results described quantitatively could materialize into a change in conclusions. If these two researchers judge that the conclusions are the same, the study will be considered as inferentially reproduced with verification; otherwise, the results will be considered as inferentially not reproduced. This process is described in figure 2.
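The three-stage escalation above (blinded re-analysis, SAP-corrected re-analysis, senior-statistician re-analysis) can be summarized as a simple decision cascade. The sketch below is only an illustration of that logic; the function and argument names are hypothetical, and each boolean stands for the MS/FN consensus ("conclusions are the same") at that stage:

```python
from enum import Enum

class Verdict(Enum):
    REPRODUCED = "inferentially reproduced"
    REPRODUCED_VERIFIED = "inferentially reproduced with verification"
    NOT_REPRODUCED = "inferentially not reproduced"

def adjudicate(blind_agrees, sap_corrected_agrees, senior_agrees):
    """Hypothetical sketch of the figure 2 cascade.

    blind_agrees:         consensus after the initial blinded re-analysis
    sap_corrected_agrees: consensus after the SAP/code-informed correction
    senior_agrees:        consensus after the senior statistician's re-analysis
    """
    if blind_agrees:
        return Verdict.REPRODUCED
    if sap_corrected_agrees or senior_agrees:
        # Agreement reached only after verification steps
        return Verdict.REPRODUCED_VERIFIED
    return Verdict.NOT_REPRODUCED
```

Each later stage is evaluated only when the earlier consensus fails, mirroring the sequential nature of the protocol.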
The two reviewers involved in the reproducibility assessment will follow the steps shown in Figure 3 to assess the similarity of the conclusions. First, they will compare statistical significance using the p-values; if significance differs, the result will be considered not reproducible. If it does not differ, the reviewers will qualitatively compare effect sizes and their respective 95% CIs. In case of a difference of ±0.10 points in point estimates (expressed as standardized mean differences), the difference will be discussed with a clinician in order to assess its clinical significance.
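The Figure 3 comparison steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the protocol's actual code: the function name is hypothetical, significance is taken at alpha = 0.05, and the ±0.10 threshold is read as "an absolute difference of 0.10 or more in standardized mean differences triggers clinical review":

```python
def compare_conclusions(orig_p, rep_p, orig_smd, rep_smd, alpha=0.05):
    """Sketch of the two-step similarity check between the original
    analysis and the re-analysis (p-values, then effect sizes)."""
    # Step 1: compare statistical significance of the two p-values
    if (orig_p < alpha) != (rep_p < alpha):
        return "not reproducible"
    # Step 2: compare point estimates as standardized mean differences
    if abs(orig_smd - rep_smd) >= 0.10:
        return "discuss clinical significance with a clinician"
    return "conclusions similar"
```

In practice this quantitative screen only feeds the qualitative MS/FN discussion described above; it does not replace it.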
Should the study be considered as not inferentially reproduced, the investigators/sponsors of the study will be contacted to discuss the discrepancy. This step will be performed after the evaluation of all the discrepancies between the re-analyses and the analyses reported in the study documents (protocol, SAP, statistical code, publication). Although the detection of errors is not part of the objectives of this study, errors can constitute a reason for the lack of inferential reproducibility.

Narrative Summary: 

This registered report introduces a cross-sectional study aiming to explore inferential reproducibility for RCTs for which data are available on major data-sharing platforms (CSDR, Vivli, Project Data Sphere and YODA).
Eligible RCTs will be phase III studies conducted in the field of therapeutics. Sixty-two of these studies will be randomly sampled. One researcher will then retrieve the IPD for these studies along with the other necessary documents, and a dossier containing the IPD, the protocol and information on the conduct of the study will be prepared. A second researcher, who will have no access to study reports, will use the dossier to run an independent re-analysis of each trial.

Project Timeline: 

We have set a one-year time frame within which study sponsors are requested to send us the IPD. One year appears to be a reasonable time frame that balances administrative constraints against the need for fast and effective data-sharing (for instance, for IPD meta-analyses).

Project start: 01/01/2020
Analysis completion date: 01/11/2021
Manuscript drafted: 31/12/2021
Submission of full article: 01/02/2022
Results reported back to YODA: 01/02/2022

Dissemination Plan: 

A full text article will be written and published in the journal Royal Society Open Science. The registered report for this research project has already been accepted (in-principle) and will be published along with the final article.

Bibliography: 

see attached in-principle-accepted registered report in Royal Society Open Science

General Information

How did you learn about the YODA Project?: 
Scientific Publication

Request Clinical Trials

What type of data are you looking for?: 
Individual Participant-Level Data, which includes Full CSR and all supporting documentation

Data Request Status

Change the status of this request: 
Ongoing