Fragility Index for Time-to-Event Endpoints in Single-Arm Clinical Trials
Abstract.
The reliability of clinical trial outcomes is crucial, especially in guiding medical decisions. In this paper, we introduce the Fragility Index (FI) for time-to-event endpoints in single-arm clinical trials, a novel metric designed to quantify the robustness of study conclusions. The FI represents the smallest number of censored observations that, when reclassified as uncensored events, causes the posterior probability of the median survival time exceeding a specified threshold to fall below a predefined confidence level. While drug effectiveness is typically assessed by determining whether the posterior probability exceeds a specified confidence level, the FI offers a complementary measure, indicating how robust these conclusions are to potential shifts in the data. Using a Bayesian approach, we develop a practical framework for computing the FI based on the exponential survival model. To facilitate the application of our method, we developed an R package, fi, which computes the Fragility Index. Through real-world case studies involving time-to-event data from single-arm clinical trials, we demonstrate the utility of this index. Our findings highlight how the FI can be a valuable tool for assessing the robustness of survival analyses in single-arm studies, aiding researchers and clinicians in making more informed decisions.
Keywords: Bayesian analysis, Exponential survival model, Fragility Index, Single-arm clinical trials, Time-to-event data
1. Introduction
The Fragility Index (FI) is a widely used metric in randomized controlled trials (RCTs) to assess the robustness of statistically significant findings, particularly in studies with small sample sizes or limited outcome events. Traditional significance testing, which relies on p-values, can sometimes create a false sense of confidence, as minor changes in the data may shift results from significant to non-significant, casting doubt on the reliability of conclusions (Editorial, 2019). For binary outcomes, the FI, introduced by Walsh et al. (2014), quantifies the vulnerability of a study’s results by determining the minimum number of events (e.g., successes or failures) that would need to change to reverse statistical significance. This is calculated by systematically altering the outcomes (changing non-events to events or vice versa) until the p-value exceeds the significance threshold, typically $0.05$. For example, in a trial testing a new treatment, if statistical significance is observed, a low FI, meaning that just one or two more events would change the result, indicates that the findings are fragile and may lack robustness. Although there is no universal threshold for robustness, a higher FI generally indicates more reliable results (Andrade, 2020). To adjust for sample size, the Fragility Quotient (FQ), calculated as FI/sample size, has been proposed to provide a normalized measure of fragility (Heston, 2023). Understanding fragility allows researchers and clinicians to interpret trial results with more caution, particularly when clinical decisions are based on fragile evidence (Garcia et al., 2023).
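The binary-outcome procedure described above can be sketched in a few lines. The following Python sketch is illustrative only: it assumes a two-arm trial summarized by event counts, uses a two-sided Fisher's exact test, and converts non-events to events in the arm with fewer events while holding the arm sizes fixed. The function names are hypothetical and are not taken from the fragility R package.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in row 1, margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Add up every table that is no more probable than the observed one.
    p_value = sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-9))
    return min(1.0, p_value)


def fragility_index_binary(events_a, n_a, events_b, n_b, alpha=0.05):
    """Count the outcome flips needed to lose significance (None if never lost)."""

    def p_value(ea, eb):
        return fisher_exact_two_sided(ea, n_a - ea, eb, n_b - eb)

    if p_value(events_a, events_b) >= alpha:
        return 0  # not significant to begin with, so nothing to reverse
    flip_a = events_a <= events_b  # modify the arm with fewer events
    flips = 0
    while (events_a < n_a) if flip_a else (events_b < n_b):
        if flip_a:
            events_a += 1
        else:
            events_b += 1
        flips += 1
        if p_value(events_a, events_b) >= alpha:
            return flips
    return None


# Hypothetical trial: 1/50 events in one arm versus 15/50 in the other.
print(fragility_index_binary(1, 50, 15, 50))
```

Dividing the returned count by the total sample size gives the Fragility Quotient mentioned above.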
Several studies have highlighted the practical implications of the FI. Meyer et al. (2014) demonstrated that in a pulmonary embolism trial, altering the outcomes of only three patients rendered the results non-significant, exposing the limited robustness of the findings. Similarly, Tignanelli and Napolitano (2019) emphasized the value of the FI in identifying fragile findings and improving patient care by fostering a deeper understanding of trial robustness. In critical care research, Ridgeon et al. (2016) revealed that many trials reporting statistically significant effects on mortality had low Fragility Index scores, suggesting that their results relied heavily on a small number of events, raising concerns about their robustness. With the development of tools like the fragility R package (Lin and Chu, 2022), calculating and visualizing the FI for clinical trials with binary outcomes has become more accessible. However, the FI has some limitations, particularly its reliance on binary outcomes, small sample sizes, and the use of Fisher’s exact test, which can affect its accurate calculation, especially in studies involving time-to-event data (Baer et al., 2021; Potter, 2020).
To overcome these limitations in survival analysis, the concept of fragility has been extended to time-to-event data in two-arm trials with the introduction of the Survival-Inferred Fragility Index (SIFI) (Bomze et al., 2020). The SIFI measures the robustness of statistically significant findings in clinical trials with survival endpoints by identifying the minimum number of patients with the longest survival times in the treatment group who must be reclassified to the control group to reverse statistical significance, typically assessed using a two-sided log-rank test ($p < 0.05$). This provides a more precise measure of how sensitive survival outcomes are to changes in the data, offering deeper insights into trial stability. Liu et al. (2024) assessed the robustness of 332 phase III oncology trials using the SIFI and found that trials involving targeted therapies, progression-free survival endpoints, and positive outcomes tend to be the most robust. Similarly, Olsen et al. (2024) analyzed 113 pediatric oncology phase 3 trials and found that pediatric trials were similarly or less fragile compared to adult oncology trials.
Historically, the FI has been used primarily in two-arm or multi-arm trials. In this paper, we extend the concept of the Fragility Index to time-to-event data in single-arm trials and explore its practical applications. Section 2 provides a detailed definition of the FI for time-to-event data in single-arm trials, explaining how it measures the robustness of trial outcomes. Section 3 presents case studies using real-world survival datasets, illustrating the usefulness of the FI in practice. Finally, Section 4 discusses the broader implications of using the FI for survival analysis, emphasizing its value as a supplementary metric for interpreting trial results and aiding clinical decision-making.
2. Methodology
In clinical trials, particularly in fields like oncology and chronic diseases, outcomes are frequently associated with the time until a significant event occurs, such as disease progression or death. This leads to the analysis of time-to-event data, also known as survival data. Survival analysis not only captures whether the event occurs but also when it happens, offering richer insights into the treatment’s efficacy and impact. These time-based outcomes are critical in understanding treatment effects over the course of a study and are a common focus in clinical research involving chronic or progressive conditions. A common challenge in survival analysis is the presence of censoring—a scenario where the exact time of the event is not fully observed. Right-censoring, the most common type, occurs when a patient either has not experienced the event by the end of the study or is lost to follow-up before the event occurs. In these cases, the patient’s data is incomplete, as we only know that the event did not occur up to a certain point in time, but the exact time (if or when it will occur) remains unknown. Analyzing such data poses complexities, particularly in accounting for censored observations.
To model time-to-event data in survival analysis, one of the widely used distributions is the exponential distribution. Its popularity can be traced back to the seminal works of Epstein (Epstein and Sobel, 1953, 1954; Epstein, 1954), who demonstrated its practicality and versatility in modeling life data and survival times. The exponential distribution has the memoryless property, meaning the hazard rate remains constant over time. This implies that the likelihood of the event occurring at any moment is independent of how much time has already passed, making it a suitable choice in situations where the event risk is expected to stay stable throughout the study.
The probability density function (pdf) of the exponential distribution is:
$$f(t \mid \lambda) = \lambda e^{-\lambda t}, \quad t \ge 0,$$
where $\lambda > 0$ is the rate parameter. The expected time to the event is the inverse of the rate parameter, $E(T) = 1/\lambda$, providing an estimate of the average time until the event occurs. The survival function, which gives the probability that a patient survives beyond a given time $t$, is:
$$S(t) = P(T > t) = e^{-\lambda t},$$
and the hazard function, which describes the instantaneous rate of the event occurrence given that the patient has survived up to time $t$, is constant and given by:
$$h(t) = \frac{f(t \mid \lambda)}{S(t)} = \lambda.$$
2.1. Likelihood Function with Censoring
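In the presence of censoring, the likelihood function must account for both observed events and censored data. Suppose there are $n$ independent observations and, for $i = 1, \dots, n$, let $(t_i, \delta_i)$ represent the observed time and censoring indicator for patient $i$, where $\delta_i = 1$ if the event is observed and $\delta_i = 0$ if the observation is censored. Writing $D = \{(t_i, \delta_i)\}_{i=1}^{n}$ for the data, the likelihood function for the $n$ independent observations is given by:
$$L(\lambda \mid D) = \prod_{i=1}^{n} \left(\lambda e^{-\lambda t_i}\right)^{\delta_i} \left(e^{-\lambda t_i}\right)^{1-\delta_i} = \lambda^{\sum_{i=1}^{n} \delta_i} \exp\!\Big(-\lambda \sum_{i=1}^{n} t_i\Big). \tag{2.1}$$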
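Under the exponential model each observed event contributes the density $\lambda e^{-\lambda t_i}$ and each censored observation contributes the survival probability $e^{-\lambda t_i}$, so the log-likelihood reduces to (number of events) $\times \log\lambda - \lambda \times$ (total follow-up time). A minimal Python sketch of that computation follows; the function names and the toy data are hypothetical and serve only to illustrate the formula.

```python
import math


def exp_log_likelihood(rate, times, events):
    """Censored exponential log-likelihood: sum(events)*log(rate) - rate*sum(times)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return sum(events) * math.log(rate) - rate * sum(times)


def exp_mle(times, events):
    """Maximum-likelihood rate: number of events divided by total follow-up time."""
    return sum(events) / sum(times)


# Hypothetical toy data: follow-up times in months; 1 = event, 0 = censored.
times = [2.0, 5.0, 3.0, 8.0, 6.0]
events = [1, 0, 1, 1, 0]

rate_hat = exp_mle(times, events)  # 3 events over 24 months of follow-up
print(rate_hat, exp_log_likelihood(rate_hat, times, events))
```

Censored patients add follow-up time to the exponent without adding to the event count, which is why reclassifying a censored observation as an event raises the estimated rate.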
2.2. Choice of Prior
In Bayesian analysis, the choice of prior distribution reflects initial beliefs about the parameter before observing the data. This could be based on prior studies, expert knowledge, or other relevant information available before the trial. A natural choice for the rate parameter $\lambda$ is the Gamma prior, as it is conjugate to the exponential likelihood, simplifying posterior inference. The Gamma distribution is defined by the shape parameter $\alpha$ and the rate parameter $\beta$, with the probability density function:
$$\pi(\lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta \lambda}, \quad \lambda > 0.$$
Using Bayes’ theorem, the posterior distribution of $\lambda$ is derived by combining the prior and the likelihood from the observed data. The posterior distribution remains a Gamma distribution with updated parameters:
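Theorem 1 (Posterior Distribution of Rate Parameter $\lambda$).
Given $n$ independent observations $(t_i, \delta_i)$ for $i = 1, \dots, n$, where $t_i$ represents the observed time and $\delta_i$ is the censoring indicator, and assuming a $\text{Gamma}(\alpha, \beta)$ prior for $\lambda$, the posterior distribution of $\lambda$ is:
$$\lambda \mid D \sim \text{Gamma}\Big(\alpha + \sum_{i=1}^{n} \delta_i,\; \beta + \sum_{i=1}^{n} t_i\Big).$$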
Proof.
See Appendix A ∎
This posterior distribution provides an updated estimate of $\lambda$, incorporating both the prior belief and the observed survival data.
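Because the Gamma prior is conjugate, the update in Theorem 1 amounts to two additions: the number of observed events is added to the prior shape, and the total follow-up time is added to the prior rate. The Python sketch below illustrates this with hypothetical toy data and a hypothetical Gamma(1, 1) prior; neither is taken from the case studies, and the function name is an assumption.

```python
def gamma_posterior(times, events, prior_shape, prior_rate):
    """Conjugate update: add the event count to the shape and total time to the rate."""
    return prior_shape + sum(events), prior_rate + sum(times)


# Hypothetical toy data (months; 1 = event, 0 = censored) and a hypothetical prior.
times = [2.0, 5.0, 3.0, 8.0, 6.0]
events = [1, 0, 1, 1, 0]

post_shape, post_rate = gamma_posterior(times, events, prior_shape=1.0, prior_rate=1.0)
post_mean = post_shape / post_rate       # posterior mean of the rate parameter
post_var = post_shape / post_rate ** 2   # posterior variance of the rate parameter
print(post_shape, post_rate, post_mean, post_var)
```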
2.3. Fragility Index
In time-to-event data, drug effectiveness is often assessed by determining whether the posterior probability that the median survival time exceeds a specified threshold is above a certain confidence level. The Fragility Index (FI) provides an additional measure of robustness by quantifying how many events must change to bring this probability below that confidence level. Specifically, for survival data from single-arm trials, where the posterior probability that the median survival time exceeds a threshold $t_0$ initially surpasses a predefined confidence level $c$, the Fragility Index (FI) is defined as the smallest number of censored observations with the shortest censoring times that, when reclassified as uncensored events, reduce the posterior probability below the specified confidence level $c$.
This index provides a precise measure of the trial’s robustness, indicating how sensitive the results are to changes in the data. In essence, it indicates the minimum number of censored observations that must be reclassified to uncensored events to reduce confidence in the treatment effect. Mathematically, the median survival time for an exponential distribution is given by:
Theorem 2 (Median Survival Time for Exponential Distribution).
For a random variable $T$ following an exponential distribution with rate parameter $\lambda$, the median survival time $m$ is:
$$m = \frac{\ln 2}{\lambda}.$$
Proof.
See Appendix A ∎
The posterior probability that the median survival time exceeds a threshold $t_0$ is computed by integrating over the posterior distribution of $\lambda$.
Theorem 3 (Posterior Probability of Median Survival Time).
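Given the posterior distribution of the rate parameter $\lambda$ for an exponential survival model, with data $D = \{(t_i, \delta_i)\}_{i=1}^{n}$, as $\text{Gamma}(\alpha^*, \beta^*)$, the posterior probability that the median survival time $m = \ln 2 / \lambda$ exceeds a threshold $t_0$ is:
$$P(m > t_0 \mid D) = P\Big(\lambda < \frac{\ln 2}{t_0} \,\Big|\, D\Big).$$
This probability is computed by integrating the posterior Gamma distribution:
$$P\Big(\lambda < \frac{\ln 2}{t_0} \,\Big|\, D\Big) = \int_{0}^{\ln 2 / t_0} \frac{(\beta^*)^{\alpha^*}}{\Gamma(\alpha^*)} \lambda^{\alpha^* - 1} e^{-\beta^* \lambda} \, d\lambda,$$
where $\alpha^* = \alpha + \sum_{i=1}^{n} \delta_i$ is the updated shape parameter, and $\beta^* = \beta + \sum_{i=1}^{n} t_i$ is the updated rate parameter.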
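The step from the median to the rate parameter, and from the integral to a standard special function, can be written out as follows. The symbols $t_0$ (threshold), $\alpha^*$ and $\beta^*$ (posterior shape and rate) are the notation assumed here; the substitution $u = \beta^* \lambda$ shows that the posterior probability is the regularized lower incomplete gamma function evaluated at $\beta^* \ln 2 / t_0$, which standard software exposes as the Gamma cumulative distribution function.

```latex
\[
\begin{aligned}
P(m > t_0 \mid D)
  &= P\!\left(\frac{\ln 2}{\lambda} > t_0 \,\Big|\, D\right)
   = P\!\left(\lambda < \frac{\ln 2}{t_0} \,\Big|\, D\right) \\
  &= \frac{1}{\Gamma(\alpha^*)} \int_{0}^{\beta^* \ln 2 / t_0} u^{\alpha^* - 1} e^{-u} \, du ,
  \qquad u = \beta^* \lambda .
\end{aligned}
\]
```

The inequality flips because the median $\ln 2 / \lambda$ is a decreasing function of $\lambda$: a longer median survival corresponds to a smaller event rate.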
The proof of Theorem 3 follows directly from Theorems 1 and 2. The Fragility Index is determined by sequentially reclassifying censored observations with the smallest censoring times to uncensored events and recalculating the posterior probability until it falls below the specified confidence level. A higher Fragility Index indicates greater robustness of the study results, implying that the treatment effect remains consistent even when multiple censoring statuses are changed. Conversely, a lower Fragility Index suggests that the results are sensitive to small changes in the data, potentially undermining confidence in the conclusions. It is important to note that there is no universally defined threshold for the FI; it serves as a relative indicator of robustness rather than an absolute measure of significance. Therefore, the FI should be used alongside other statistical measures to provide a more comprehensive evaluation of a study’s findings.
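The sequential procedure can be sketched in Python as below. This is an illustration under stated assumptions, not the interface of the fi R package: the function names, the toy data, the Gamma(1, 1) prior, the 14-month threshold, and the 0.7 confidence level are all hypothetical. The sketch also assumes that a reclassified observation keeps its recorded time as the event time, so each reclassification adds one to the posterior shape and leaves the posterior rate unchanged. The Gamma cumulative distribution function is evaluated with a power series so that the example needs only the standard library.

```python
import math


def reg_lower_gamma(a, x, max_terms=10_000):
    """Regularized lower incomplete gamma function P(a, x), by power series."""
    if x <= 0:
        return 0.0
    term = total = 1.0 / a
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < 1e-16 * total:
            break
    return min(1.0, total * math.exp(a * math.log(x) - x - math.lgamma(a)))


def prob_median_exceeds(threshold, shape, rate):
    """P(ln 2 / lambda > threshold) when lambda ~ Gamma(shape, rate)."""
    return reg_lower_gamma(shape, rate * math.log(2) / threshold)


def fragility_index(times, events, threshold, level, prior_shape, prior_rate):
    """Smallest number of censored-to-event reclassifications that drops the
    posterior probability below `level`; None if it never drops."""
    shape = prior_shape + sum(events)
    rate = prior_rate + sum(times)
    if prob_median_exceeds(threshold, shape, rate) < level:
        raise ValueError("posterior probability is already below the level")
    n_censored = sum(1 for e in events if e == 0)
    for k in range(1, n_censored + 1):
        # Assumption: the k shortest censored times become event times, so the
        # event count grows by k while the total follow-up time is unchanged.
        if prob_median_exceeds(threshold, shape + k, rate) < level:
            return k
    return None


# Hypothetical toy data: months of follow-up; 1 = event, 0 = censored.
times = [4, 9, 12, 15, 18, 20, 22, 24, 24, 24]
events = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
fi = fragility_index(times, events, threshold=14.0, level=0.7,
                     prior_shape=1.0, prior_rate=1.0)
print(fi)
```

Under these assumptions the order in which censored observations are reclassified does not change the computed probability, only how many are reclassified; the ordering by shortest censoring time would matter in a model whose posterior depends on the individual times.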
3. Numerical Study
In this section, we apply the Fragility Index (FI) methodology to real-world clinical trial data to evaluate the robustness of study conclusions. By utilizing our R package fi (https://github.com/arnabkrmaity/fi), we efficiently calculate the FI, demonstrating its effectiveness and ease of use in survival analysis.
3.1. Case Study 1: Lung Cancer
We begin with the North Central Cancer Treatment Group data, available in the lung dataset from the survival package in R. This dataset comprises observations from patients with advanced lung cancer, providing detailed information on survival times and various clinical variables.
For this analysis, we randomly selected 30 patients from the dataset, of whom 22 had experienced the event (death), while the remaining 8 were censored. Using the observed survival times and censoring data, we calculated the posterior probability that the median survival time exceeded 7 months. A Gamma prior with shape parameter $\alpha$ and rate parameter $\beta$ was chosen for the rate parameter $\lambda$, as it is weakly informative and conjugate to the exponential likelihood, making it an appropriate choice for survival analysis. The resulting posterior probability was 0.935, indicating a high likelihood that the median survival time exceeded 7 months.
The Kaplan-Meier curve for this dataset is shown in Figure 1.
The Fragility Index was determined by sequentially reclassifying censored observations with the shortest censoring times as events and recalculating the posterior probability until it fell below the predefined confidence threshold of 0.7, a standard choice balancing statistical confidence and flexibility. For this dataset, the Fragility Index was found to be 5. This means that reclassifying five censored patients as having experienced the event (death) would reduce the posterior probability of the median survival time exceeding 7 months to below 0.7. Given a sample size of 30, an FI of 5 suggests that the study’s conclusions are robust. This level of FI suggests that the findings of the study are stable and reliable, even with moderate changes in the data.
3.2. Case Study 2: Pembrolizumab in hepatocellular carcinoma (HCC)
For the second case study, we analyzed progression-free survival time for Pembrolizumab, an immune checkpoint inhibitor widely used in the treatment of various cancers, including advanced hepatocellular carcinoma (HCC) and other malignancies. Pembrolizumab functions by blocking the programmed cell death protein 1 (PD-1) receptor, thereby enhancing the immune system’s ability to detect and destroy cancer cells. The study (Feun et al., 2019) focused on its application in advanced HCC, a disease that has shown modest response rates to checkpoint inhibitors. The Individual Patient Data (IPD) from the treatment arm was extracted from the Kaplan-Meier curve presented in Feun et al. (2019), using the MD Anderson Cancer Center’s IPD extraction tool (https://biostatistics.mdanderson.org/shinyapps/IPDfromKM/).
The dataset, constructed using the above extraction tool, comprises 28 patients. Among them, 20 experienced disease progression, while the remaining 8 were censored. Following the same methodology as in the previous analysis, we calculated the posterior probability that the median progression-free survival time exceeded 3.5 months. A Gamma prior with shape parameter $\alpha$ and rate parameter $\beta$ was applied, consistent with the previous case study, and the cutoff for the posterior probability was set at 0.7. The resulting posterior probability was 0.958, indicating a high likelihood that the median progression-free survival time exceeded 3.5 months.
The Kaplan-Meier curve for this dataset is shown in Figure 2.
The Fragility Index for this dataset was determined to be 6, meaning that reclassifying six censored patients as having experienced the event (disease progression) would reduce the posterior probability of the median progression-free survival time exceeding 3.5 months to below 0.7. Given the sample size of 28, an FI of 6 indicates that the study’s conclusions are robust. This level of FI suggests that the findings regarding Pembrolizumab’s efficacy are stable and reliable, even with moderate changes in the data.
3.3. Case Study 3: Palbociclib in Breast Cancer
For this case study, we analyzed progression-free survival time from a single-arm phase II study investigating Palbociclib, a CDK4/6 inhibitor commonly used for the treatment of hormone receptor (HR)-positive, HER2-negative advanced or metastatic breast cancer (MBC). The study protocols are outlined in detail in Krishnamurthy et al. (2022) and, as with Case Study 2, Individual Patient Data (IPD) for the treatment arm was obtained from the Kaplan-Meier curve in Krishnamurthy et al. (2022).
The dataset, constructed using the MD Anderson Cancer Center’s IPD extraction tool, includes 51 patients. Among them, 31 experienced disease progression, while the remaining 20 were censored.
Using the same methodology as in the previous analyses, we calculated the posterior probability that the median progression-free survival time exceeded 15 months. A Gamma prior with shape parameter $\alpha$ and rate parameter $\beta$ was applied, with the cutoff for the posterior probability set at 0.7. The resulting posterior probability was 0.948, indicating a high likelihood that the median progression-free survival time exceeded 15 months.
The Kaplan-Meier curve for this dataset is displayed in Figure 3.
The Fragility Index (FI) for this dataset was calculated to be 6, indicating that if six censored patients were reclassified as having experienced the event (disease progression), the posterior probability of the median progression-free survival time exceeding 15 months would drop below 0.7. With a sample size of 51, an FI of 6 suggests that the study’s conclusions are moderately robust and not entirely resistant to changes in the data. This implies that the treatment effect shows some sensitivity to data alterations.
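To compare the three case studies on a common scale, the reported FI values can be normalized by sample size, giving the Fragility Quotient described in the Introduction. The short Python sketch below uses only the FI values, sample sizes, and censoring counts reported above. The second ratio, FI divided by the number of censored patients, is an additional illustrative quantity assumed here and is not a measure defined in this paper.

```python
# (label, FI, sample size, number censored) as reported in Case Studies 1 to 3.
cases = [
    ("Lung cancer", 5, 30, 8),
    ("Pembrolizumab in HCC", 6, 28, 8),
    ("Palbociclib in breast cancer", 6, 51, 20),
]

quotients = {}
for label, fi, n, n_censored in cases:
    fq = fi / n                      # Fragility Quotient: FI / sample size
    censored_share = fi / n_censored  # illustrative ratio, not from the paper
    quotients[label] = (round(fq, 3), round(censored_share, 3))
    print(f"{label}: FQ = {fq:.3f}, FI / censored = {censored_share:.3f}")
```

The smaller quotient for the Palbociclib study is in line with its description as moderately robust, since the same FI of 6 is spread over a larger sample.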
4. Concluding Remarks
In this paper, we defined the Fragility Index (FI) for time-to-event endpoints in single-arm clinical trials, developed a Bayesian methodology using the exponential distribution, and demonstrated its application through three real-world case studies. We also developed an R package, fi, to facilitate the calculation of the FI, making it accessible for researchers to apply in similar studies. The FI quantifies the robustness of study conclusions by identifying the minimum number of censored observations that, when reclassified as events, reduce the posterior probability of the median survival time exceeding a specified threshold to below a chosen confidence level. Our case studies yielded FI values of 5 and 6, indicating moderate robustness; while the findings are relatively stable, they remain somewhat sensitive to data changes. This underscores the FI’s value as a complementary tool to traditional statistical methods, enhancing the interpretability and reliability of clinical trial outcomes, especially in the absence of control groups. However, the FI lacks a universal threshold and is influenced by the choice of prior distribution, necessitating careful interpretation and integration with other clinical impact measures. In conclusion, the Fragility Index represents a significant advancement in assessing the robustness of single-arm clinical trials with time-to-event data, supporting more informed and reliable clinical decision-making.
References
- Andrade (2020) Chittaranjan Andrade. The use and limitations of the fragility index in the interpretation of clinical trial findings. The Journal of Clinical Psychiatry, 81(2):21994, 2020.
- Baer et al. (2021) Benjamin R Baer, Mario Gaudino, Mary Charlson, Stephen E Fremes, and Martin T Wells. Fragility indices for only sufficiently likely modifications. Proceedings of the National Academy of Sciences, 118(49):e2105254118, 2021.
- Bomze et al. (2020) David Bomze, Nethanel Asher, Omar Hasan Ali, Lukas Flatz, Daniel Azoulay, Gal Markel, and Tomer Meirson. Survival-inferred fragility index of phase 3 clinical trials evaluating immune checkpoint inhibitors. JAMA network open, 3(10):e2017675–e2017675, 2020.
- Editorial (2019) Editorial. It’s time to talk about ditching statistical significance. Nature, 567(7748):283, 2019.
- Epstein (1954) Benjamin Epstein. Truncated life tests in the exponential case. The Annals of Mathematical Statistics, pages 555–564, 1954.
- Epstein and Sobel (1953) Benjamin Epstein and Milton Sobel. Life testing. Journal of the American Statistical Association, 48(263):486–502, 1953.
- Epstein and Sobel (1954) Benjamin Epstein and Milton Sobel. Some theorems relevant to life testing from an exponential distribution. The Annals of Mathematical Statistics, pages 373–381, 1954.
- Feun et al. (2019) Lynn G Feun, Ying-Ying Li, Chunjing Wu, Medhi Wangpaichitr, Patricia D Jones, Stephen P Richman, Beatrice Madrazo, Deukwoo Kwon, Monica Garcia-Buitrago, Paul Martin, et al. Phase 2 study of pembrolizumab and circulating biomarkers to predict anticancer response in advanced, unresectable hepatocellular carcinoma. Cancer, 125(20):3603–3614, 2019.
- Garcia et al. (2023) Marcos Vinicius Fernandes Garcia, Juliana Carvalho Ferreira, and Pedro Caruso. Fragility index and fragility quotient in randomized clinical trials. Jornal Brasileiro de Pneumologia, 49(01):e20230034, 2023.
- Heston (2023) Thomas F Heston. The robustness index: going beyond statistical significance by quantifying fragility. Cureus, 15(8), 2023.
- Krishnamurthy et al. (2022) Jairam Krishnamurthy, Jingqin Luo, Rama Suresh, Foluso Ademuyiwa, Caron Rigden, Timothy Rearden, Katherine Clifton, Katherine Weilbaecher, Ashley Frith, Anna Roshal, et al. A phase ii trial of an alternative schedule of palbociclib and embedded serum tk1 analysis. NPJ breast cancer, 8(1):35, 2022.
- Lin and Chu (2022) Lifeng Lin and Haitao Chu. Assessing and visualizing fragility of clinical results with binary outcomes in r using the fragility package. PLoS One, 17(6):e0268754, 2022.
- Liu et al. (2024) Y Liu, TA Lin, A Koong, C Lin, JA Jaoude, RR Patel, R Kouzy, MB El Alam, T Meirson, and EB Ludmir. The fragility of phase iii trials in oncology. International Journal of Radiation Oncology, Biology, Physics, 120(2):S42, 2024.
- Meyer et al. (2014) Guy Meyer, Eric Vicaut, Thierry Danays, Giancarlo Agnelli, Cecilia Becattini, Jan Beyer-Westendorf, Erich Bluhmki, Helene Bouvaist, Benjamin Brenner, Francis Couturaud, et al. Fibrinolysis for patients with intermediate-risk pulmonary embolism. New England Journal of Medicine, 370(15):1402–1411, 2014.
- Olsen et al. (2024) Hannah Olsen, Pei-Chi Kao, Caleb Richmond, David Stephen Shulman, Wendy B London, and Steven G DuBois. Statistical fragility of findings from randomized phase 3 trials in pediatric oncology. 2024.
- Potter (2020) Gail E Potter. Dismantling the fragility index: a demonstration of statistical reasoning. Statistics in Medicine, 39(26):3720–3731, 2020.
- Ridgeon et al. (2016) Elliott E Ridgeon, Paul J Young, Rinaldo Bellomo, Marta Mucchetti, Rosalba Lembo, and Giovanni Landoni. The fragility index in multicenter randomized controlled critical care trials. Critical care medicine, 44(7):1278–1284, 2016.
- Tignanelli and Napolitano (2019) Christopher J Tignanelli and Lena M Napolitano. The fragility index in randomized clinical trials as a means of optimizing patient care. JAMA surgery, 154(1):74–79, 2019.
- Walsh et al. (2014) Michael Walsh, Sadeesh K Srinathan, Daniel F McAuley, Marko Mrkobrada, Oren Levine, Christine Ribic, Amber O Molnar, Neil D Dattani, Andrew Burke, Gordon Guyatt, et al. The statistical significance of randomized controlled trial results is frequently fragile: a case for a fragility index. Journal of clinical epidemiology, 67(6):622–628, 2014.
APPENDIX
The calculation of the Fragility Index and related numerical studies are implemented in R and made accessible through the GitHub repository (https://github.com/arnabkrmaity/fi).
A. Proof of the Theorems
Proof of Theorem 1.
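The likelihood function for the censored data $D = \{(t_i, \delta_i)\}_{i=1}^{n}$ is:
$$L(\lambda \mid D) = \prod_{i=1}^{n} \left(\lambda e^{-\lambda t_i}\right)^{\delta_i} \left(e^{-\lambda t_i}\right)^{1-\delta_i} = \lambda^{\sum_{i=1}^{n} \delta_i} \exp\!\Big(-\lambda \sum_{i=1}^{n} t_i\Big).$$
By applying Bayes’ Theorem, the posterior distribution is proportional to the product of the likelihood and the prior:
$$\pi(\lambda \mid D) \propto L(\lambda \mid D)\, \pi(\lambda).$$
Substituting the likelihood and the $\text{Gamma}(\alpha, \beta)$ prior expressions:
$$\pi(\lambda \mid D) \propto \lambda^{\sum_{i=1}^{n} \delta_i} e^{-\lambda \sum_{i=1}^{n} t_i} \cdot \lambda^{\alpha - 1} e^{-\beta \lambda} = \lambda^{\alpha + \sum_{i=1}^{n} \delta_i - 1} \exp\!\Big(-\Big(\beta + \sum_{i=1}^{n} t_i\Big) \lambda\Big).$$
This is the kernel of a Gamma distribution with updated parameters $\alpha^* = \alpha + \sum_{i=1}^{n} \delta_i$ and $\beta^* = \beta + \sum_{i=1}^{n} t_i$, thus:
$$\lambda \mid D \sim \text{Gamma}\Big(\alpha + \sum_{i=1}^{n} \delta_i,\; \beta + \sum_{i=1}^{n} t_i\Big).$$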
∎
Proof of Theorem 2.
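The survival function for the exponential distribution is:
$$S(t) = e^{-\lambda t}.$$
Setting $S(m) = 1/2$, where $m$ denotes the median survival time, we solve:
$$e^{-\lambda m} = \frac{1}{2} \;\Longrightarrow\; \lambda m = \ln 2 \;\Longrightarrow\; m = \frac{\ln 2}{\lambda}.$$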
∎