Fragility Index for Time-to-Event Endpoints in Single-Arm Clinical Trials

Arnab Kumar Maity, Jhanvi Garg, and Cynthia Basu§
Abstract.

The reliability of clinical trial outcomes is crucial, especially in guiding medical decisions. In this paper, we introduce the Fragility Index (FI) for time-to-event endpoints in single-arm clinical trials, a novel metric designed to quantify the robustness of study conclusions. The FI represents the smallest number of censored observations that, when reclassified as uncensored events, causes the posterior probability of the median survival time exceeding a specified threshold to fall below a predefined confidence level. While drug effectiveness is typically assessed by determining whether the posterior probability exceeds a specified confidence level, the FI offers a complementary measure, indicating how robust these conclusions are to potential shifts in the data. Using a Bayesian approach, we develop a practical framework for computing the FI based on the exponential survival model. To facilitate the application of our method, we developed an R package, fi, which provides a tool to compute the Fragility Index. Through real-world case studies involving time-to-event data from single-arm clinical trials, we demonstrate the utility of this index. Our findings highlight how the FI can be a valuable tool for assessing the robustness of survival analyses in single-arm studies, aiding researchers and clinicians in making more informed decisions.

Keywords: Bayesian analysis, Exponential survival model, Fragility Index, Single-arm clinical trials, Time-to-event data

Boehringer Ingelheim, 900 Ridgebury Rd, Ridgefield, CT 06877; Department of Statistics, Texas A&M University, College Station, TX 77843, USA; §Pfizer Inc., 10555 Science Center Dr, San Diego, CA 92121
Corresponding author (email: [email protected])

1. Introduction

The Fragility Index (FI) is a widely used metric in randomized controlled trials (RCTs) to assess the robustness of statistically significant findings, particularly in studies with small sample sizes or limited outcome events. Traditional significance testing, which relies on p-values, can sometimes create a false sense of confidence, as minor changes in the data may shift results from significant to non-significant, casting doubt on the reliability of conclusions (Editorial, 2019). For binary outcomes, the FI, introduced by Walsh et al. (2014), quantifies the vulnerability of a study’s results by determining the minimum number of events (e.g., successes or failures) that would need to change to reverse statistical significance. This is calculated by systematically altering the outcomes (changing non-events to events or vice versa) until the p-value exceeds the significance threshold, typically $p > 0.05$. For example, in a trial testing a new treatment, if statistical significance is observed, a low FI, meaning that just one or two more events would change the result, indicates that the findings are fragile and may lack robustness. Although there is no universal threshold for robustness, a higher FI generally indicates more reliable results (Andrade, 2020). To adjust for sample size, the Fragility Quotient (FQ), calculated as the FI divided by the sample size, has been proposed to provide a normalized measure of fragility (Heston, 2023). Understanding fragility allows researchers and clinicians to interpret trial results with more caution, particularly when clinical decisions are based on fragile evidence (Garcia et al., 2023).

Several studies have highlighted the practical implications of the FI. Meyer et al. (2014) demonstrated that in a pulmonary embolism trial, altering the outcomes of only three patients rendered the results non-significant, exposing the limited robustness of the findings. Similarly, Tignanelli and Napolitano (2019) emphasized the value of the FI in identifying fragile findings and improving patient care by fostering a deeper understanding of trial robustness. In critical care research, Ridgeon et al. (2016) revealed that many trials reporting statistically significant effects on mortality had low Fragility Index scores, suggesting that their results relied heavily on a small number of events, raising concerns about their robustness. With the development of tools like the fragility R package (Lin and Chu, 2022), calculating and visualizing the FI for clinical trials with binary outcomes has become more accessible. However, the FI has some limitations, particularly its reliance on binary outcomes, small sample sizes, and the use of Fisher’s exact test, which can affect its accurate calculation, especially in studies involving time-to-event data (Baer et al., 2021; Potter, 2020).

To overcome these limitations in survival analysis, the concept of fragility has been extended to time-to-event data in two-arm trials with the introduction of the Survival-Inferred Fragility Index (SIFI) (Bomze et al., 2020). The SIFI measures the robustness of statistically significant findings in clinical trials with survival endpoints by identifying the minimum number of patients with the longest survival times in the treatment group who must be reclassified to the control group to reverse statistical significance, typically assessed using a two-sided log-rank test ($P > 0.05$). This provides a more precise measure of how sensitive survival outcomes are to changes in the data, offering deeper insights into trial stability. Liu et al. (2024) assessed the robustness of 332 phase III oncology trials using the SIFI and found that trials involving targeted therapies, progression-free survival endpoints, and positive outcomes tend to be the most robust. Similarly, Olsen et al. (2024) analyzed 113 pediatric oncology phase III trials and found that pediatric trials were similarly or less fragile compared to adult oncology trials.

Historically, the FI has been used primarily in two-arm or multi-arm trials. In this paper, we extend the concept of the Fragility Index to time-to-event data in single-arm trials and explore its practical applications. Section 2 provides a detailed definition of the FI for time-to-event data in single-arm trials, explaining how it measures the robustness of trial outcomes. Section 3 presents case studies using real-world survival datasets, illustrating the usefulness of the FI in practice. Finally, Section 4 discusses the broader implications of using the FI for survival analysis, emphasizing its value as a supplementary metric for interpreting trial results and aiding clinical decision-making.

2. Methodology

In clinical trials, particularly in fields like oncology and chronic diseases, outcomes are frequently associated with the time until a significant event occurs, such as disease progression or death. This leads to the analysis of time-to-event data, also known as survival data. Survival analysis not only captures whether the event occurs but also when it happens, offering richer insights into the treatment’s efficacy and impact. These time-based outcomes are critical in understanding treatment effects over the course of a study and are a common focus in clinical research involving chronic or progressive conditions. A common challenge in survival analysis is the presence of censoring—a scenario where the exact time of the event is not fully observed. Right-censoring, the most common type, occurs when a patient either has not experienced the event by the end of the study or is lost to follow-up before the event occurs. In these cases, the patient’s data is incomplete, as we only know that the event did not occur up to a certain point in time, but the exact time (if or when it will occur) remains unknown. Analyzing such data poses complexities, particularly in accounting for censored observations.

To model time-to-event data in survival analysis, one of the widely used distributions is the exponential distribution. Its popularity can be traced back to the seminal works of Epstein (Epstein and Sobel, 1953, 1954; Epstein, 1954), who demonstrated its practicality and versatility in modeling life data and survival times. The exponential distribution has the memoryless property, meaning the hazard rate remains constant over time. This implies that the likelihood of the event occurring at any moment is independent of how much time has already passed, making it a suitable choice in situations where the event risk is expected to stay stable throughout the study.

The probability density function (pdf) of the exponential distribution is:

\[
f(t) = \lambda e^{-\lambda t}, \quad t \geq 0
\]

where $\lambda$ is the rate parameter. The expected time to the event is the inverse of the rate parameter, $1/\lambda$, providing an estimate of the average time until the event occurs. The survival function, which gives the probability that a patient survives beyond a given time $t$, is:

\[
S(t) = e^{-\lambda t}
\]

and the hazard function, which describes the instantaneous rate of the event occurrence given that the patient has survived up to time $t$, is constant and given by:

\[
h(t) = \lambda
\]
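As a quick numerical check of these three functions, the following minimal Python sketch evaluates the density, survival function, and hazard for an illustrative rate. The value `lam = 0.1` (events per month) and the function names are assumptions chosen for this example, not quantities from the paper. The sketch also checks the memoryless property in the form $S(s+t)/S(s) = S(t)$.

```python
import math

def exp_pdf(t, lam):
    """Density f(t) = lam * exp(-lam * t) for t >= 0."""
    return lam * math.exp(-lam * t)

def exp_survival(t, lam):
    """Survival function S(t) = exp(-lam * t)."""
    return math.exp(-lam * t)

def exp_hazard(t, lam):
    """Hazard h(t) = f(t) / S(t); for the exponential model this equals lam."""
    return exp_pdf(t, lam) / exp_survival(t, lam)

lam = 0.1  # illustrative rate (events per month); not taken from the paper

# The hazard takes the same value at every time point.
hazards = [exp_hazard(t, lam) for t in (0.0, 5.0, 20.0)]

# Memorylessness: the chance of surviving t more months, given survival
# to time s, does not depend on s.
s, t = 4.0, 6.0
conditional = exp_survival(s + t, lam) / exp_survival(s, lam)

print("hazards:", hazards)
print("conditional vs. unconditional survival:", conditional, exp_survival(t, lam))
print("mean time to event:", 1.0 / lam)
```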

2.1. Likelihood Function with Censoring

In the presence of censoring, the likelihood function must account for both observed events and censored data. Suppose there are $n$ independent observations and, for $1 \leq i \leq n$, let $(T_i, \delta_i)$ represent the observed time and censoring indicator for patient $i$, where $\delta_i = 1$ if the event is observed and $\delta_i = 0$ if the observation is censored. The likelihood function for $n$ independent observations is given by:

\[
L(\lambda) = \prod_{i=1}^{n} \left[\lambda e^{-\lambda T_i}\right]^{\delta_i} \left[e^{-\lambda T_i}\right]^{1-\delta_i} \tag{2.1}
\]
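Because the exponential factor $e^{-\lambda T_i}$ appears for events and censored observations alike, the likelihood (2.1) can be collapsed into a function of two summaries: the number of observed events and the total follow-up time. The worked simplification below is a sketch; the symbol $d$ for the event count is introduced here for convenience and is not part of the paper's notation.

```latex
\[
L(\lambda)
  = \prod_{i=1}^{n} \lambda^{\delta_i} e^{-\lambda T_i \delta_i} e^{-\lambda T_i (1-\delta_i)}
  = \prod_{i=1}^{n} \lambda^{\delta_i} e^{-\lambda T_i}
  = \lambda^{d} \exp\!\left(-\lambda \sum_{i=1}^{n} T_i\right),
  \qquad d = \sum_{i=1}^{n} \delta_i .
\]
\[
\log L(\lambda) = d \log \lambda - \lambda \sum_{i=1}^{n} T_i ,
\qquad
\hat{\lambda}_{\mathrm{MLE}} = \frac{d}{\sum_{i=1}^{n} T_i} \quad (d > 0).
\]
```

Censored observations therefore contribute follow-up time to the exponent but add nothing to the event count $d$.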

2.2. Choice of Prior

In Bayesian analysis, the choice of prior distribution reflects initial beliefs about the parameter before observing the data. This could be based on prior studies, expert knowledge, or other relevant information available before the trial. A natural choice for the rate parameter $\lambda$ is the Gamma prior, as it is conjugate to the exponential likelihood, simplifying posterior inference. The Gamma distribution is defined by the shape parameter $\alpha$ and the rate parameter $\beta$, with the probability density function:

\[
\pi(\lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha-1} e^{-\beta\lambda}, \quad \lambda > 0
\]

Using Bayes’ theorem, the posterior distribution of $\lambda$ is derived by combining the prior and the likelihood from the observed data. The posterior distribution remains a Gamma distribution with updated parameters:

\[
\alpha' = \alpha + \sum_{i=1}^{n} \delta_i, \quad \beta' = \beta + \sum_{i=1}^{n} T_i
\]

Theorem 1 (Posterior Distribution of Rate Parameter $\lambda$).

Given $n$ independent observations $(T_i, \delta_i)$ for $i = 1, \dots, n$, where $T_i$ represents the observed time and $\delta_i$ is the censoring indicator, and assuming a Gamma prior $\text{Gamma}(\alpha, \beta)$ for $\lambda$, the posterior distribution of $\lambda$ is:

\[
\lambda \mid (T_i, \delta_i)_{i=1}^{n} \sim \text{Gamma}\left(\alpha + \sum_{i=1}^{n} \delta_i,\ \beta + \sum_{i=1}^{n} T_i\right)
\]

Proof.

See Appendix A.

This posterior distribution provides an updated estimate of $\lambda$, incorporating both the prior belief and the observed survival data.
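The conjugate update in Theorem 1 amounts to two sums. The following Python sketch applies it to a small made-up dataset; the data values and the function name `gamma_posterior` are hypothetical and serve only to illustrate the arithmetic, they are not part of the `fi` package.

```python
def gamma_posterior(times, events, alpha, beta):
    """Return (alpha', beta') for a Gamma(alpha, beta) prior on the exponential rate.

    times  -- observed times T_i (event or censoring time)
    events -- indicators delta_i (1 = event observed, 0 = censored)
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    return alpha + sum(events), beta + sum(times)

# Hypothetical data: six patients, four events, two censored.
times = [2.0, 3.5, 4.0, 5.5, 6.0, 7.0]
events = [1, 1, 0, 1, 0, 1]

alpha_post, beta_post = gamma_posterior(times, events, alpha=0.5, beta=0.5)

# The mean of a Gamma(shape, rate) distribution is shape / rate.
posterior_mean = alpha_post / beta_post

print(alpha_post, beta_post)   # 4.5 28.5
print(posterior_mean)
```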

2.3. Fragility Index

In time-to-event data, drug effectiveness is often assessed by determining whether the posterior probability that the median survival time exceeds a specified threshold is above a certain confidence level. The Fragility Index (FI) provides an additional measure of robustness by quantifying how many events must change to bring this probability below that confidence level. Specifically, for survival data from single-arm trials, where the posterior probability that the median survival time exceeds a threshold $t_0$ initially surpasses a predefined confidence level $p_0$, the Fragility Index (FI) is defined as the smallest number $k$ of censored observations with the shortest censoring times that, when reclassified as uncensored events, reduce the posterior probability below the specified confidence level $p_0$.

This index provides a precise measure of the trial’s robustness, indicating how sensitive the results are to changes in the data. In essence, it indicates the minimum number of censored observations that must be reclassified to uncensored events to reduce confidence in the treatment effect. Mathematically, the median survival time $t_{\text{med}}$ for an exponential distribution is given by:

\[
t_{\text{med}} = \frac{\ln 2}{\lambda}
\]

Theorem 2 (Median Survival Time for Exponential Distribution).

For a random variable $T$ following an exponential distribution with rate parameter $\lambda$, the median survival time is:

\[
t_{\text{med}} = \frac{\ln 2}{\lambda}
\]

Proof.

See Appendix A.
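For readers without the appendix at hand, the standard argument behind Theorem 2 is short. The sketch below uses only the survival function $S(t) = e^{-\lambda t}$ defined earlier; it is offered as an illustration and may differ in presentation from the appendix proof.

```latex
\[
S(t_{\text{med}}) = \tfrac{1}{2}
\;\Longrightarrow\;
e^{-\lambda t_{\text{med}}} = \tfrac{1}{2}
\;\Longrightarrow\;
\lambda\, t_{\text{med}} = \ln 2
\;\Longrightarrow\;
t_{\text{med}} = \frac{\ln 2}{\lambda}.
\]
\[
\text{Since } \lambda > 0 \text{ and } t_0 > 0: \qquad
t_{\text{med}} > t_0
\;\Longleftrightarrow\;
\lambda < \frac{\ln 2}{t_0}.
\]
```

The second line is what allows a statement about the median to be turned into a statement about the rate parameter.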

The posterior probability that the median survival time exceeds a threshold $t_0$ is computed by integrating over the posterior distribution of $\lambda$.

Theorem 3 (Posterior Probability of Median Survival Time).

Given the posterior distribution of the rate parameter $\lambda$ for an exponential survival model, with data $(T_i, \delta_i)_{i=1}^{n}$, as $\text{Gamma}\left(\alpha + \sum_{i=1}^{n} \delta_i,\ \beta + \sum_{i=1}^{n} T_i\right)$, the posterior probability that the median survival time $t_{\text{med}}$ exceeds a threshold $t_0$ is:

\[
P\left(t_{\text{med}} > t_0 \mid (T_i, \delta_i)_{i=1}^{n}\right) = P\left(\lambda < \frac{\ln 2}{t_0} \,\Big|\, (T_i, \delta_i)_{i=1}^{n}\right).
\]

This probability is computed by integrating the posterior Gamma distribution:

\[
P\left(t_{\text{med}} > t_0\right) = \int_{0}^{\frac{\ln 2}{t_0}} \frac{(\beta')^{\alpha'}}{\Gamma(\alpha')} \lambda^{\alpha'-1} e^{-\beta'\lambda} \, d\lambda,
\]

where $\alpha' = \alpha + \sum_{i=1}^{n} \delta_i$ is the updated shape parameter, and $\beta' = \beta + \sum_{i=1}^{n} T_i$ is the updated rate parameter.
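The integral in Theorem 3 is the cumulative distribution function of a Gamma distribution evaluated at $\ln 2 / t_0$, so no custom numerical integration is needed in practice. The substitution $u = \beta' \lambda$ below is a sketch of how it reduces to the regularized lower incomplete gamma function; the symbol $\gamma(\cdot,\cdot)$ is standard notation introduced here, not the paper's.

```latex
\[
P\left(t_{\text{med}} > t_0 \mid \text{data}\right)
  = \frac{1}{\Gamma(\alpha')} \int_{0}^{\beta' \ln 2 / t_0} u^{\alpha'-1} e^{-u} \, du
  = \frac{\gamma\!\left(\alpha',\ \beta' \ln 2 / t_0\right)}{\Gamma(\alpha')},
\qquad
\gamma(a, x) = \int_{0}^{x} u^{a-1} e^{-u} \, du .
\]
```

In R, for example, this quantity corresponds to `pgamma(log(2) / t0, shape, rate)` with the updated shape and rate.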

The proof of Theorem 3 follows directly from Theorems 1 and 2. The Fragility Index is determined by sequentially reclassifying censored observations with the smallest censoring times to uncensored events and recalculating the posterior probability until it falls below the specified confidence level. A higher Fragility Index indicates greater robustness of the study results, implying that the treatment effect remains consistent even when multiple censoring statuses are changed. Conversely, a lower Fragility Index suggests that the results are sensitive to small changes in the data, potentially undermining confidence in the conclusions. It is important to note that there is no universally defined threshold for the FI; it serves as a relative indicator of robustness rather than an absolute measure of significance. Therefore, the FI should be used alongside other statistical measures to provide a more comprehensive evaluation of a study’s findings.
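The sequential procedure described above can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the code of the `fi` R package: the function names, the toy data, and the series-based Gamma CDF are all hypothetical choices made for this example. It further assumes that a reclassified observation keeps its recorded time $T_i$ and only its indicator changes from 0 to 1. Under that assumption each reclassification adds one to the posterior shape and leaves the posterior rate unchanged, so the sketch only needs to count how many censored observations have been flipped.

```python
import math

def gamma_cdf(x, shape, rate):
    """P(Lambda <= x) for Lambda ~ Gamma(shape, rate), from the power series of
    the regularized lower incomplete gamma function (fine for moderate values)."""
    if x <= 0.0:
        return 0.0
    z = rate * x
    term, total, n = 1.0, 1.0, 0
    while term > 1e-16 * total and n < 100000:
        n += 1
        term *= z / (shape + n)
        total += term
    log_prefactor = shape * math.log(z) - z - math.lgamma(shape + 1.0)
    return min(1.0, math.exp(log_prefactor) * total)

def posterior_prob(times, events, t0, alpha, beta, extra_events=0):
    """P(median > t0 | data) after reclassifying `extra_events` censored cases."""
    shape = alpha + sum(events) + extra_events   # alpha' grows with each flip
    rate = beta + sum(times)                     # beta' is unchanged (assumption)
    return gamma_cdf(math.log(2.0) / t0, shape, rate)

def fragility_index(times, events, t0, p0, alpha=0.5, beta=0.5):
    """Smallest k whose posterior probability is below p0; None if never reached."""
    if posterior_prob(times, events, t0, alpha, beta) < p0:
        raise ValueError("baseline posterior probability is already below p0")
    n_censored = len(events) - sum(events)
    for k in range(1, n_censored + 1):
        if posterior_prob(times, events, t0, alpha, beta, extra_events=k) < p0:
            return k
    return None

# Hypothetical data (months): 14 patients, 8 events, 6 censored.
times = [2.0, 3.5, 4.0, 5.5, 6.0, 7.0, 8.5, 9.0, 9.5, 10.0, 10.5, 10.5, 10.5, 11.0]
events = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

baseline = posterior_prob(times, events, t0=6.0, alpha=0.5, beta=0.5)
fi = fragility_index(times, events, t0=6.0, p0=0.7)
print("baseline posterior probability:", round(baseline, 3))
print("fragility index:", fi)
```

Returning `None` when the threshold is never crossed separates "robust to every possible reclassification" from a small FI. A variant that also changes the recorded times of reclassified observations would need to track which observations are flipped.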

3. Numerical Study

In this section, we apply the Fragility Index (FI) methodology to real-world clinical trial data to evaluate the robustness of study conclusions. By utilizing our R package fi (https://github.com/arnabkrmaity/fi), we efficiently calculate the FI, demonstrating its effectiveness and ease of use in survival analysis.

3.1. Case Study 1: Lung Cancer

We begin with the North Central Cancer Treatment Group data, available in the lung dataset from the survival package in R. This dataset comprises observations from patients with advanced lung cancer, providing detailed information on survival times and various clinical variables.

For this analysis, we randomly selected 30 patients from the dataset, of whom 22 had experienced the event (death), while the remaining 8 were censored. Using the observed survival times and censoring data, we calculated the posterior probability that the median survival time exceeded 7 months. A Gamma prior with shape parameter $\alpha = 0.5$ and rate parameter $\beta = 0.5$ was chosen for the rate parameter $\lambda$, as it is weakly informative and conjugate to the exponential likelihood, making it an appropriate choice for survival analysis. The resulting posterior probability was 0.935, indicating a high likelihood that the median survival time exceeded 7 months.

The Kaplan-Meier curve for this dataset is shown in Figure 1.

Figure 1. Kaplan-Meier curve for the lung cancer dataset

The Fragility Index was determined by sequentially reclassifying censored observations with the shortest censoring times as events and recalculating the posterior probability until it fell below the predefined confidence threshold of 0.7, a standard choice balancing statistical confidence and flexibility. For this dataset, the Fragility Index was found to be 5. This means that reclassifying five censored patients as having experienced the event (death) would reduce the posterior probability of the median survival time exceeding 7 months to below 0.7. Given a sample size of 30, of which 8 observations were censored, an FI of 5 suggests that the study’s conclusions are robust: the findings remain stable and reliable even with moderate changes in the data.

3.2. Case Study 2: Pembrolizumab in hepatocellular carcinoma (HCC)

For the second case study, we analyzed progression-free survival time for Pembrolizumab, an immune checkpoint inhibitor widely used in the treatment of various cancers, including advanced hepatocellular carcinoma (HCC) and other malignancies. Pembrolizumab functions by blocking the programmed cell death protein 1 (PD-1) receptor, thereby enhancing the immune system’s ability to detect and destroy cancer cells. The study (Feun et al., 2019) focused on its application in advanced HCC, a disease that has shown modest response rates to checkpoint inhibitors. The Individual Patient Data (IPD) from the treatment arm was extracted from the Kaplan-Meier curve presented in Feun et al. (2019), using the MD Anderson Cancer Center’s IPD extraction tool (https://biostatistics.mdanderson.org/shinyapps/IPDfromKM/).

The dataset, constructed using the above extraction tool, comprises 28 patients. Among them, 20 experienced disease progression, while the remaining 8 were censored. Following the same methodology as in the previous analysis, we calculated the posterior probability that the median progression-free survival time exceeded 3.5 months. A Gamma prior with shape parameter $\alpha = 0.5$ and rate parameter $\beta = 0.5$ was applied, consistent with Case Study 1, and the cutoff for the posterior probability was set at 0.7. The resulting posterior probability was 0.958, indicating a high likelihood that the median progression-free survival time exceeded 3.5 months.

The Kaplan-Meier curve for this dataset is shown in Figure 2.

Figure 2. Kaplan-Meier curve for the hepatocellular carcinoma dataset treated with Pembrolizumab

The Fragility Index for this dataset was determined to be 6, meaning that reclassifying six censored patients as having experienced the event (disease progression) would reduce the posterior probability of the median progression-free survival time exceeding 3.5 months to below 0.7. Given the sample size of 28, an FI of 6 indicates that the study’s conclusions are robust. This level of FI suggests that the findings regarding Pembrolizumab’s efficacy are stable and reliable, even with moderate changes in the data.

3.3. Case Study 3: Palbociclib in Breast Cancer

For this case study, we analyzed progression-free survival time from a single-arm phase II study investigating Palbociclib, a CDK4/6 inhibitor commonly used for the treatment of hormone receptor (HR)-positive, HER2-negative advanced or metastatic breast cancer (MBC). The study protocols are outlined in detail in Krishnamurthy et al. (2022) and, as with Case Study 2, Individual Patient Data (IPD) for the treatment arm was obtained from the Kaplan-Meier curve in Krishnamurthy et al. (2022).

The dataset, constructed using the MD Anderson Cancer Center’s IPD extraction tool, includes 51 patients. Among them, 31 experienced disease progression, while the remaining 20 were censored.

Using the same methodology as in the previous analyses, we calculated the posterior probability that the median progression-free survival time exceeded 15 months. A Gamma prior with shape parameter $\alpha = 0.5$ and rate parameter $\beta = 0.5$ was applied, with the cutoff for the posterior probability set at 0.7. The resulting posterior probability was 0.948, indicating a high likelihood that the median progression-free survival time exceeded 15 months.

The Kaplan-Meier curve for this dataset is displayed in Figure 3.

Figure 3. Kaplan-Meier curve for the breast cancer dataset treated with Palbociclib

The Fragility Index (FI) for this dataset was calculated to be 6, indicating that if six censored patients were reclassified as having experienced the event (disease progression), the posterior probability of the median progression-free survival time exceeding 15 months would drop below 0.7. With a sample size of 51, an FI of 6 suggests that the study’s conclusions are moderately robust and not entirely resistant to changes in the data. This implies that the treatment effect shows some sensitivity to data alterations.
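One way to compare the three case studies on a common scale is the Fragility Quotient mentioned in the Introduction, the FI divided by the sample size. The paper does not report FQ values for these datasets, so the short Python sketch below is only illustrative arithmetic on the FI values and sample sizes stated above; the dictionary labels are chosen for this example.

```python
# (FI, sample size) as reported in Case Studies 1-3.
case_studies = {
    "Lung cancer": (5, 30),
    "Pembrolizumab (HCC)": (6, 28),
    "Palbociclib (breast cancer)": (6, 51),
}

# Fragility Quotient: FI normalized by the number of patients.
fragility_quotients = {name: fi / n for name, (fi, n) in case_studies.items()}

for name, fq in fragility_quotients.items():
    print(f"{name}: FQ = {fq:.3f}")
# Lung cancer: FQ = 0.167
# Pembrolizumab (HCC): FQ = 0.214
# Palbociclib (breast cancer): FQ = 0.118
```

On this normalized scale the Palbociclib dataset has the smallest quotient, which is in line with its description as only moderately robust despite having the same FI as the Pembrolizumab dataset.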

4. Concluding Remarks

In this paper, we defined the Fragility Index (FI) for time-to-event endpoints in single-arm clinical trials, developed a Bayesian methodology using the exponential distribution, and demonstrated its application through three real-world case studies. We also developed an R package, fi, to facilitate the calculation of the FI, making it accessible for researchers to apply in similar studies. The FI quantifies the robustness of study conclusions by identifying the minimum number of censored observations that, when reclassified as events, reduce the posterior probability of the median survival time exceeding a specified threshold to below a predefined confidence level. Our case studies yielded FI values of 5 and 6, indicating moderate robustness; while the findings are relatively stable, they remain somewhat sensitive to data changes. This underscores the FI’s value as a complementary tool to traditional statistical methods, enhancing the interpretability and reliability of clinical trial outcomes, especially in the absence of control groups. However, the FI lacks a universal threshold and is influenced by the choice of prior distribution, necessitating careful interpretation and integration with other clinical impact measures. In conclusion, the Fragility Index represents a significant advancement in assessing the robustness of single-arm clinical trials with time-to-event data, supporting more informed and reliable clinical decision-making.

References

  • Andrade (2020) Chittaranjan Andrade. The use and limitations of the fragility index in the interpretation of clinical trial findings. The Journal of Clinical Psychiatry, 81(2):21994, 2020.
  • Baer et al. (2021) Benjamin R Baer, Mario Gaudino, Mary Charlson, Stephen E Fremes, and Martin T Wells. Fragility indices for only sufficiently likely modifications. Proceedings of the National Academy of Sciences, 118(49):e2105254118, 2021.
  • Bomze et al. (2020) David Bomze, Nethanel Asher, Omar Hasan Ali, Lukas Flatz, Daniel Azoulay, Gal Markel, and Tomer Meirson. Survival-inferred fragility index of phase 3 clinical trials evaluating immune checkpoint inhibitors. JAMA Network Open, 3(10):e2017675, 2020.
  • Editorial (2019) Editorial. It’s time to talk about ditching statistical significance. Nature, 567(7748):283, 2019.
  • Epstein (1954) Benjamin Epstein. Truncated life tests in the exponential case. The Annals of Mathematical Statistics, pages 555–564, 1954.
  • Epstein and Sobel (1953) Benjamin Epstein and Milton Sobel. Life testing. Journal of the American Statistical Association, 48(263):486–502, 1953.
  • Epstein and Sobel (1954) Benjamin Epstein and Milton Sobel. Some theorems relevant to life testing from an exponential distribution. The Annals of Mathematical Statistics, pages 373–381, 1954.
  • Feun et al. (2019) Lynn G Feun, Ying-Ying Li, Chunjing Wu, Medhi Wangpaichitr, Patricia D Jones, Stephen P Richman, Beatrice Madrazo, Deukwoo Kwon, Monica Garcia-Buitrago, Paul Martin, et al. Phase 2 study of pembrolizumab and circulating biomarkers to predict anticancer response in advanced, unresectable hepatocellular carcinoma. Cancer, 125(20):3603–3614, 2019.
  • Garcia et al. (2023) Marcos Vinicius Fernandes Garcia, Juliana Carvalho Ferreira, and Pedro Caruso. Fragility index and fragility quotient in randomized clinical trials. Jornal Brasileiro de Pneumologia, 49(01):e20230034, 2023.
  • Heston (2023) Thomas F Heston. The robustness index: going beyond statistical significance by quantifying fragility. Cureus, 15(8), 2023.
  • Krishnamurthy et al. (2022) Jairam Krishnamurthy, Jingqin Luo, Rama Suresh, Foluso Ademuyiwa, Caron Rigden, Timothy Rearden, Katherine Clifton, Katherine Weilbaecher, Ashley Frith, Anna Roshal, et al. A phase II trial of an alternative schedule of palbociclib and embedded serum TK1 analysis. NPJ Breast Cancer, 8(1):35, 2022.
  • Lin and Chu (2022) Lifeng Lin and Haitao Chu. Assessing and visualizing fragility of clinical results with binary outcomes in R using the fragility package. PLoS One, 17(6):e0268754, 2022.
  • Liu et al. (2024) Y Liu, TA Lin, A Koong, C Lin, JA Jaoude, RR Patel, R Kouzy, MB El Alam, T Meirson, and EB Ludmir. The fragility of phase III trials in oncology. International Journal of Radiation Oncology, Biology, Physics, 120(2):S42, 2024.
  • Meyer et al. (2014) Guy Meyer, Eric Vicaut, Thierry Danays, Giancarlo Agnelli, Cecilia Becattini, Jan Beyer-Westendorf, Erich Bluhmki, Helene Bouvaist, Benjamin Brenner, Francis Couturaud, et al. Fibrinolysis for patients with intermediate-risk pulmonary embolism. New England Journal of Medicine, 370(15):1402–1411, 2014.
  • Olsen et al. (2024) Hannah Olsen, Pei-Chi Kao, Caleb Richmond, David Stephen Shulman, Wendy B London, and Steven G DuBois. Statistical fragility of findings from randomized phase 3 trials in pediatric oncology. 2024.
  • Potter (2020) Gail E Potter. Dismantling the fragility index: a demonstration of statistical reasoning. Statistics in Medicine, 39(26):3720–3731, 2020.
  • Ridgeon et al. (2016) Elliott E Ridgeon, Paul J Young, Rinaldo Bellomo, Marta Mucchetti, Rosalba Lembo, and Giovanni Landoni. The fragility index in multicenter randomized controlled critical care trials. Critical care medicine, 44(7):1278–1284, 2016.
  • Tignanelli and Napolitano (2019) Christopher J Tignanelli and Lena M Napolitano. The fragility index in randomized clinical trials as a means of optimizing patient care. JAMA surgery, 154(1):74–79, 2019.
  • Walsh et al. (2014) Michael Walsh, Sadeesh K Srinathan, Daniel F McAuley, Marko Mrkobrada, Oren Levine, Christine Ribic, Amber O Molnar, Neil D Dattani, Andrew Burke, Gordon Guyatt, et al. The statistical significance of randomized controlled trial results is frequently fragile: a case for a fragility index. Journal of clinical epidemiology, 67(6):622–628, 2014.

APPENDIX

The calculation of the Fragility Index and related numerical studies are implemented in R and made accessible through the GitHub repository (https://github.com/arnabkrmaity/fi).
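For readers who want to see the computation end to end, the following is a minimal, independent Python sketch of the FI as defined in this paper; it is not the code or API of the fi package. It assumes a Gamma(α, β) prior with rate parameterization, and it assumes that reclassifying a censored observation as an event leaves its observed time unchanged, so that only the event count in the posterior changes. The function names, the series-based incomplete gamma routine, the choice to return `None` when no FI exists, and the data are all hypothetical and chosen for illustration.

```python
import math


def reg_lower_gamma(shape, x, tol=1e-12, max_iter=10_000):
    """Regularized lower incomplete gamma P(shape, x) via its power series.

    Adequate for the moderate x used here; a library routine (for example
    scipy.special.gammainc) is preferable in practice.
    """
    if x <= 0.0:
        return 0.0
    term = 1.0 / shape
    total = term
    n = 0
    while abs(term) > tol * abs(total) and n < max_iter:
        n += 1
        term *= x / (shape + n)
        total += term
    return total * math.exp(-x + shape * math.log(x) - math.lgamma(shape))


def prob_median_exceeds(threshold, shape, rate):
    """P(ln 2 / lambda > threshold) when lambda ~ Gamma(shape, rate)."""
    # t_med > threshold  <=>  lambda < ln 2 / threshold
    return reg_lower_gamma(shape, rate * math.log(2.0) / threshold)


def fragility_index(times, events, threshold, confidence, alpha, beta):
    """Smallest number of censored-to-event flips that drops the posterior
    probability below `confidence`; None if no such number exists.
    (Returning None is a choice of this sketch, not of the paper.)"""
    n_events = sum(events)
    n_censored = len(events) - n_events
    rate = beta + sum(times)  # unchanged by reclassification (assumption)

    def prob_after(k):
        return prob_median_exceeds(threshold, alpha + n_events + k, rate)

    if prob_after(0) < confidence:
        return None  # the original conclusion does not hold to begin with
    for k in range(1, n_censored + 1):
        if prob_after(k) < confidence:
            return k
    return None  # flipping every censored observation is still not enough


# Hypothetical data: follow-up times in months, 1 = event, 0 = censored.
times = [12, 15, 20, 22, 30, 8, 25, 18, 40, 35]
events = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
fi = fragility_index(times, events, threshold=20, confidence=0.85,
                     alpha=1.0, beta=1.0)
print(fi)
```

Because the Gamma distribution function at a fixed point decreases as the shape parameter grows, the posterior probability is monotone in the number of reclassified observations, which is why a simple linear scan finds the smallest such number.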

A. Proof of the Theorems

Proof of Theorem 1.

The likelihood function for the censored data is:

\[ L(\lambda)=\prod_{i=1}^{n}\left(\lambda e^{-\lambda T_{i}}\right)^{\delta_{i}}\left(e^{-\lambda T_{i}}\right)^{1-\delta_{i}}=\lambda^{\sum_{i=1}^{n}\delta_{i}}e^{-\lambda\sum_{i=1}^{n}T_{i}} \]

By applying Bayes’ Theorem, the posterior distribution is proportional to the product of the likelihood and the prior:

\[ p(\lambda\,|\,T_{1},\dots,T_{n},\delta_{1},\dots,\delta_{n})\propto L(\lambda)\,\pi(\lambda) \]

Substituting the likelihood and prior expressions:

\[ p(\lambda\,|\,T_{1},\dots,T_{n},\delta_{1},\dots,\delta_{n})\propto\lambda^{\alpha+\sum_{i=1}^{n}\delta_{i}-1}e^{-\lambda\left(\beta+\sum_{i=1}^{n}T_{i}\right)} \]

This is a Gamma distribution with updated parameters $\alpha^{\prime}=\alpha+\sum_{i=1}^{n}\delta_{i}$ and $\beta^{\prime}=\beta+\sum_{i=1}^{n}T_{i}$, thus:

\[ \lambda\,|\,\left(T_{i},\delta_{i}\right)_{i=1}^{n}\sim\text{Gamma}\left(\alpha+\sum_{i=1}^{n}\delta_{i},\,\beta+\sum_{i=1}^{n}T_{i}\right) \]
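As a numerical sanity check of this conjugate update, the short Python sketch below verifies that the log-likelihood plus the log-prior differs from the log-density of the claimed Gamma posterior only by a constant that does not depend on λ. The data, the prior values, and the function names are hypothetical, and the Gamma distribution is assumed to use the rate parameterization, as in the derivation above.

```python
import math


def log_likelihood(lam, times, events):
    """Log of L(lambda) = lambda^(sum of delta_i) * exp(-lambda * sum of T_i)."""
    return sum(events) * math.log(lam) - lam * sum(times)


def log_gamma_density(lam, shape, rate):
    """Log-density of Gamma(shape, rate) evaluated at lam."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(lam) - rate * lam)


def posterior_params(times, events, alpha, beta):
    """Conjugate update: shape gains the event count, rate gains the total time."""
    return alpha + sum(events), beta + sum(times)


# Hypothetical data and prior, for illustration only.
times = [12, 15, 20, 22, 30, 8, 25, 18, 40, 35]
events = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
alpha, beta = 2.0, 10.0

shape_post, rate_post = posterior_params(times, events, alpha, beta)

# log L + log prior - log posterior should be the same constant for every lambda.
gaps = [
    log_likelihood(lam, times, events)
    + log_gamma_density(lam, alpha, beta)
    - log_gamma_density(lam, shape_post, rate_post)
    for lam in (0.01, 0.05, 0.2)
]
print(shape_post, rate_post)
print(max(gaps) - min(gaps))  # zero up to floating-point rounding
```

The constant gap is the log of the marginal likelihood, which is why it can be dropped when working with the proportionality statement.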

Proof of Theorem 2.

The survival function $S(t)$ for the exponential distribution is:

\[ S(t)=P(T>t)=e^{-\lambda t} \]

Setting $S(t_{\text{med}})=0.5$, we solve:

\[ e^{-\lambda t_{\text{med}}}=0.5\quad\Rightarrow\quad t_{\text{med}}=\frac{\ln 0.5}{-\lambda}=\frac{\ln 2}{\lambda} \]