Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: merging survival packages #44

Open
RaphaelS1 opened this issue Jul 15, 2022 · 0 comments
Open

Discussion: merging survival packages #44

RaphaelS1 opened this issue Jul 15, 2022 · 0 comments

Comments

@RaphaelS1
Copy link

In similar spirit to #1 I have my own package (originally just hobby code to learn Julia, pls don't judge too harsh) for survival analysis. As @ararslan and I discussed on Slack I don't want to be 'competing' or causing duplicative code to be written so just wanted to open this issue to see if there are any parts I can merge in or if I should just archive it/keep for hobby code only.

So far I have:

  1. KaplanMeier (fitting, confidence intervals, plotting)
  2. NelsonAalen (fitting, confidence intervals, plotting)
  3. Surv - Generic survival object for left, right, or interval censoring so far. Has fields for pulling out key stats (e.g. risk sets, event times, etc.)
  4. WIP code for parametric PH models

My future plans were going to be:

  1. CoxPH
  2. Parameter PH and AFT (again with Distributions.jl for parametrisations)
  3. Transformers between distribution, ranking, and linear predictor predictions

My plan was then to hook this up between Turing.jl and mlr3proba for cross-language probabilistic ML in R.

If useful happy to go into detail about features/methods but won't for now.

I was curious about some comparisons, these might be of interest:

  1. Kaplan-Meier is 100x faster than R (as expected) and same results:
using RCall
using Random: seed!
using Distributions
using BenchmarkTools
using Survival
seed!(1)
n = 1000
T = round.(rand(Uniform(1, 10), n));
Δ = rand(Binomial(), n) .== 1;
surv = Surv(T, Δ, "right");
R"
    library(survival)
    time = $T
    status = $Δ
    surv = Surv(time, status)
";
@benchmark R"
    km = survfit(surv ~ 1)
"
julia> @benchmark R"
           km = survfit(surv ~ 1)
       "
BenchmarkTools.Trial: 6946 samples with 1 evaluation.
 Range (min  max):  599.208 μs  67.817 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     625.708 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   718.383 μs ±  1.690 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇▅▃▂▁▁                                                      ▁
  ████████▇▇▇▆▆▅▅▆▄▄▄▄▄▅▁▁▃▄▄▃▃▃▄▁▁▄▁▁▁▁▁▃▄▁▁▃▃▃▁▁▁▁▃▃▃▃▁▁▃▁▁▃ █
  599 μs        Histogram: log(frequency) by time      2.26 ms <

 Memory estimate: 1.12 KiB, allocs estimate: 41.
julia> @benchmark kaplan(surv)
BenchmarkTools.Trial: 10000 samples with 5 evaluations.
 Range (min  max):  6.067 μs  619.550 μs  ┊ GC (min  max): 0.00%  97.93%
 Time  (median):     6.308 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.881 μs ±  14.405 μs  ┊ GC (mean ± σ):  6.14% ±  2.92%

      ██▁                                                      
  ▂▅▆▆███▆▃▃▃▃▄▄▄▃▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  6.07 μs         Histogram: frequency by time        8.47 μs <

 Memory estimate: 8.03 KiB, allocs estimate: 151.
julia> R"
           paste(round(km$lower, 2),
           round(km$upper, 2), sep = ',')
       "
RObject{StrSxp}
 [1] "0.97,0.99" "0.9,0.94"  "0.83,0.88" "0.76,0.81" "0.69,0.75" "0.62,0.69"
 [7] "0.52,0.6"  "0.41,0.49" "0.27,0.35" "0.14,0.23"


julia> km = kaplan(surv);

julia> confint(km)
11-element Vector{Tuple{Float64, Float64}}:
 (1.0, 1.0)
 (0.9655902766870649, 0.9846564570042777)
 (0.90008189973811, 0.9343531290390091)
 (0.8304177051112486, 0.8755272501579991)
 (0.753866856584728, 0.8079562064849091)
 (0.6863770858305547, 0.7466648377965709)
 (0.6175078894755917, 0.683065386931595)
 (0.5222658655269816, 0.5941460334482926)
 (0.40985396069034197, 0.4878272995385684)
 (0.2674278465799201, 0.35115893829800077)
 (0.136515397416271, 0.22525900129765383)
  1. Competitive speeds against this package (mainly due to design decisions to slow creation of Surv object in favour of generating stats up front) - very interested in hearing opinions on this
julia> n = 1000;
julia> T = round.(rand(Uniform(1, 10), n));
julia> Δ = rand(Binomial(), n) .== 1;
julia> et = Survival.EventTime.(T, Δ);
julia> ot = Surv(T, Δ, "right");
julia> @btime EventTime.(T, Δ);
  792.120 ns (3 allocations: 15.81 KiB)
julia> @btime Surv(T, Δ, "right"); ## longer as expected due to postprocessing
  25.041 μs (49 allocations: 35.20 KiB)
julia> @btime (k = kaplan(srv); confint(k));
  6.700 μs (152 allocations: 8.27 KiB)
julia> @btime (k = fit(Survival.KaplanMeier, et); confint(k));
  36.041 μs (34 allocations: 52.33 KiB)
julia> @btime (k = kaplan(srv); confint(k));
  6.700 μs (152 allocations: 8.27 KiB)
julia> @btime (k = fit(Survival.KaplanMeier, et); confint(k));
  36.041 μs (34 allocations: 52.33 KiB)
julia> using Plots
julia> plot(kaplan(srv))

Screenshot 2022-07-15 at 19 27 18

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant