Tables.jl integration #25

tbeason · 2020-07-27T05:44:01Z

Something I stumbled across while working on the previous PR... free Tables.jl integration! Because the existing (+CIF) nonparametric estimators return vectors of equal length, why not go one extra step and make it table friendly?

Before this PR:

julia> ev = EventTime.(rand(10),rand(Bool,10))
10-element Array{EventTime{Float64},1}:
 0.35179991680800793
 0.40759565633515504
 0.8775546088667416
 0.45056768643591116+
 0.7839147900638677
 0.5359791316287386
 0.6307186426517366
 0.7518185123916523
 0.2001353457567876
 0.0664477717088301+

julia> km = fit(KaplanMeier,ev)
KaplanMeier{Float64}([0.0664477717088301, 0.2001353457567876, 0.35179991680800793, 0.40759565633515504, 0.45056768643591116, 0.5359791316287386, 0.6307186426517366, 0.7518185123916523, 0.7839147900638677, 0.8775546088667416], [0, 1, 1, 1, 0, 1, 1, 1, 1, 1], [1, 0, 0, 0, 1, 0, 0, 0, 0, 0], [10, 9, 8, 7, 6, 5, 4, 3, 2, 1], [1.0, 0.8888888888888888, 0.7777777777777777, 0.6666666666666666, 0.6666666666666666, 0.5333333333333333, 0.4, 0.2666666666666667, 0.13333333333333336, 0.13333333333333336], [0.0, 0.11785113019775792, 0.1781741612749496, 0.23570226039551584, 0.23570226039551584, 0.32489314482696546, 0.4346134936801766, 0.5962847939999438, 0.9249624617007738, 0.9249624617007738])

Now with it...

julia> using DataFrames

julia> km |> DataFrame
10×6 DataFrame
│ Row │ times     │ nevents │ ncensor │ natrisk │ survival │ stderr   │
│     │ Float64   │ Int64   │ Int64   │ Int64   │ Float64  │ Float64  │
├─────┼───────────┼─────────┼─────────┼─────────┼──────────┼──────────┤
│ 1   │ 0.0664478 │ 0       │ 1       │ 10      │ 1.0      │ 0.0      │
│ 2   │ 0.200135  │ 1       │ 0       │ 9       │ 0.888889 │ 0.117851 │
│ 3   │ 0.3518    │ 1       │ 0       │ 8       │ 0.777778 │ 0.178174 │
│ 4   │ 0.407596  │ 1       │ 0       │ 7       │ 0.666667 │ 0.235702 │
│ 5   │ 0.450568  │ 0       │ 1       │ 6       │ 0.666667 │ 0.235702 │
│ 6   │ 0.535979  │ 1       │ 0       │ 5       │ 0.533333 │ 0.324893 │
│ 7   │ 0.630719  │ 1       │ 0       │ 4       │ 0.4      │ 0.434613 │
│ 8   │ 0.751819  │ 1       │ 0       │ 3       │ 0.266667 │ 0.596285 │
│ 9   │ 0.783915  │ 1       │ 0       │ 2       │ 0.133333 │ 0.924962 │
│ 10  │ 0.877555  │ 1       │ 0       │ 1       │ 0.133333 │ 0.924962 │

Needs tests and I guess a mention somewhere that this is possible.

nignatiadis · 2020-07-28T20:56:03Z

That's super useful! It also makes Efron-style [1] survival analysis based on logistic regression very easy: just pass the table to GLM.jl + StatsModels.jl.

[1] Efron, Bradley. "Logistic regression, survival analysis, and the Kaplan-Meier curve." Journal of the American Statistical Association 83.402 (1988): 414-425.

AntoineHus · 2022-05-02T11:50:42Z

Hello,

Which tests are you looking for before merging? I think this is a nice feature.

Thanks

tbeason · 2022-05-03T17:44:30Z

I think it is unlikely that I come back to finish this PR any time soon. Feel free to take it over and add whatever needs to be done to consider it ready to merge!

ararslan · 2022-07-10T20:14:44Z

Apologies for letting this languish for so long!

I gave this a fair bit of thought and while I definitely understand the appeal, I think I'm overall not in favor, at least with the current implementation. It effectively forces us to make guarantees about the particular fields of the types, namely that they all need to be vectors of equal length and have some kind of externally meaningful names and values. If we were to add another field to store some kind of intermediate computation or other kind of value, that would either end up getting exposed to the user in a rather surprising way or break this altogether. Now, it's possible to control this stuff with some getproperty/propertynames tomfoolery but I'd much prefer we not go down that route.

Something we could do is define things explicitly for each NonparametricEstimator subtype rather than generically for all of them to expose only particular things in the resulting table schema, which could be fields of the type or could be values derived from fields.
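A minimal sketch of what such an explicit, per-type definition might look like. `ToyEstimator`, its `scratch` field, and the derived `failure` column are hypothetical stand-ins for illustration, not types or columns from this package; only `Tables.istable`, `Tables.columnaccess`, and `Tables.columns` are actual Tables.jl functions.

```julia
using Tables

# Hypothetical stand-in for an estimator type. `scratch` plays the role of an
# internal field that should not leak into the table.
struct ToyEstimator
    times::Vector{Float64}
    nevents::Vector{Int}
    natrisk::Vector{Int}
    survival::Vector{Float64}
    scratch::Dict{Symbol,Any}
end

# Opt this one type into the Tables.jl interface with column access.
Tables.istable(::Type{ToyEstimator}) = true
Tables.columnaccess(::Type{ToyEstimator}) = true

# Expose a chosen set of columns; `failure` is derived rather than stored.
function Tables.columns(est::ToyEstimator)
    return (times = est.times,
            nevents = est.nevents,
            natrisk = est.natrisk,
            survival = est.survival,
            failure = 1 .- est.survival)
end

est = ToyEstimator([0.2, 0.5, 0.9], [1, 0, 1], [3, 2, 1],
                   [2/3, 2/3, 0.0], Dict{Symbol,Any}())
cols = Tables.columns(est)
```

Because the schema is written out by hand, adding or renaming an internal field would not change what users see in the resulting table.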

Note that in the meantime, you can do e.g. DataFrame(; (f => getfield(estimator, f) for f in fieldnames(typeof(estimator)))...), or omit DataFrame and construct a NamedTuple that way, which is a valid column table.
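The NamedTuple variant of that workaround can be sketched as follows, using only Base Julia. `ToyCounts` is a hypothetical stand-in for a fitted estimator, and the approach assumes every field is a vector of the same length, which is exactly the guarantee discussed above.

```julia
# Hypothetical stand-in for a fitted estimator whose fields are all
# equal-length vectors.
struct ToyCounts
    times::Vector{Float64}
    nevents::Vector{Int}
    ncensor::Vector{Int}
end

counts = ToyCounts([0.1, 0.4, 0.7], [0, 1, 1], [1, 0, 0])

# Field names become column names; field values become the columns.
table = (; (f => getfield(counts, f) for f in fieldnames(typeof(counts)))...)
```

The resulting `table` is a NamedTuple of vectors, so it can be handed to anything that accepts a column table.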

Both Kaplan-Meier and Nelson-Aalen compute the same set of basic counts at each unique time prior to computing their respective quantities of interest. The counts have a notably table-like format, so much so that they can implement the Tables.jl interface with minimal effort.

Credit for the idea of integration with Tables.jl goes entirely to Tyler Beason, author of PR #25, who has been added as a co-author of this commit.

Co-Authored-By: Tyler Beason <[email protected]>

Tables.jl integration

5943624

ararslan mentioned this pull request Jul 25, 2022

Add an EventTable type that supports the Tables interface #46

Merged
