This package is unmaintained. Its reliability is not guaranteed.
Tools for resampling data to assess model fits
Depending on the level of granularity, you can use several functions for resampling data:
splitrandom(df::DataFrame, proportion::Real)
: Use this to splitdf
into two randomly chosen pieces. Ifproportion == 0.75
, the first piece will contain ~75% of the data and the second piece will ~25% of the data.resample(df::DataFrame, n::Integer)
: Use this to generate a new data set of sizen
that is resampled with replacement from the rows ofdf
.jackknife(df::DataFrame, statistic::Function)
: Use this to run the jackknife. The Jackknife moves through the data, removing one row at a time and then applying the functionstatistic
to the remaining data. The results of all calls tostatistic
are stored in a vector that is returned to the caller.bootstrap(df::DataFrame, statistic::Function, n::Integer, proportion::Real)
: Use this to run the nonparametric bootstrap. The bootstrap resamples the datan
times with each resampled data set containingproportion
of the data. The functionstatistic
is called on each resampled data set. The results of all calls tostatistic
are stored in a vector that is returned to the caller.crossvalidate(df::DataFrame, train::Function, test::Function, n::Integer, proportion::Real)
: Use this function to fit a model using thetrain
function onn
resampled data sets and then test the fitted model using thetest
function on those same data sets. Each time, the training data set will containproportion
ofdf
and1 - proportion
will be held out as a test data set.kfold_crossvalidate(df::DataFrame, train::Function, test::Function, k::Integer)
: Use this function to fit a model using thetrain
function onk
data sets and then test the fitted model using thetest
function on those same data sets. Each time, the training data set will contain the majority of the data with one ofk
folds removed.
Using splitrandom
:
using DataFrames, Resampling
df = DataFrame()
df["A"] = 1:100
df1, df2 = splitrandom(df, 0.75)
Using resample
:
using DataFrames, Resampling
df = DataFrame()
df["A"] = 1:100
new_df = resample(df, 100)
Using jackknife
:
using DataFrames, Resampling
df = DataFrame()
df["A"] = 1:100
resampled_means = jackknife(df, df -> mean(df["A"]))
se_hat = std(resampled_means)
Using bootstrap
:
using DataFrames, Resampling
df = DataFrame()
df["A"] = 1:100
resampled_means = bootstrap(df, df -> mean(df["A"]), 1_000, 0.90)
se_hat = std(resampled_means)
Using crossvalidate
:
using DataFrames, Resampling
df = DataFrame()
df["A"] = 1:100
function train(df)
mean(df["A"])
end
function test(df, m)
sqrt(mean((df["A"] - m).^2))
end
n_reps = 100
training_results, test_results = crossvalidate(df, train, test, n_reps, 0.75)
Using kfold_crossvalidate
:
using DataFrames, Resampling
df = DataFrame()
df["A"] = 1:100
function train(df)
mean(df["A"])
end
function test(df, m)
sqrt(mean((df["A"] - m).^2))
end
k = 10
training_results, test_results = kfold_crossvalidate(df, train, test, k)