
R package implementing subsampling methods to find informative samples from big data

License: MIT (LICENSE.md), with an additional LICENSE file

Amalan-ConStat/NeEDS4BigData


NeEDS4BigData


Project status: Active. The project has reached a stable, usable state and is being actively developed.


The R package “NeEDS4BigData” implements subsampling methods for analysing big data.

How did the name “NeEDS4BigData” come about?

New Experimental Design based Subsampling methods for Big Data.

How do I get started with “NeEDS4BigData”?

## Installing the package from GitHub (requires the devtools package)
devtools::install_github("Amalan-ConStat/NeEDS4BigData")

## Installing the package from CRAN
install.packages("NeEDS4BigData")

Subsampling Methods

  1. A- and L-optimality based subsampling for GLMs.
  2. A-optimality based subsampling for Gaussian Linear Models.
  3. Leverage sampling for GLMs.
  4. Local case control sampling for logistic regression.
  5. A-optimality based subsampling under measurement constraints for GLMs.
  6. Model robust subsampling method for GLMs.
  7. Subsampling method for GLMs when the model is potentially misspecified.
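To make the basic idea behind item 3 concrete, here is a minimal base-R sketch of leverage sampling in its simplest setting, a linear model fitted to simulated data. It does not call any NeEDS4BigData function; the simulated data, the sizes `N` and `r`, and all variable names are illustrative assumptions. Each row is sampled with probability proportional to its leverage score, and the subsample is then fitted by weighted least squares with inverse-probability weights.

```r
# Minimal sketch of leverage-based subsampling for a linear model (base R only).
# Hypothetical illustration on simulated data; this is NOT the NeEDS4BigData API.
set.seed(1)
N <- 10000                                   # full-data size (assumed)
r <- 500                                     # subsample size (assumed)
X <- cbind(1, matrix(rnorm(N * 3), N, 3))    # design matrix with an intercept column
beta <- c(1, 0.5, -0.5, 2)                   # true coefficients used to simulate y
y <- drop(X %*% beta + rnorm(N))

# Leverage scores h_ii = diag(X (X'X)^{-1} X'), computed from the thin QR factor Q
Q <- qr.Q(qr(X))
h <- rowSums(Q^2)                            # sums to ncol(X)

# Sampling probabilities proportional to leverage
p <- h / sum(h)

# Draw r row indices with replacement according to p
idx <- sample.int(N, size = r, replace = TRUE, prob = p)

# Weighted least squares on the subsample with inverse-probability weights 1 / p
fit <- lm.wfit(x = X[idx, ], y = y[idx], w = 1 / p[idx])
beta_hat <- fit$coefficients
print(round(beta_hat, 3))
```

The `1 / p` weights correct for the unequal sampling probabilities; because weighted least squares estimates do not change when all weights are rescaled by a constant, no further normalisation of the weights is needed.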

These seven methods are described in the following articles:

  1. Introduction - explains the need for subsampling methods.
  2. Linear Regression - Basic sampling.
  3. Linear Regression - Model robust and misspecification.
  4. Logistic Regression - Basic sampling.
  5. Logistic Regression - Model robust and misspecification.
  6. Poisson Regression - Basic sampling.
  7. Poisson Regression - Model robust and misspecification.

In articles 2, 4 and 6 we assume that the main effects model adequately describes the data. In articles 3, 5 and 7 we first consider the case where several candidate models could describe the big data, and then the case where the assumed main effects model is misspecified. Under these conditions, articles 2 to 7 explore subsampling for three big data sets.

Thank You
