stfxecutables/model_variance

Investigate how sources of model variance impact prediction similarities.

Analysis Plan

We investigate ~40 classification datasets from the OpenML database using a set of core machine-learning classifiers (a minimal setup sketch follows the list below):

  • Gradient-Boosted Decision Trees (XGBoost)
  • Support Vector Machine (scikit-learn)
  • Multinomial Logistic Regression (scikit-learn)
  • MLP? (PyTorch, probably)
  • RandomForest? (XGBoost)
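
A minimal sketch of this setup, assuming scikit-learn's fetch_openml loader and default hyperparameters; the dataset name "credit-g" is only a placeholder, not necessarily one of the ~40 datasets studied here:

```python
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from xgboost import XGBClassifier

# Fetch a single OpenML classification dataset (downloads on first call);
# "credit-g" stands in for any of the ~40 datasets.
X, y = fetch_openml("credit-g", version=1, return_X_y=True, as_frame=False)

# Core classifiers under study. scikit-learn's LogisticRegression fits a
# multinomial (softmax) model for multi-class targets with its default lbfgs
# solver; the tentative MLP (PyTorch) and RandomForest (e.g.
# xgboost.XGBRFClassifier) variants are omitted from this sketch.
classifiers = {
    "xgboost": XGBClassifier(n_estimators=100),
    "svm": SVC(),
    "logistic": LogisticRegression(max_iter=1000),
}
```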

The key concept we investigate is Model Variance: how the predictions of a model change due to various sampling and tuning factors. We refer to these factors as sources of (model) variance; they include the following (a sketch of how such variance might be measured follows the list):

  • dataset size
  • number of features
  • sample noise sensitivity
  • hyperparameter sensitivity
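
One hedged way to operationalize this is to refit the same classifier on bootstrap resamples of the training data (sample noise) and measure how often pairs of refits agree on a fixed test set; the agreement metric below is illustrative, not necessarily the similarity measure used in this project:

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data so the sketch runs without downloading anything.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

rng = np.random.default_rng(0)
preds = []
for _ in range(10):  # 10 refits, each on a bootstrap resample of the training set
    idx = rng.integers(0, len(X_tr), size=len(X_tr))
    model = LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx])
    preds.append(model.predict(X_te))

# Pairwise agreement: fraction of test points on which two refits predict the same class.
agreements = [np.mean(p1 == p2) for p1, p2 in combinations(preds, 2)]
print(f"mean pairwise prediction agreement: {np.mean(agreements):.3f}")
```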

Some of these sources of variance have different impacts depending on whether they operate on the training data, contributing to model variance only during training (training sources of variance), or on the validation data (validation sources of variance). That is (a sketch of varying one training source in isolation follows the list below):

  • Training Sources of Variance
    • training set size
    • training sample noise
    • hyperparameter sensitivity
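
A minimal sketch, assuming a training source of variance is isolated by perturbing only the training split (here, its size) while the validation split and all hyperparameters stay fixed, so any change in validation predictions is attributable to that source alone; the fractions and the synthetic data are placeholders:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data; a fixed validation split isolates the training-side change.
X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
for frac in (0.25, 0.5, 1.0):  # vary training set size, everything else held fixed
    n = int(frac * len(X_tr))
    idx = rng.choice(len(X_tr), size=n, replace=False)
    preds = SVC().fit(X_tr[idx], y_tr[idx]).predict(X_val)
    print(f"train fraction {frac:.2f}: validation accuracy {np.mean(preds == y_val):.3f}")
```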
