
H-ensemble

Code and supporting materials for the AAAI-24 paper "H-ensemble: An Information Theoretic Approach to Reliable Few-Shot Multi-Source-Free Transfer"

ArXiv link (with Appendix): https://arxiv.org/abs/2312.12489

Abstract

Multi-source transfer learning is an effective solution to data scarcity by utilizing multiple source tasks for the learning of the target task. However, access to source data and model details is limited in the era of commercial models, giving rise to the setting of multi-source-free (MSF) transfer learning that aims to leverage source domain knowledge without such access. As a newly defined problem paradigm, MSF transfer learning remains largely underexplored and not clearly formulated. In this work, we adopt an information theoretic perspective on it and propose a framework named H-ensemble, which dynamically learns the optimal linear combination, or ensemble, of source models for the target task, using a generalization of maximal correlation regression. The ensemble weights are optimized by maximizing an information theoretic metric for transferability. Compared to previous works, H-ensemble is characterized by: 1) its adaptability to a novel and realistic MSF setting for few-shot target tasks, 2) theoretical reliability, 3) a lightweight structure easy to interpret and adapt. Our method is empirically validated by ablation studies, along with extensive comparative analysis with other task ensemble and transfer learning methods. We show that the H-ensemble can successfully learn the optimal task ensemble, as well as outperform prior arts.

Requirements

PyTorch and Torchvision should be installed first, following the official instructions for your platform; the remaining dependencies can be installed with the following commands:

pip install hydra-core tqdm pytorch-ignite
pip install opencv-python numpy pandas scipy scikit-learn
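
If a default build is sufficient, PyTorch and Torchvision can also be installed directly with pip; the exact command depends on your CUDA setup, so check pytorch.org for the recommended one:

# Default wheels from PyPI; use the pytorch.org selector for CUDA-specific builds.
pip install torch torchvision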

Usage

We train source models using the SHOT repo, i.e., "train_source.py" and "data_list.py". Because SHOT splits a network into several parts, we add a flag if_use_shot_model so that our script loads such models correctly. See "run.sh" for an example command.
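
A rough sketch of the two stages is shown below. The SHOT training arguments are omitted and the H-ensemble entry point (here assumed to be a Hydra-configured "main.py") is a placeholder, so refer to "run.sh" and the SHOT repo for the exact commands:

# Stage 1: train source models with the SHOT scripts (arguments omitted;
# see the SHOT repo and "run.sh" for the real ones).
python train_source.py

# Stage 2: run H-ensemble and tell it to load SHOT-style split checkpoints.
# "main.py" is an assumed entry-point name; the flag name comes from this README.
python main.py if_use_shot_model=true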

To build the task-split datasets, first use _dataset/readfile.py to convert the original ImageFolder-style datasets into image-label annotation files. Then use _dataset/tasksplit.py to perform the actual task splitting (manual directory renaming and a few other steps are still required because the automation is incomplete 😣). We ship an example dataset folder with these scripts.
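
For reference, the two steps look roughly like this; both scripts may need dataset paths adjusted or passed in, and the shipped example dataset folder shows the expected layout:

# Step 1: convert an ImageFolder-style dataset into image-label annotation files.
python _dataset/readfile.py

# Step 2: split the annotated data into tasks.
# Afterwards, rename the generated directories manually as noted above.
python _dataset/tasksplit.py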

Modify the config files under ./conf to specify the dataset, paths, etc.
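
Since the project uses Hydra (hydra-core is in the requirements), the values in ./conf can also be overridden from the command line instead of editing the files; the keys and the entry-point name below are hypothetical examples, so check the actual YAML files under ./conf for the real names:

# Hypothetical Hydra-style overrides; real key names live in the YAML files under ./conf.
python main.py dataset=office_home data_path=./_dataset/office_home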
