Multilabel Oversampling

Many algorithms for imbalanced data support only binary and multiclass classification. This approach is made for multi-label classification (aka multi-target classification).

🎰 Algorithm

Start with a multilabel dataset (as a pandas.DataFrame) with imbalanced data.
Calculate the counts per class, then calculate the standard deviation (std) of the count values.
Repeat the following number_of_adds times:
Randomly draw a sample from your data and calculate the new std.
- If the new std is lower, add the sample to your dataset.
- If not, draw another sample (do this up to number_of_tries times).
A new df is returned.
A result plot visualizes the target distribution before and after upsampling. The counts per index are also shown.
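
The steps above can be sketched in a few lines of dependency-free Python. This is only an illustration of the idea, not the package's actual implementation: the function name `oversample_sketch` is hypothetical, rows are plain 0/1 lists instead of a pandas.DataFrame, candidates are assumed to be drawn from the original rows, and the population standard deviation is assumed for the std.

```python
import random
from statistics import pstdev


def oversample_sketch(rows, number_of_adds=100, number_of_tries=100, seed=0):
    """Greedy sketch: duplicate a random row only if it lowers the std of the label counts.

    `rows` is a list of equal-length 0/1 lists, one column per label (hypothetical layout).
    """
    rng = random.Random(seed)
    counts = [sum(col) for col in zip(*rows)]   # positives per label
    best_std = pstdev(counts)
    result = list(rows)
    for _ in range(number_of_adds):
        for _ in range(number_of_tries):
            candidate = rng.choice(rows)        # assumption: draw from the original rows
            new_counts = [c + x for c, x in zip(counts, candidate)]
            new_std = pstdev(new_counts)
            if new_std < best_std:              # accept only a strict improvement
                counts, best_std = new_counts, new_std
                result.append(candidate)
                break
        else:
            break                               # no improving row within number_of_tries
    return result


# Toy data: label 0 is frequent, labels 1 and 2 are rare.
rows = [[1, 0, 0]] * 4 + [[0, 1, 0], [0, 0, 1]]
balanced = oversample_sketch(rows, number_of_adds=10)
print([sum(col) for col in zip(*rows)], "->", [sum(col) for col in zip(*balanced)])
```

Because a row is accepted only when the std strictly decreases, the loop can stop early once no candidate improves the balance. The early stop mirrors the "No improvement after ... tries" message shown in the usage example.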

➡️ Usage

from multilabel_oversampling import multilabel_oversampling as mo

df = mo.create_fake_data(size=1, seed=3)
ml_oversampler = mo.MultilabelOversampler(number_of_adds=100, number_of_tries=100)
df_new, plot_at = ml_oversampler.fit(df)
#> Iteration:  20%|██████                        | 20/100 [00:00<00:00, 111.68it/s]
#> No improvement after 100 tries in iter 20.

ml_oversampler.plot_results()

ℹ️ Install

Install from GitHub

pip install git+https://github.com/phiyodr/multilabel-oversampling

🌻
