Transductive Bounds for the Multi-class Majority Vote Classifier

This repository implements the approach proposed in:
Vasilii Feofanov, Emilie Devijver and Massih-Reza Amini - Transductive Bounds for the Multi-class Majority Vote Classifier.
In Proceedings of the AAAI Conference on Artificial Intelligence 33, 3566-3573.

Multi-class Self-learning Algorithm (MSLA)

We consider the multi-class semi-supervised framework, where the goal is to learn a model from a few labeled examples and many unlabeled ones. The proposed algorithm iteratively assigns pseudo-labels to the subset of unlabeled training examples whose class margin is above a threshold obtained from the transductive bound proposed in the paper. The algorithm builds on a supervised base classifier, which can be any classifier that outputs posterior probabilities. Our implementation uses a random forest.
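
The iterative pseudo-labeling loop can be sketched roughly as follows. This is a minimal illustration and not the repository's msla function: the name self_learning_sketch, its parameters, and the use of the highest posterior probability as a stand-in for the class margin are all assumptions, and the threshold here is a fixed constant rather than the value derived from the transductive bound.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def self_learning_sketch(X_l, y_l, X_u, threshold=0.7, max_iter=10, random_state=0):
    """Hypothetical self-learning loop with a fixed confidence threshold.

    Returns the final classifier and the number of examples left unlabeled.
    """
    X_l, y_l, X_u = np.asarray(X_l), np.asarray(y_l), np.asarray(X_u)
    clf = RandomForestClassifier(n_estimators=100, random_state=random_state)
    clf.fit(X_l, y_l)
    for _ in range(max_iter):
        if X_u.shape[0] == 0:
            break
        proba = clf.predict_proba(X_u)
        # Assumption: the top posterior plays the role of the class margin.
        confidence = proba.max(axis=1)
        selected = confidence >= threshold
        if not selected.any():
            break
        # Pseudo-label the confident examples and move them to the labeled set.
        pseudo_labels = clf.classes_[proba.argmax(axis=1)[selected]]
        X_l = np.vstack([X_l, X_u[selected]])
        y_l = np.concatenate([y_l, pseudo_labels])
        X_u = X_u[~selected]
        # Retrain on the enlarged labeled set.
        clf.fit(X_l, y_l)
    return clf, X_u.shape[0]
```

Any classifier exposing predict_proba could replace the random forest in this sketch; the threshold selection is the part that the paper's bound is meant to automate.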

Code

The algorithm is implemented in Python 3 in self_learning.py. Some functions are rewritten in Cython to reduce runtime and are located in self_learning_cython.pyx. By default, the msla function uses the Cython versions; this can be changed manually if the Cython package cannot be installed.
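
One common way to check whether a compiled extension is usable before deciding which code path to take is sketched below. The helper load_backend is hypothetical and is not part of this repository; the actual switch inside msla may work differently. The module name self_learning_cython is taken from the file name above and is assumed to be importable once compiled.

```python
import importlib


def load_backend(module_name="self_learning_cython"):
    """Return the compiled module if it can be imported, otherwise None."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # The extension is not built or Cython is unavailable.
        return None


backend = load_backend()
use_cython = backend is not None
print("Cython backend available:", use_cython)
```

If the check reports False, the pure Python path is the one to select manually.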

Dependencies

To run successfully, the code requires:

Python 3
scikit-learn
NumPy
Pandas
Matplotlib (only for plotting)
Cython (optional)
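
A quick way to see which of these packages are present in the current environment is sketched below. The grouping into REQUIRED and OPTIONAL and the helper missing_packages are assumptions made for illustration; only the package list itself comes from this README.

```python
import importlib.util

# Display name -> import name. Matplotlib is only needed for plotting.
REQUIRED = {"scikit-learn": "sklearn", "NumPy": "numpy", "Pandas": "pandas"}
OPTIONAL = {"Matplotlib": "matplotlib", "Cython": "Cython"}


def missing_packages(packages):
    """Return the display names of packages that cannot be found."""
    return [
        name
        for name, module in packages.items()
        if importlib.util.find_spec(module) is None
    ]


print("Missing required packages:", missing_packages(REQUIRED))
print("Missing optional packages:", missing_packages(OPTIONAL))
```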

Experiments

To validate our approach, we compare the MSLA algorithm with the following methods:

Purely supervised approach: the scikit-learn implementation of the Random Forest.
Label Propagation, as implemented in scikit-learn.
Transductive SVM extended to the multi-class case by the one-versus-all approach.
FSLA: the self-learning algorithm with a fixed threshold equal to 0.7.
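
As an illustration of how the first two baselines are typically set up with scikit-learn, here is a minimal sketch on synthetic data. The dataset, the split ratio, and the hyperparameters (including the knn kernel for Label Propagation) are assumptions chosen for the example and are not the settings used in the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import LabelPropagation

# Synthetic 3-class problem; 10% labeled, 90% unlabeled (assumed ratio).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)
X_l, X_u, y_l, y_u = train_test_split(
    X, y, test_size=0.9, stratify=y, random_state=0
)

# Purely supervised baseline: trained on the labeled part only.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_l, y_l)
rf_acc = accuracy_score(y_u, rf.predict(X_u))

# Label Propagation: unlabeled examples are marked with the label -1.
X_all = np.vstack([X_l, X_u])
y_all = np.concatenate([y_l, np.full(len(y_u), -1)])
lp = LabelPropagation(kernel="knn", n_neighbors=7).fit(X_all, y_all)
lp_acc = accuracy_score(y_u, lp.transduction_[len(y_l):])

print(f"Random Forest accuracy on unlabeled data: {rf_acc:.3f}")
print(f"Label Propagation accuracy on unlabeled data: {lp_acc:.3f}")
```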

1. Simple Test

This test checks that the code works on your machine. It runs all classifiers under consideration on the DNA dataset with a single random split into labeled and unlabeled data. Run it from a terminal as follows:

python3 simple_test.py

On success, basic information is printed and a graph is displayed.

In addition, a folder with the TSVM input and output for each class is created.

2. Experiment Test

This test runs the experiments with the setup described in the paper: 20 random splits into labeled and unlabeled parts are performed for a dataset. To run the test from a terminal, specify two arguments: the name of the dataset and the labeled/unlabeled split. For instance, for the pendigits dataset with the split 0.99, type the following:

python3 experiment_test.py pendigits 0.99
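
The repeated-split protocol can be sketched as follows. This is not the code of experiment_test.py: the helpers parse_args and random_splits are hypothetical, and the sketch assumes that the split argument is the fraction of examples left unlabeled and that splits are stratified by class.

```python
import argparse

import numpy as np
from sklearn.model_selection import train_test_split


def parse_args(argv):
    """Parse the two positional arguments: dataset name and split value."""
    parser = argparse.ArgumentParser()
    parser.add_argument("dataset", type=str)
    parser.add_argument("split", type=float)
    return parser.parse_args(argv)


def random_splits(X, y, unlabeled_fraction, n_splits=20, seed=0):
    """Yield (X_l, X_u, y_l, y_u) for n_splits random stratified splits."""
    for i in range(n_splits):
        # A different random_state per iteration gives a different split.
        yield train_test_split(
            X, y, test_size=unlabeled_fraction, stratify=y, random_state=seed + i
        )


# Toy usage with placeholder data instead of a real dataset.
args = parse_args(["pendigits", "0.9"])
X = np.arange(400).reshape(200, 2)
y = np.tile([0, 1], 100)
for X_l, X_u, y_l, y_u in random_splits(X, y, args.split):
    pass  # train and evaluate each method on this split
```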

The test writes several files to the output folder.
