PU Training: Positive and Unlabelled training tests

What is PU Training?

Elkan and Noto claim and prove that the probability that a sample is positive is proportional to the probability that it is labelled: P(y=1|x) = P(s=1|x) / c, where c = P(s=1|y=1) is a constant factor. Labelled means known positive, while unlabelled means the sample could be either positive or negative.

In practice, we start with an imbalanced dataset with few positive samples. We take a subset of the positives, e.g. 70 %, and assign them as positive/labelled, then sample equally many of the remaining positive and negative samples and assign them as negative/unlabelled. We then train our probabilistic classifier. We repeat this process n times to build an ensemble, so that most of the dataset is covered, and use the ensemble to predict on the test set.
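
A minimal sketch of this resampling loop, assuming a NumPy feature matrix `X`, a label vector `y` with 1 = known positive and 0 = unlabelled, and scikit-learn logistic regression as the probabilistic classifier; the names `pu_ensemble_predict`, `n_rounds`, `pos_fraction` and `ratio` are illustrative and not taken from the notebook:

```python
# Illustrative sketch of the PU ensemble loop described above (not the notebook's exact code).
import numpy as np
from sklearn.linear_model import LogisticRegression

def pu_ensemble_predict(X, y, X_test, n_rounds=50, pos_fraction=0.7, ratio=1, seed=0):
    """y: 1 = known positive (labelled), 0 = unlabelled. Returns averaged P(labelled) on X_test."""
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(y == 1)
    unl_idx = np.flatnonzero(y == 0)
    preds = []
    for _ in range(n_rounds):
        # Take e.g. 70 % of the known positives as the positive/labelled class for this round
        labelled = rng.choice(pos_idx, size=int(pos_fraction * len(pos_idx)), replace=False)
        # Everything else (remaining positives + negatives) is treated as negative/unlabelled;
        # subsample roughly ratio * len(labelled) of them to keep the round's classes balanced
        rest = np.setdiff1d(np.concatenate([pos_idx, unl_idx]), labelled)
        unlabelled = rng.choice(rest, size=min(len(rest), ratio * len(labelled)), replace=False)
        idx = np.concatenate([labelled, unlabelled])
        y_round = np.concatenate([np.ones(len(labelled)), np.zeros(len(unlabelled))])
        clf = LogisticRegression(max_iter=1000).fit(X[idx], y_round)
        preds.append(clf.predict_proba(X_test)[:, 1])
    # Averaging over rounds stabilizes the estimate of P(s=1 | x)
    return np.mean(preds, axis=0)
```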

NB: since the constant factor c = P(s=1|y=1) only rescales the predicted probabilities, it does not affect the AUC and can be disregarded if the decision threshold is already known. Otherwise, the threshold on the predicted labelling probability can be set to c/2, as per [2].
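
A hedged sketch of estimating and applying c, assuming Elkan and Noto's estimator of c as the mean predicted labelling probability over a held-out set of known positives; the function names here are illustrative, not taken from the notebook:

```python
# Sketch of the c = P(s=1 | y=1) correction from [2] (assumed estimator, not the notebook's code).
import numpy as np

def estimate_c(scores_on_heldout_positives):
    # Elkan & Noto's e1 estimator: average P(s=1 | x) over a validation set of labelled positives
    return float(np.mean(scores_on_heldout_positives))

def pu_correct(scores, c):
    # P(y=1 | x) = P(s=1 | x) / c, clipped to stay a valid probability
    return np.clip(scores / c, 0.0, 1.0)

def pu_classify(scores, c):
    # Equivalent hard decision: P(y=1 | x) >= 0.5  <=>  P(s=1 | x) >= c / 2
    return scores >= c / 2
```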

Notebook findings

Testing effects of positive and unlabelled training variables, inspired by [1] and [2]

Findings with PU training:

  • Performance is largely the same when training on ca. 5 % to 100 % of the positive examples
  • PU training relies on relabelling unused positives as negatives
  • When training an ensemble, performance across different positive-to-negative ratios is best between 1:1 and 1:2
  • When training an ensemble, training on all negatives leads to unstable performance when using only n % of the positives
  • Training an ensemble stabilizes performance towards a mean

Sources
