Multi-label Cross Validation #535
Thanks @thomaslow! There is a method for cross-validation in the Maui Server REST API:
I'm not sure how Maui Server splits the data: is it done intelligently, trying to ensure that rare labels are evenly split across folds, or just randomly?
I see your point. Annif isn't primarily a Python library, though: all functionality is provided via CLI or REST API (or both). But of course it can be used as a Python module, and ideally this functionality would also be available that way, with a reasonable API.
Thanks for the tip!
Responding to myself: it seems that Maui Server doesn't do anything particularly intelligent; it just splits the corpus into equal-size batches without looking at the distribution of labels.
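For illustration, here is a minimal Python sketch of that kind of naive split; this is an assumption about the behaviour described above, not Maui Server's actual code, and `documents` is a hypothetical list of (text, labels) pairs:

```python
import random

def naive_batches(documents, k=10, seed=42):
    """Shuffle and cut documents into k roughly equal-size batches,
    ignoring the label distribution entirely."""
    docs = list(documents)
    random.Random(seed).shuffle(docs)
    return [docs[i::k] for i in range(k)]

# With a split like this, a label that occurs only a handful of times
# can easily end up concentrated in one batch, or missing from the
# training side altogether.
```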
I cannot contribute a definitive answer here, but I have some input that might help direct the discussion. I found the following paper quite valuable:

As for cross-validation, I personally wouldn't opt for that. With these extreme multi-label problems (which not all Annif users may have), I fear that for large vocabularies and models even a single run with moderate computing power can take a while. Looping that through a CV pipeline would be beyond the computational resources that I can access (of course, other users might have better resources).
Thank you for the references to interesting papers @mfakaehler! Currently the splitting of data sets is always performed outside Annif. I think it could be useful to provide tools that help perform the split in some reasonable way, and the method in the first paper looks particularly useful, although I am unsure how difficult it would be to implement (for example, as a small Python script under a directory such as

The propensity-scored measures also look useful; in particular, the propensity-scored nDCG variant could be more informative than standard nDCG. OTOH, it doesn't seem to be a widely used metric beyond the original paper. I encourage anyone who feels the need for this kind of metric to open a new issue here.
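For reference, a hedged sketch of what propensity-scored metrics look like, following the formulas in Jain et al. (2016), which I believe is the original paper in question. None of this is existing Annif code, and the hyperparameters `a` and `b` are dataset-dependent (the paper's suggested defaults are used here):

```python
import numpy as np

def propensities(label_counts, n_docs, a=0.55, b=1.5):
    """Per-label propensity estimate from Jain et al. (2016):
    p_l = 1 / (1 + C * exp(-A * log(N_l + B))),
    with C = (log N - 1) * (B + 1)^A."""
    label_counts = np.asarray(label_counts, dtype=float)
    c = (np.log(n_docs) - 1.0) * (b + 1.0) ** a
    return 1.0 / (1.0 + c * np.exp(-a * np.log(label_counts + b)))

def ps_dcg_at_k(true_labels, ranked_pred, p, k=5):
    """Propensity-scored DCG@k for one document: a correct label at
    rank r is discounted by log2(r + 2) and up-weighted by 1 / p_l,
    so getting rare labels right counts for more. Usually this is
    normalized against the same score computed on the ground truth."""
    return sum(1.0 / (p[l] * np.log2(r + 2))
               for r, l in enumerate(ranked_pred[:k])
               if l in true_labels)
```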
That's a fair point, and certainly CV can increase the amount of computation by an order of magnitude. OTOH, it can also be useful in scenarios with small data sets, even if it's not always practical.
Currently, the `eval` command allows evaluating the predictive performance of a backend for a single training/test split. Unfortunately, splitting multi-class, multi-label data into training and test sets is not trivial, especially when there are only a few examples of some classes. Also, relying on the same training and test split when testing various backends and model parameters can lead to overfitting. Implementing a multi-label cross-validation method would help to evaluate various classification approaches.
Yes, a CLI command like this would be useful. Personally, as a developer, I would prefer a Python module API that makes it possible to put together custom pipelines, similar to sklearn.pipeline.Pipeline.
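To make that concrete, here is a minimal sketch of the kind of composable pipeline meant above, using scikit-learn's actual Pipeline API; the feature extractor and classifier are illustrative placeholders, not Annif's internals:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression

# Each step is swappable, which is what makes this style of API handy
# for experimenting with different backends and parameters.
pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),                        # text -> feature vectors
    ("clf", OneVsRestClassifier(LogisticRegression())),  # one binary model per label
])

# pipe.fit(train_texts, train_labels)   # train_labels: binary indicator matrix
# scores = pipe.predict_proba(test_texts)
```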
There is an implementation of cross-validation for multi-label data that seems promising, but I haven't had a chance to test it; see scikit.ml or the code.
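For anyone who wants to try it, a quick sketch using scikit-multilearn (the library behind scikit.ml). Its IterativeStratification splitter implements the iterative stratification algorithm of Sechidis et al., which I believe is the stratification method discussed above; the data here is randomly generated just to show the shape of the API:

```python
import numpy as np
from skmultilearn.model_selection import IterativeStratification

# Dummy data: 100 documents with 20 features and 15 sparse labels.
X = np.random.rand(100, 20)
y = (np.random.rand(100, 15) > 0.9).astype(int)

# Stratified k-fold splitter that tries to spread each label
# evenly across the folds, including the rare ones.
k_fold = IterativeStratification(n_splits=5, order=1)
for train_idx, test_idx in k_fold.split(X, y):
    X_train, y_train = X[train_idx], y[train_idx]
    X_test, y_test = X[test_idx], y[test_idx]
    # ...train and evaluate a backend on this fold...
```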