[Fit API] improve event handlers #14685

roywei · 2019-04-12T18:07:23Z

Description

Making the following changes to event handlers, based on the design here:
https://cwiki.apache.org/confluence/display/MXNET/Callback+Design+for+Fit+Loop

Implementing metric update and validation logic in event handlers
Each event handler maintains its own state
Passing a weak reference to the estimator at each callback call, so handlers can access its attributes (net, trainer, etc.)
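
The three points above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in this PR: `ToyEstimator`, `BatchCounter`, and the `batch_end` hook name are hypothetical stand-ins, and only the standard-library `weakref` module is used.

```python
import weakref


class BatchCounter:
    """Hypothetical handler: keeps its own state instead of storing it on the estimator."""

    def __init__(self):
        self.batches_seen = 0
        self.last_net = None

    def batch_end(self, estimator_ref):
        # The handler receives a weak reference and dereferences it on each call.
        estimator = estimator_ref()
        if estimator is None:  # the estimator has already been garbage collected
            return
        self.batches_seen += 1
        self.last_net = estimator.net


class ToyEstimator:
    """Hypothetical stand-in for the estimator; not the MXNet class."""

    def __init__(self, net, trainer):
        self.net = net
        self.trainer = trainer

    def fit(self, num_batches, event_handlers):
        # Handlers are handed a weak reference, never the estimator itself.
        estimator_ref = weakref.ref(self)
        for _ in range(num_batches):
            for handler in event_handlers:
                handler.batch_end(estimator_ref)


counter = BatchCounter()
ToyEstimator(net="toy-net", trainer="toy-trainer").fit(3, [counter])
print(counter.batches_seen, counter.last_net)
```

Because the handler only dereferences the weak reference inside the callback, it can read attributes such as `net` without keeping the estimator alive after training ends.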

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

python/mxnet/gluon/contrib/estimator/estimator.py

nswamy

Thanks for patiently accommodating the last-minute design change requests. I have a few comments; I would like to know what you think, and please create a follow-up PR if necessary.

nswamy · 2019-04-19T19:10:40Z

python/mxnet/gluon/contrib/estimator/estimator.py

- losses = []
- for loss in self.loss:
- losses.append([loss(y_hat, y) for y_hat, y in zip(pred, label)])
+ loss = [self.loss[0](y_hat, y) for y_hat, y in zip(pred, label)]


what if the model had multiple loss functions?

Multi-loss will be supported in #14628; let's get the first version into master and iterate on that.
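
For context, the difference between the removed and added lines in the hunk above can be sketched with plain Python callables standing in for Gluon loss blocks. The function names and the toy losses are assumptions made for illustration only; how #14628 actually implements multi-loss support is not shown here.

```python
def single_loss(loss_fns, pred, label):
    # Mirrors the added line: only the first loss function is applied.
    return [loss_fns[0](y_hat, y) for y_hat, y in zip(pred, label)]


def all_losses(loss_fns, pred, label):
    # Mirrors the removed lines: one list of per-slice losses per loss function.
    return [[fn(y_hat, y) for y_hat, y in zip(pred, label)] for fn in loss_fns]


def l1(y_hat, y):
    """Toy absolute-error loss (hypothetical stand-in)."""
    return abs(y_hat - y)


def l2(y_hat, y):
    """Toy squared-error loss (hypothetical stand-in)."""
    return (y_hat - y) ** 2


pred, label = [1.0, 3.0], [2.0, 1.0]
first_only = single_loss([l1, l2], pred, label)
every_loss = all_losses([l1, l2], pred, label)
print(first_only)
print(every_loss)
```

With a single configured loss the two forms carry the same values; with more than one, the single-loss form silently ignores everything after the first entry, which is the concern raised in the question.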

nswamy · 2019-04-19T19:23:09Z

python/mxnet/gluon/contrib/estimator/estimator.py

+ val_metrics=val_metrics))
+ event_handlers.append(LoggingHandler(train_metrics=train_metrics,
+ val_metrics=val_metrics))
+ warnings.warn("No Event Handler specified, default %s are used. "


Can you write this warning using the LoggingHandler's logger, so the user has one place to control log levels and look for messages?

Good point! For now we can only do this for the estimator and handlers; any other warnings from mxnet and gluon still can't be controlled. Tracked here: https://issues.apache.org/jira/browse/MXNET-1395
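
The suggestion can be sketched with the standard-library `logging` module. This is a hedged illustration, not the change made in this PR: `ToyLoggingHandler`, its `logger` attribute, the `add_default_handlers` helper, and the logger name `toy_estimator` are all hypothetical.

```python
import logging


class ToyLoggingHandler:
    """Hypothetical event handler that owns the logger used for training messages."""

    def __init__(self, name="toy_estimator", level=logging.INFO):
        self.logger = logging.getLogger(name)
        self.logger.setLevel(level)


def add_default_handlers(event_handlers):
    """Append a default logging handler and report it through that handler's logger."""
    if event_handlers:
        return event_handlers
    default = ToyLoggingHandler()
    # Used instead of warnings.warn(...): the message obeys the logger's level and handlers.
    default.logger.warning("No event handler specified, default %s is used.",
                           type(default).__name__)
    return [default]


class ListCapture(logging.Handler):
    """Collects logged messages so the example can show what was emitted."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


capture = ListCapture()
logging.getLogger("toy_estimator").addHandler(capture)
handlers = add_default_handlers([])
print(capture.messages)
```

Routing the message through one named logger means a user can silence or redirect it with `setLevel` or `addHandler` on that logger, rather than configuring the `warnings` filters separately.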

nswamy · 2019-04-19T19:26:10Z

python/mxnet/gluon/contrib/estimator/estimator.py

- losses = []
- for loss in self.loss:
- losses.append([loss(y_hat, y) for y_hat, y in zip(pred, label)])
+ loss = [self.loss[0](y_hat, y) for y_hat, y in zip(pred, label)]


same thing, using only a single loss?

as above

Multi-loss will be supported in #14628; let's get the first version into master and iterate on that.

nswamy · 2019-04-19T19:41:27Z