
Update LoggingHandler to support logging per interval #16922

Merged
10 commits merged into apache:master Dec 7, 2019

Conversation

@liuzh47 (Contributor) commented Nov 27, 2019

Description

Add support for logging per interval to the default LoggingHandler. It acts as a middle ground between logging per batch and logging per epoch (see the usage sketch below). Please refer to the description in issue #16918.

Fixes #16918 (Configurable log interval for estimator)
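
As a rough illustration of the intended usage (a minimal sketch only; the exact LoggingHandler and Estimator signatures may differ from what was finally merged, and the model and data below are toy placeholders):

    import mxnet as mx
    from mxnet import gluon
    from mxnet.gluon.contrib.estimator import Estimator
    from mxnet.gluon.contrib.estimator.event_handler import LoggingHandler

    # Toy model, loss, and data so the sketch is self-contained.
    net = gluon.nn.Dense(1)
    net.initialize()
    loss = gluon.loss.L2Loss()
    train_loader = gluon.data.DataLoader(
        gluon.data.ArrayDataset(mx.nd.random.uniform(shape=(100, 2)),
                                mx.nd.random.uniform(shape=(100, 1))),
        batch_size=10)

    est = Estimator(net=net, loss=loss)
    # Log every 5 batches: a middle ground between per-batch and per-epoch.
    est.fit(train_data=train_loader, epochs=2,
            event_handlers=[LoggingHandler(log_interval=5)])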

@liuzh47 requested a review from szha as a code owner Nov 27, 2019
@leezu requested a review from roywei Nov 28, 2019
@leezu added the R1.6.0 label Nov 28, 2019
@eric-haibin-lin (Member) left a comment


Shall we add a unit test for this?

Review comment on python/mxnet/gluon/contrib/estimator/event_handler.py (outdated, resolved)
@liuzh47 (Contributor, Author) commented Dec 2, 2019

> Shall we add a unit test for this?

What assertion shall we make in this case?

@leezu (Contributor) commented Dec 2, 2019

You could assert the number of lines logged. For example, if you specify training for 10 batches and log every 5 batches, you may test for two lines being logged.
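
A minimal sketch of that assertion idea (using Python's logging module directly, independent of the estimator API; all names here are hypothetical):

    import logging

    class CountingHandler(logging.Handler):
        """Collects records so the test can count logged lines."""
        def __init__(self):
            super().__init__()
            self.records = []

        def emit(self, record):
            self.records.append(record)

    def test_log_interval():
        counter = CountingHandler()
        logger = logging.getLogger("estimator_test")
        logger.setLevel(logging.INFO)
        logger.addHandler(counter)

        # Simulate a handler configured with log_interval=5 over 10 batches:
        # it should emit on batches 5 and 10, i.e. two lines.
        log_interval, num_batches = 5, 10
        for batch in range(1, num_batches + 1):
            if batch % log_interval == 0:
                logger.info("[Batch %d] training metrics ...", batch)

        assert len(counter.records) == num_batches // log_interval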

@roywei (Member) left a comment


Thanks for the fix. Shall we remove LOG_PER_BATCH, since it can be replaced by LOG_PER_INTERVAL?
Since this is in contrib, I think that's fine, and fewer options will be easier for users.

@liuzh47 (Contributor, Author) commented Dec 3, 2019

> You could assert the number of lines logged. For example, if you specify training for 10 batches and log every 5 batches, you may test for two lines being logged.

OK, I'll give it a try.

@liuzh47 (Contributor, Author) commented Dec 3, 2019

> Thanks for the fix. Shall we remove LOG_PER_BATCH, since it can be replaced by LOG_PER_INTERVAL?
> Since this is in contrib, I think that's fine, and fewer options will be easier for users.

Thanks for the suggestion.

I have thought about that option before. Most parts of batch logging and interval logging overlap, but during batch logging we only log the training metrics:

            for metric in self.train_metrics:
                # only log current training loss & metric after each batch
                name, value = metric.get()
                msg += '%s: %.4f, ' % (name, value)

During interval logging, however, I think it is better to log both training and validation metrics. So is LOG_PER_BATCH still necessary?

            for monitor in self.train_metrics + self.val_metrics:
                name, value = monitor.get()
                msg += '%s: %.4f, ' % (name, value)

Alternatively, we could merge LOG_PER_BATCH and LOG_PER_INTERVAL and use self.log_interval to decide whether to include the validation metrics, as in the sketch below.
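
A rough sketch of that merged behavior (hypothetical; batch_idx, log_interval, and the batch_end signature are simplifications based on the excerpts above, not the merged implementation):

    def batch_end(self, estimator, *args, **kwargs):
        # One log_interval field replaces LOG_PER_BATCH and LOG_PER_INTERVAL;
        # log_interval == 1 reproduces the old per-batch behavior.
        self.batch_idx += 1
        if self.batch_idx % self.log_interval != 0:
            return
        msg = '[Batch %d] ' % self.batch_idx
        # Per the discussion above, only training metrics are logged here;
        # self.val_metrics could be appended for larger intervals.
        for metric in self.train_metrics:
            name, value = metric.get()
            msg += '%s: %.4f, ' % (name, value)
        estimator.logger.info(msg.rstrip(', '))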

@leezu (Contributor) commented Dec 3, 2019

I'm actually not sure if logging both self.train_metrics + self.val_metrics during LOG_PER_INTERVAL is good.
Currently the notions of val_metrics and train_metrics are not clearly decoupled. Thinking about how to tackle #16959 may also help clarify the relation between eval_metrics and train_metrics and how their values should be updated and logged.

@liuzh47 (Contributor, Author) commented Dec 3, 2019

> I'm actually not sure if logging both self.train_metrics + self.val_metrics during LOG_PER_INTERVAL is good.
> Currently the notions of val_metrics and train_metrics are not clearly decoupled. Thinking about how to tackle #16959 may also help clarify the relation between eval_metrics and train_metrics and how their values should be updated and logged.

That makes sense to me. I'll merge LOG_PER_BATCH and LOG_PER_INTERVAL and leave out the val_metrics.

@leezu merged commit f06fdf6 into apache:master Dec 7, 2019
ptrendx pushed a commit to ptrendx/mxnet that referenced this pull request Dec 9, 2019

[Gluon] Update contrib.Estimator LoggingHandler to support logging per batch interval (apache#16922)

* Update LoggingHandler to support logging per interval

* Fix the constant variable issue in the logging handler

* Remove the constant variable hack in the logging handler.

* 1) replace LOG_PER_BATCH with LOG_PER_INTERVAL 2) add test case

* Improve the test script for LoggingHandler

* small fix on the test script

* logging handler test case bug fix

* remove parameter verbose from LoggingHandler

* move log_interval to the first argument

* resolve unittest mistakes
ptrendx added a commit that referenced this pull request Dec 10, 2019
* Fix ndarray indexing bug (#16895)

* Fix indexing bug

* More test cases

* Add test from 16647

* [Gluon] Update contrib.Estimator LoggingHandler to support logging per batch interval (#16922)

* Update LoggingHandler to support logging per interval

* Fix the constant variable issue in the logging handler

* Remove the constant variable hack in the logging handler.

* 1) replace LOG_PER_BATCH with LOG_PER_INTERVAL 2) add test case

* Improve the test script for LoggingHandler

* small fix on the test script

* logging handler test case bug fix

* remove parameter verbose from LoggingHandler

* move log_interval to the first argument

* resolve unittest mistakes

* Add micro averaging strategy to pearsonr metric (#16878)

        Strategy to be used for aggregating across mini-batches.
            "macro": average the pearsonr scores for each batch.
            "micro": compute a single pearsonr score across all batches.

* [Bugfix] [Numpy] Add `kAddTo` and kNullOp to Transpose (#16979)

* Squashed fixes: check for repeated axes; enable kAddTo for transpose; remove unused ndim; repeated updates and fixes to pseudo2DTranspose_op-inl.cuh, np_matrix_op.cc, and test_numpy_op.py; update test cases; fix the implementation.

* Fix bug

* fix docstring

* try to address comment

* no need to change this line

* Fix bug

* address comments

* address comment

* introduce  gradient update handler to the  base estimator (#16900)

* introduce  gradient update handler to the  base estimator

* Modify the gradient update handler to include the batch size

* Remove unrelated gradient update handler.

* Modify gradient update handler to take the current batch size.

* Remove white space to avoid the sanity check failure

* add small tweak to the handler code

* Modify the documentation of priority parameter of relevant handlers.

* small modification on the documentation.

* Add small modification on the documentation.

* Remove unnecessary list check