
[MXNET-978] Support higher order gradient for log, log2, log10. #14992

Merged
merged 4 commits into apache:master from log_higher_order_grad on May 28, 2019

Conversation

kshitij12345
Contributor

@kshitij12345 kshitij12345 commented May 18, 2019

Description

With reference to #14613 and #10002, this PR adds support for higher-order gradients for log (and ideally for log2 and log10 as well).

Tests are based entirely on #14613.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set and a reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Higher-order gradient for log (see the sketch below).
  • Unit test for the same.
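For illustration, a minimal sketch (not part of the diff) of what this change enables from the Python side; it assumes the same mxnet.autograd calls used in the tests later in this thread, and the input values are arbitrary:

from mxnet import nd, autograd

# For f(x) = log(x): f'(x) = 1/x and f''(x) = -1/x**2.
x = nd.array([0.5, 1.0, 2.0])
x.attach_grad()
with autograd.record():
    y = nd.log(x)
    # First-order gradient, kept in the graph so it can be differentiated again.
    x_grad = autograd.grad(y, x, create_graph=True, retain_graph=True)[0]
x_grad.backward()  # second-order gradient of log ends up in x.grad
print(x.grad)      # expected to be close to -1/x**2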

@kshitij12345 kshitij12345 force-pushed the log_higher_order_grad branch 2 times, most recently from 16e1815 to 49e59f0 on May 18, 2019 21:24
@kshitij12345
Contributor Author

kshitij12345 commented May 18, 2019

I don't know much about this library, but I believe it would be better to have gradients defined for the existing backward operators, instead of a differentiable gradient (relying on autograd machinery), at least for ops whose backward is not trivial. That would allow the existing optimised fused kernels to be used and make sure there is no regression in the backward pass.

Note: log is relatively trivial (a single reciprocal). But we may see a performance regression for the slightly less trivial sigmoid if we rely on autograd machinery instead of the existing _backward_sigmoid.
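(To make the point concrete, a small NumPy sketch with made-up inputs of why a dedicated backward can be cheaper: sigmoid'(x) = y * (1 - y) can be computed from the saved forward output y alone, without re-evaluating the exponential, which is what a _backward_sigmoid-style kernel can exploit.)

import numpy as np

x = np.array([-1.0, 0.0, 2.0])
y = 1.0 / (1.0 + np.exp(-x))                             # saved forward output
grad_from_output = y * (1.0 - y)                         # uses only y
grad_recomputed = np.exp(-x) / (1.0 + np.exp(-x)) ** 2   # recomputes exp(-x)
np.testing.assert_allclose(grad_from_output, grad_recomputed)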

@kshitij12345
Contributor Author

Can anyone please point out how I can get 1/log(2.0) and 1/log(10.0) multiplied with the gradients for log2 and log10?

@pinaraws

@mxnet-label-bot add[Operator, pr-awaiting-review]

@apeforest
Contributor

@kshitij12345 Thanks for your contribution. I agree with you that it would be better to have gradients defined for the existing backward operators.

I do not fully understand your question about 1/log(2.0) and 1/log(10.0) multiplied with the gradients for log2 and log10. Could you please elaborate?

@kshitij12345
Contributor Author

kshitij12345 commented May 21, 2019

I do not fully understand your question of 1/log(2.0), 1/log(10.0) multiplied with gradient for log2, log 10. Could you please elaborate?

Reading it again, I phrased it poorly. Sorry. The plan was to update the gradient for log2, which would be 1/(log(2.0) * x), and for that I would have needed log(2.0). How do I get that? Is scalar multiplication allowed, or should I use ones_like followed by a fill?

Note: this is not needed for this PR, but I am curious to know.

Thank You.

@kshitij12345
Contributor Author

@larroy @apeforest I have updated as per #14095.
Please review.

@kshitij12345
Contributor Author

[](const nnvm::NodePtr& n, const std::vector<nnvm::NodeEntry>& ograds) {
// For g(x) -> g = log
// g''(x) = -1 * (g'(x) * g'(x))
auto gx = nnvm::NodeEntry{n, 0, 0};
Contributor

You are welcome to simplify calls to NodeEntry as per:

#14095

Just call nnvm::NodeEntry{n}

Contributor Author

Oh. Missed that. Thank You.

Contributor

@larroy larroy left a comment

Nice PR, thanks a lot for this. Just a couple of questions.


ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad",
{ograds[0], gx}, nullptr, &n));
ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad_inp",
Contributor

Why are we returning two gradients? Isn't it a unary function with just one input?

Contributor Author


std::vector<nnvm::NodeEntry> ret;

ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad",
Contributor

same comment as above.

* simplify NodeEntry creation.
@apeforest
Contributor

apeforest commented May 23, 2019

I do not fully understand your question about 1/log(2.0) and 1/log(10.0) multiplied with the gradients for log2 and log10. Could you please elaborate?

Reading it again, I phrased it poorly. Sorry. The plan was to update the gradient for log2, which would be 1/(log(2.0) * x), and for that I would have needed log(2.0). How do I get that? Is scalar multiplication allowed, or should I use ones_like followed by a fill?

Note: this is not needed for this PR, but I am curious to know.

Thank You.

After reviewing your code, I had a better understanding of what you meant. I think you can do an elemwise_mul with Op('log') and a vector filled with 2.0. There may be other ways to optimize the graph representation, but I think this should actually work.
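As a quick numeric sanity check (not part of this PR; just a sketch using the public nd.log2 and autograd APIs with arbitrary inputs), the 1/log(2.0) factor discussed above is exactly the constant in the first derivative of log2:

import math
from mxnet import nd, autograd

x = nd.array([0.5, 1.0, 2.0, 4.0])
x.attach_grad()
with autograd.record():
    y = nd.log2(x)
y.backward()

# d/dx log2(x) = 1 / (log(2) * x), so x.grad carries the 1/log(2.0) factor.
expected = 1.0 / (math.log(2.0) * x)
print(x.grad)
print(expected)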

unary_bwd<mshadow_op::log_grad>)
.set_attr<nnvm::FGradient>("FGradient",
[](const nnvm::NodePtr& n, const std::vector<nnvm::NodeEntry>& ograds) {
// For g(x) -> g = log
Contributor

nit: It's very nice to see a comment here. The g(x) is actually a function of x. It might be easily confused with the variable gx two lines below. Maybe use f(x) in the comment here?

Contributor Author

Ah sure. That makes sense. Thank You.

@apeforest
Contributor

@kshitij12345 The CI failure in unix-GPU was due to a flaky test for TensorRT: #14978

The issue has been fixed by #15014. Please re-trigger the CI. Thanks!

* update comment to avoid confusion.
Contributor

@apeforest apeforest left a comment

LGTM! Thanks a lot for your contribution.

@larroy
Contributor

larroy commented May 28, 2019

Lgtm

@apeforest apeforest merged commit 8a9dd72 into apache:master May 28, 2019
// For f(x) -> f = log10
// f'(x) = 1 / (log(10) * x)
// f''(x) = -1 * (f'(x) * 1/x)
auto gx = nnvm::NodeEntry{n, 0, 0};
Contributor

Why don't we follow the same pattern as in the natural logarithm?

Contributor Author

For natural log, in the gradient function we have gx, i.e. 1/x, as well as x. Since the second derivative of log is -(gx * gx) = -1/(x^2), we use that pattern.

Considering log2 (the case for log10 is similar), we have gx, i.e. 1/(log(2) * x), as well as x. Since the second derivative is -1/(log(2) * x * x), we get it in the code using negative(gx * reciprocal(x)), where gx = 1/(log(2) * x). Another way to get it would be negative(gx * gx * log(2.0)).
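(A small NumPy check, sketched here with arbitrary inputs rather than taken from the PR, confirming that the two forms above agree for the second derivative of log2:)

import math
import numpy as np

x = np.array([0.5, 1.0, 2.0, 4.0])
gx = 1.0 / (math.log(2.0) * x)           # f'(x) for f = log2

form_a = -(gx * (1.0 / x))               # negative(gx * reciprocal(x))
form_b = -(gx * gx * math.log(2.0))      # negative(gx * gx * log(2.0))

np.testing.assert_allclose(form_a, form_b)
np.testing.assert_allclose(form_a, -1.0 / (math.log(2.0) * x ** 2))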

Contributor Author

@larroy Thanks for pointing this out; going through it again made me realise that there is a problem with the implementation for log.

@kshitij12345
Contributor Author

kshitij12345 commented May 31, 2019

@apeforest @larroy

https://github.com/kshitij12345/incubator-mxnet/blob/7b343d1fcde73b61322985580080333d9eee9e82/src/operator/tensor/elemwise_unary_op_basic.cc#L1077-L1079

We multiply gx * gx, where gx = ograd * f'(x), which gives ograd^2 * f'(x)^2; however, we want only ograd * f'(x)^2, which can be achieved in a similar fashion to the implementation of _backward_log10/_backward_log2.

I have validated the expected results with the snippet below.

from mxnet import nd, autograd
import numpy
import math

grad_grad_op = lambda x: -1 / x**2

x = nd.random.normal(0, 1, (3, 3))
x.attach_grad()
with autograd.record():
    y = nd.log(x)
    y_grad = autograd.grad(y, x, head_grads=nd.ones_like(y) * 0.5, create_graph=True, retain_graph=True)[0]
y_grad.backward(nd.ones_like(y_grad) * 0.6)

numpy.testing.assert_allclose(x.grad.asnumpy(), (grad_grad_op(x) * 0.5 * 0.6).asnumpy(), rtol=1e-7, atol=1e-7)

This fails with the current code: the expected second-order result should be linear in each head gradient (the 0.5 and 0.6 factors above), but the current implementation effectively squares the first head gradient. Should I make a new PR, or add commits to this PR itself? Sorry for the trouble.

I have confirmed the behaviour with PyTorch as well.

import torch
import numpy
import math

grad_grad_op = lambda x: -1 / x**2

x = torch.randn(2, 3)
x.requires_grad = True

y = torch.log(x)
y_grad = torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y) * 0.5, create_graph=True, retain_graph=True)[0]
y_grad.backward(torch.ones_like(y_grad) * 0.6)

numpy.testing.assert_allclose(x.grad.detach().numpy(), (grad_grad_op(x) * 0.5 * 0.6).detach().numpy(), rtol=1e-7, atol=1e-7)

aaronmarkham pushed a commit to aaronmarkham/incubator-mxnet that referenced this pull request May 31, 2019
* add higher order gradient support for log, log10, log2

* add tests

* address comments

* simplify NodeEntry creation.

* address comments

* update comment to avoid confusion.

@apeforest
Contributor

Hi @kshitij12345 sorry, we missed that. We should have reviewed it more carefully. Please submit another PR to fix this issue. I will also update #14613 accordingly. Thanks!

std::vector<nnvm::NodeEntry> ret;

ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad",
{ograds[0], gx}, nullptr, &n));
Contributor

Shouldn't this be {ograds[0], g_lx} instead? Isn't dL/dy_grad = d^2L/dx^2 * f'(x)?

Contributor Author

Yes, it should. Thanks. I have updated this change and added a relevant test in the new PR #15120.
Actually, I am having trouble exactly at this part, as the grad value is not being updated. More info in #15120.

@kshitij12345
Contributor Author

I have created a new PR for the same: #15120. Please review.

haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
@kshitij12345 kshitij12345 changed the title [MXNET-978] Support higher order gradient for log. [MXNET-978] Support higher order gradient for log, log2, log10. Jul 13, 2019
@kshitij12345 kshitij12345 deleted the log_higher_order_grad branch July 13, 2019 08:40
Labels
Operator, pr-awaiting-review (PR is waiting for code review)