This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[BUGFIX] Fix ELU function producing NaN when calculating the gradient #14673

Merged
6 commits merged into apache:master on Apr 23, 2019

Conversation

fierceX (Contributor) commented Apr 11, 2019

Description

Fix the ELU function producing NaN when calculating the gradient.

e.g.:

from mxnet import nd, autograd
from mxnet.gluon import nn

elu = nn.ELU()
elu.initialize()
x = nd.ones((2, 3))
x.attach_grad()
x[0] = 100
x[1] = -1
with autograd.record():
    y = elu(x)

y.backward()
print(y)
print(x.grad)

output is

[[100.         100.         100.        ]
 [ -0.63212055  -0.63212055  -0.63212055]]
<NDArray 2x3 @cpu(0)>

[[       nan        nan        nan]
 [0.36787945 0.36787945 0.36787945]]
<NDArray 2x3 @cpu(0)>

After some experiments, it was found that this bug occurs when where and exp are used together.

e.g.:

from mxnet import nd, autograd

x = nd.ones((2, 3))
x.attach_grad()
x[0] = 100
x[1] = 5

with autograd.record():
    y = nd.where(x > 0, x, nd.exp(x))
y.backward()
print(y)
print(x.grad)

output:

[[100. 100. 100.]
 [  5.   5.   5.]]
<NDArray 2x3 @cpu(0)>

[[nan nan nan]
 [ 1.  1.  1.]]
<NDArray 2x3 @cpu(0)>

When the exp calculation produces inf, the gradient still becomes nan even though where does not select that value. So this PR does not completely fix the underlying problem, but it changes the ELU calculation so that nan no longer appears.
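A minimal sketch of the mechanism, assuming float32 defaults (editor's illustration, not part of the original report): exp(100) overflows to inf, and the backward pass of where scales the gradient of the unselected exp branch by zero, so the result is 0 * inf = nan.

from mxnet import nd

# In float32, exp(100) overflows to inf; multiplying that by the zero
# gradient mask of the unselected branch yields nan.
print(nd.exp(nd.array([100.0])))                    # [inf]
print(nd.array([0.0]) * nd.exp(nd.array([100.0])))  # [nan]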

fierceX requested a review from szha as a code owner on April 11, 2019 07:43
@@ -158,7 +158,8 @@ def __init__(self, alpha=1.0, **kwargs):
         self._alpha = alpha

     def hybrid_forward(self, F, x):
-        return F.where(x > 0, x, self._alpha * (F.exp(x) - 1.0))
+        _x = F.where(x < 0, x, F.zeros_like(x))
+        return F.where(x > 0, x, self._alpha * (F.exp(_x) - 1.0))
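To see why the clamping helps, here is a minimal sketch using the imperative NDArray API (editor's illustration of the same idea; the actual patch changes the hybridized block above): with _x clamped to at most 0, exp never overflows, so the zero-masked gradient stays finite.

from mxnet import nd, autograd

x = nd.ones((2, 3))
x.attach_grad()
x[0] = 100
x[1] = -1
with autograd.record():
    # Clamp positive inputs to 0 before exp so exp cannot overflow to inf.
    _x = nd.where(x < 0, x, nd.zeros_like(x))
    y = nd.where(x > 0, x, nd.exp(_x) - 1.0)
y.backward()
print(x.grad)  # finite everywhere, no nan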
haojin2 (Contributor) commented:
Thanks for your contribution!
Actually we do have ELU implemented in the backend, so you can call it directly.
Please use:

return F.LeakyReLU(x, act_type='elu', slope=self._alpha)

here instead, and you also want to change https://github.com/apache/incubator-mxnet/blob/master/tests/python/unittest/test_gluon.py#L1183 to:

return mx.nd.expm1(x) if x <= 0.0 else x

so that the test would still pass after this change.
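For reference, a small check of the suggested backend path, assuming alpha = 1.0 and the imperative NDArray API (editor's illustration, not part of the review): the fused elu mode of LeakyReLU computes its gradient in the backend, so the overflow-prone where/exp combination is avoided.

from mxnet import nd, autograd

x = nd.array([[100.0, -1.0]])
x.attach_grad()
with autograd.record():
    # Fused backend ELU via LeakyReLU, as suggested above.
    y = nd.LeakyReLU(x, act_type='elu', slope=1.0)
y.backward()
print(x.grad)  # expected: [[1. 0.36787945]], no nan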

fierceX (Contributor, Author) replied:
This bug does not affect the forward calculation, but occurs when the gradient is calculated backwards.

haojin2 (Contributor) replied:
@fierceX Yes I understand, and I'm presenting the recommended way to fix this issue to you in the above comment.

haojin2 added the pr-awaiting-response label (PR is reviewed and waiting for contributor to respond) on Apr 12, 2019
haojin2 (Contributor) commented Apr 17, 2019

@fierceX You also need to change tests/python/unittest/test_gluon.py Line 1183 to:

return mx.nd.expm1(x) if x <= 0.0 else x

so that the test could pass, thanks!

@@ -158,7 +158,7 @@ def __init__(self, alpha=1.0, **kwargs):
         self._alpha = alpha

     def hybrid_forward(self, F, x):
-        return F.where(x > 0, x, self._alpha * (F.exp(x) - 1.0))
+        F.LeakyReLU(x, act_type='elu', slope=self._alpha)
haojin2 (Contributor) commented:
Seems like you have extra whitespace here; please get rid of that and this is good to go.

fierceX (Contributor, Author) replied:
Sorry, the return was missing.

@@ -158,7 +158,7 @@ def __init__(self, alpha=1.0, **kwargs):
         self._alpha = alpha

     def hybrid_forward(self, F, x):
-        return F.where(x > 0, x, self._alpha * (F.exp(x) - 1.0))
+        return F.LeakyReLU(x, act_type='elu', slope=self._alpha)
haojin2 (Contributor) commented:
Nit: The extra space before return

haojin2 (Contributor) left a review:
LGTM. Will merge once the test passes.

fierceX (Contributor, Author) commented Apr 18, 2019

Note: There should still be a bug in the where operator.

haojin2 (Contributor) commented Apr 18, 2019

@fierceX We'll surely investigate that.

haojin2 (Contributor) commented Apr 19, 2019

@fierceX Can you rebase and do a force push?

haojin2 merged commit 494c29e into apache:master on Apr 23, 2019
haojin2 (Contributor) commented Apr 23, 2019

@fierceX Merged, thanks for your contribution!

haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019