
nll_loss with weights: reduction 'mean' gives wrong result #31295

Closed
JoveIC opened this issue Dec 15, 2019 · 4 comments
Assignees
Labels
high priority triage review triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments


JoveIC commented Dec 15, 2019

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

import torch
import torch.nn.functional as F

torch.manual_seed(42)
torch.cuda.manual_seed(42)

  1. on cpu
>>> i = torch.randn(3, 5, requires_grad=True, device='cpu')
tensor([[ 0.3367,  0.1288,  0.2345,  0.2303, -1.1229],
        [-0.1863,  2.2082, -0.6380,  0.4617,  0.2674],
        [ 0.5349,  0.8094,  1.1103, -1.6898, -0.9890]], requires_grad=True)

>>> w = torch.randn(5, device='cpu')
tensor([ 0.9580,  1.3221,  0.8172, -0.7658, -0.7506])

>>> target = torch.tensor([1, 0, 4], device='cpu')
tensor([1, 0, 4])

>>> reduction = ['none', 'sum', 'mean']
>>> for r in reduction:
...        m = F.log_softmax(i, dim=1)
...        loss = F.nll_loss(m, target, w, reduction=r)

none :  tensor([ 2.0560,  2.6612, -2.2593], grad_fn=<NllLossBackward>)
sum :  tensor(2.4579, grad_fn=<NllLossBackward>)
mean :  tensor(1.6070, grad_fn=<NllLossBackward>)
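For reference, each element of the reduction='none' tensor above can be reproduced by hand as l_n = -w[target[n]] * m[n, target[n]]. A minimal sketch (assuming the same seed and inputs as the CPU example, without requires_grad):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(42)
i = torch.randn(3, 5)           # same seed as the report above
w = torch.randn(5)
target = torch.tensor([1, 0, 4])
m = F.log_softmax(i, dim=1)

# l_n = -w[target[n]] * m[n, target[n]]
manual = -w[target] * m[torch.arange(3), target]
loss_none = F.nll_loss(m, target, w, reduction='none')
assert torch.allclose(manual, loss_none)
```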
  2. on gpu
>>> i = torch.randn(3, 5, requires_grad=True, device='cuda:0')
tensor([[ 0.1940,  2.1614, -0.1721,  0.8491, -1.9244],
        [ 0.6530, -0.6494, -0.8175,  0.5280, -1.2753],
        [-1.6621, -0.3033, -0.0926,  0.1992, -1.1204]], device='cuda:0',
       requires_grad=True)

>>> w = torch.randn(5, device='cuda:0')
tensor([ 0.1391, -0.1082, -0.7174,  0.7566,  0.3715], device='cuda:0')

>>> target = torch.tensor([1, 0, 4], device='cuda:0')
tensor([1, 0, 4], device='cuda:0')

>>> reduction = ['none', 'sum', 'mean']
>>> for r in reduction:
...        m = F.log_softmax(i, dim=1)
...        loss = F.nll_loss(m, target, w, reduction=r)
...        print(r, ': ', loss)

none :  tensor([-0.0455,  0.1291,  0.8693], device='cuda:0', grad_fn=<NllLossBackward>)
sum :  tensor(0.9530, device='cuda:0', grad_fn=<NllLossBackward>)
mean :  tensor(2.3681, device='cuda:0', grad_fn=<NllLossBackward>)

Expected behavior

>>> loss = F.nll_loss(m, target, w, reduction='none')
>>> loss.mean()

mean : tensor(0.8193, grad_fn=<MeanBackward0>)

Environment

PyTorch version: 1.3.1
Is debug build: No
CUDA used to build PyTorch: 10.1.243

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.0.2

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration:
GPU 0: TITAN X (Pascal)
GPU 1: TITAN X (Pascal)

Nvidia driver version: 430.50
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5

Versions of relevant libraries:
[pip3] numpy==1.13.3
[conda] blas 1.0 mkl
[conda] mkl 2019.4 243
[conda] mkl-service 2.3.0 py36he904b0f_0
[conda] mkl_fft 1.0.14 py36ha843d7b_0
[conda] mkl_random 1.1.0 py36hd6b4f25_0
[conda] pytorch 1.3.1 py3.6_cuda10.1.243_cudnn7.6.3_0 pytorch
[conda] torchvision 0.4.2 py36_cu101 pytorch

cc @ezyang @gchanan @zou3519

@JoveIC JoveIC changed the title Cross entropy loss with weights: reduction 'mean' gives wrong result nll_loss with weights: reduction 'mean' gives wrong result Dec 15, 2019
@zou3519 zou3519 added high priority module: operators triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Dec 16, 2019
@zou3519
Contributor

zou3519 commented Dec 16, 2019

Seems bad if there is a correctness issue. I haven't checked the code, so we should first verify that the behavior is indeed correct. nll_loss is an important loss function.

@anjali411 anjali411 self-assigned this Dec 17, 2019
@gchanan
Contributor

gchanan commented Dec 17, 2019

This isn't actually a bug -- @anjali411 is going to post why.

@anjali411
Contributor

This is not a bug. When reduction='mean', the loss is calculated by the formula given in the documentation: https://pytorch.org/docs/stable/nn.html#nllloss
According to the formula:
For CPU example:
when reduction = 'mean', loss = (Σ_n l_n) / (w_1 + w_0 + w_4) for n = 1 to 3, where the l_n are the elements of the loss tensor for reduction = 'none' and w_1, w_0, w_4 are the weights selected by the targets [1, 0, 4]
Thus loss = (2.0560 + 2.6612 + (-2.2593)) / (0.9580 + 1.3221 + (-0.7506)) = 1.6070

For GPU example:

when reduction = 'mean', loss = (Σ_n l_n) / (w_1 + w_0 + w_4) for n = 1 to 3
= (-0.0455 + 0.1291 + 0.8693) / (0.1391 + (-0.1082) + 0.3715) = 2.3681

Clarification for the documentation: y_n is the nth element of the target tensor.
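The formula above can be checked directly: with reduction='mean', nll_loss divides the weighted sum by the sum of the weights selected by the targets, not by the batch size N. A minimal sketch (values are arbitrary):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(42)
i = torch.randn(3, 5)
w = torch.randn(5)
target = torch.tensor([1, 0, 4])
m = F.log_softmax(i, dim=1)

loss_none = F.nll_loss(m, target, w, reduction='none')
loss_mean = F.nll_loss(m, target, w, reduction='mean')

# 'mean' divides by the sum of the selected weights w[target],
# not by the batch size N = 3:
assert torch.allclose(loss_mean, loss_none.sum() / w[target].sum())
```

By contrast, loss_none.mean() divides by N, which is why the two numbers in the report differ.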

@JoveIC
Author

JoveIC commented Dec 18, 2019

@anjali411 Thank you for the explanation

facebook-github-bot pushed a commit that referenced this issue Dec 19, 2019
…#31488)

Summary:
Reference: #31385

In the current documentation for NLLLoss, it's unclear what `y` refers to in the math section of the loss description. An issue (#31295) was filed earlier expressing confusion about whether the loss returned for reduction='mean' is correct, perhaps because of the lack of clarity in the formula's symbol descriptions in the current documentation.
Pull Request resolved: #31488

Differential Revision: D19181391

Pulled By: anjali411

fbshipit-source-id: 8b75f97aef93c92c26ecbce55b3faf2cd01d3e74
wuhuikx pushed a commit to wuhuikx/pytorch that referenced this issue Jan 30, 2020