
Loss becomes NaN after some time of training #35

Open
DrokBing opened this issue Nov 18, 2023 · 8 comments
Labels
bug Something isn't working

Comments

@DrokBing

Screenshot from 2023-11-18 11-32-03
And the rendered image has a white background.

@guanjunwu
Collaborator

guanjunwu commented Nov 21, 2023

Wow... I also found the same problem during optimization. Initially I thought it was an error on my training machine.
Most cases happen on scenes with more background points, such as flame_salmon_1 and coffee_martini in the Neu3D dataset. I think it may be numerical overflow during training. Do you have any ideas?
I hope we can solve it together if you have time :)
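For anyone trying to narrow this down, a generic PyTorch-side guard may help locate where the NaNs first appear. This is not code from this repo; `loss`, `optimizer`, and `iteration` are placeholders for the training loop's own variables:

```python
import torch

# Report the first backward op that produces NaN/Inf (slow; enable only while debugging).
torch.autograd.set_detect_anomaly(True)

def safe_step(loss: torch.Tensor, optimizer: torch.optim.Optimizer, iteration: int) -> bool:
    """Skip the update when the loss is already NaN/Inf, so one bad batch does
    not poison the model parameters. Returns True if a step was taken."""
    if not torch.isfinite(loss):
        print(f"[iter {iteration}] non-finite loss {loss.item()}, skipping this step")
        optimizer.zero_grad(set_to_none=True)
        return False
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    return True
```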

@guanjunwu guanjunwu added the bug Something isn't working label Nov 23, 2023
@Arisilin

I also encountered this problem when training on my own scene; the loss may become NaN after several iterations in the fine stage. Besides, there are also some cases where "RuntimeError: numel: integer multiplication overflow" happens during fine-stage training. I am not sure if it is caused by a similar reason.
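The "numel: integer multiplication overflow" error often points to a tensor whose element count has blown up, which can happen if NaN positions survive into densification. A hedged sketch of a guard, assuming a 3DGS-style interface (a `get_xyz` positions tensor and a `prune_points(mask)` method; adjust the names to this codebase):

```python
import torch

def drop_nonfinite_gaussians(gaussians) -> None:
    """Remove Gaussians whose positions have turned NaN/Inf before densification,
    so they cannot inflate the point count until tensor allocations overflow.

    Assumes a 3DGS-style interface: `gaussians.get_xyz` is an (N, 3) positions
    tensor and `gaussians.prune_points(mask)` deletes the points where mask is True.
    """
    bad = ~torch.isfinite(gaussians.get_xyz).all(dim=-1)
    if bad.any():
        print(f"pruning {int(bad.sum())} non-finite Gaussians")
        gaussians.prune_points(bad)
```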

@leo-frank

I met the same problem on a COLMAP-format dataset.


The PSNR suddenly drops to an unexpected value (4.28), while the number of points in the point cloud also decreases.

@guanjunwu
Collaborator

I guess the scene's bounding box may be so large that it causes the error during backpropagation through the Gaussian deformation field network.
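If the bounding box is the culprit, one generic mitigation (not code from this repo) is to normalize point coordinates into the scene bounds before they enter the deformation network, so distant background points do not feed huge values into the grid/MLP and its gradients:

```python
import torch

def contract_to_unit_cube(xyz: torch.Tensor,
                          aabb_min: torch.Tensor,
                          aabb_max: torch.Tensor) -> torch.Tensor:
    """Map world-space points into [-1, 1] using the scene axis-aligned bounding
    box, so very large background coordinates do not blow up the deformation
    network's inputs and gradients."""
    scale = (aabb_max - aabb_min).clamp(min=1e-6)  # guard against a degenerate box
    unit = (xyz - aabb_min) / scale                # -> [0, 1] inside the box
    return unit * 2.0 - 1.0                        # -> [-1, 1]
```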

@GotFusion

> I guess the scene's bounding box may be so large that it causes the error during backpropagation through the Gaussian deformation field network.

Is there any solution to this problem?

@guanjunwu
Collaborator

In my tests, setting no_dr=True and no_ds=True (disabling the deformation of rotation and scaling) reduces how often the problem happens.
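For reference, a sketch of how that might look as a per-scene config override, assuming the `ModelHiddenParams = dict(...)` style used by the config files under `arguments/` in this repo (the exact flag names should be checked against the code):

```python
# Hypothetical per-scene config override; the flag names follow the suggestion above.
ModelHiddenParams = dict(
    no_dr = True,   # disable the rotation deformation head
    no_ds = True,   # disable the scaling deformation head
)
```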

@zhaohaoyu376

> In my tests, setting no_dr=True and no_ds=True (disabling the deformation of rotation and scaling) reduces how often the problem happens.

However, it seems that performance might be significantly affected by this approach. Are there any other solutions?

@zhaohaoyu376

Why do I always have to restart training because the loss becomes NaN? I can't even finish a single run.
