
Scaling law often causes NaN training loss when extending grids #314

Closed
plyu3 opened this issue Jul 10, 2024 · 2 comments

plyu3 commented Jul 10, 2024

I am trying to run the scaling law experiments with grid extension, increasing the grid number along [5, 10, 20, 50, 100]. The model trained well at the start with grid=5, but the training loss becomes NaN when I extend the grid size to 10 and 20 after training at grid=5.
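For reference, a minimal sketch of the grid-extension loop being described, modeled on the repo's function-fitting example. The toy target function, the `results['train_loss']` key, and the `initialize_from_another_model` / `model.train` API names are assumptions based on pykan examples from that period (newer versions use `model.refine` and `model.fit` instead), not the reporter's actual script:

```python
import torch
from kan import KAN, create_dataset

# Toy target; the actual dataset from this report is not shown in the issue.
f = lambda x: torch.exp(torch.sin(torch.pi * x[:, [0]]) + x[:, [1]] ** 2)
dataset = create_dataset(f, n_var=2)

grids = [5, 10, 20, 50, 100]
model = KAN(width=[2, 5, 1], grid=grids[0], k=3)

for i, grid in enumerate(grids):
    if i > 0:
        # Grid extension: initialize a finer-grid model from the coarser one.
        model = KAN(width=[2, 5, 1], grid=grid, k=3).initialize_from_another_model(
            model, dataset['train_input'])
    results = model.train(dataset, opt="LBFGS", steps=50)
    # Watch for the loss turning NaN right after a grid extension.
    print(f"grid={grid}: final train loss = {results['train_loss'][-1]}")
```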

@KindXiaoming (Owner)

Does the dataset have any weird statistics?

KindXiaoming (Owner) commented Jul 14, 2024

I did observe NaN values sometimes, but there could be many reasons for it: the data could have singularities, it could be degenerate along some dimensions, or LBFGS optimization can go wrong with large grids. I'm happy to offer more guided guesses if you reopen the issue and provide more details.
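Along the lines of these suggestions, a hedged sketch of a first diagnostic: check the pykan-style dataset dict for NaN/Inf values and near-degenerate dimensions before extending the grid. `check_dataset` is a hypothetical helper for illustration, not part of pykan:

```python
import torch

def check_dataset(dataset):
    """Flag data statistics that commonly precede NaN losses with large grids."""
    for key in ['train_input', 'train_label']:
        t = dataset[key]
        print(f"{key}: min={t.min().item():.4g}, max={t.max().item():.4g}, "
              f"mean={t.mean().item():.4g}")
        if torch.isnan(t).any() or torch.isinf(t).any():
            print(f"  -> {key} contains NaN/Inf (singularity in the target?)")
        if (t.std(dim=0) < 1e-8).any():
            print(f"  -> {key} is nearly constant along some dimension (degenerate)")

check_dataset(dataset)  # dataset dict as returned by kan.create_dataset
```

If the data looks clean, replacing `opt="LBFGS"` with `opt="Adam"` for the larger grids is a natural next experiment, since the LBFGS optimization is the other suspect named above.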
