Scaling law experiments often produce NaN training loss when extending grids. #314
Comments
Does the dataset have any unusual statistics?
I have seen NaN occasionally, but there are many possible causes: the data could have singularities, it could be degenerate along some dimensions, or LBFGS optimization can go wrong with large grids. I'm happy to offer a more informed guess if you reopen the issue and provide more details.
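As a rough way to check the first two suspects (singular values in the data and dimensions with no spread), a dependency-free sketch like the following could be run on the training inputs before any grid extension. The function names `column_report` and `suspicious_columns` and the tolerance `std_tol` are made up for illustration and are not part of pykan; the inputs are assumed to be a list of rows of floats.

```python
import math

def column_report(rows):
    """Summarize each input column: non-finite count, min, max, std of finite values."""
    n_cols = len(rows[0])
    report = []
    for j in range(n_cols):
        col = [row[j] for row in rows]
        finite = [v for v in col if math.isfinite(v)]
        non_finite = len(col) - len(finite)
        if finite:
            mean = sum(finite) / len(finite)
            std = math.sqrt(sum((v - mean) ** 2 for v in finite) / len(finite))
            lo, hi = min(finite), max(finite)
        else:
            # No usable values at all in this column.
            std, lo, hi = float("nan"), float("nan"), float("nan")
        report.append({"column": j, "non_finite": non_finite,
                       "min": lo, "max": hi, "std": std})
    return report

def suspicious_columns(report, std_tol=1e-8):
    """Indices of columns with NaN/inf entries or (near-)zero spread.

    std_tol is an arbitrary illustrative threshold, not a recommended value.
    """
    return [r["column"] for r in report
            if r["non_finite"] > 0 or not (r["std"] > std_tol)]

# Column 1 is constant and column 2 contains an inf, so both get flagged.
rows = [[0.1, 1.0, 2.0], [0.2, 1.0, float("inf")], [0.3, 1.0, 4.0]]
print(suspicious_columns(column_report(rows)))
```

A flagged column does not prove it is the cause of the NaN, but it narrows down what to report when reopening the issue.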
I am running the scaling-law experiments and extending the grid along the schedule [5, 10, 20, 50, 100]. The model trains well at the start with grid=5, but the loss becomes NaN when I extend the grid to 10 or 20 after training at grid=5.
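To pin down exactly which grid size first produces NaN, the schedule can be wrapped in a small loop that stops at the first non-finite loss. This is only a sketch: `train_at_grid` is a hypothetical callable standing in for whatever code refines the model to a given grid size, trains it, and returns the final training loss; it is not a pykan function.

```python
import math

def run_grid_schedule(train_at_grid, grids=(5, 10, 20, 50, 100)):
    """Train at each grid size in turn; stop at the first non-finite loss.

    train_at_grid is a hypothetical user-supplied callable (not a pykan API)
    that takes a grid size and returns the final training loss as a float.
    """
    history = []
    for grid in grids:
        loss = float(train_at_grid(grid))
        history.append((grid, loss))
        if not math.isfinite(loss):
            # Record the failing grid and the last grid that trained cleanly.
            last_good = history[-2][0] if len(history) > 1 else None
            return {"failed_at": grid, "last_good": last_good, "history": history}
    return {"failed_at": None, "last_good": grids[-1], "history": history}
```

The returned `history` gives the per-grid losses up to the failure, which is the kind of detail that makes the problem easier to diagnose.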