Scaling law experiments often produce NaN training loss when extending grids. #314

plyu3 · 2024-07-10T03:20:48Z

I am running the scaling law experiments, extending the grid through the sequence [5, 10, 20, 50, 100]. The model trains well at the start with grid=5, but the loss becomes NaN when I extend the grid to 10 and 20 after training at grid=5.

KindXiaoming · 2024-07-10T21:52:59Z

Does the dataset have any weird statistics?

KindXiaoming · 2024-07-14T02:53:38Z

I did observe NaN sometimes, but there could be many reasons for it: the data could have singularities, the data could be degenerate along some dimensions, or LBFGS optimization can go wrong with large grids. I'm happy to offer a more informed guess if you reopen the issue and provide more details.
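
The first two possible causes above (singular values in the data, degenerate dimensions) can be checked before any training. The following is a minimal sketch of such a check, not part of the library: the function name `dataset_diagnostics`, the threshold `std_tol`, and the list-of-rows input format are all assumptions made for illustration. It uses plain Python so it runs without dependencies; tensor data would first need converting to nested lists (for example with `.tolist()`).

```python
import math
from statistics import pstdev


def dataset_diagnostics(rows, std_tol=1e-8):
    """Report basic statistics that may explain NaN losses.

    rows: list of equal-length lists of floats, one inner list per sample.
    std_tol: hypothetical threshold below which a column counts as degenerate.
    """
    n_cols = len(rows[0])

    # Count NaN and +/-inf entries anywhere in the dataset.
    non_finite = sum(1 for row in rows for v in row if not math.isfinite(v))

    degenerate_cols = []
    ranges = []
    for j in range(n_cols):
        # Only finite values are used for per-column statistics.
        col = [row[j] for row in rows if math.isfinite(row[j])]
        # A column with (almost) no spread carries no information and can
        # make a grid built from the data range collapse to a point.
        if len(col) < 2 or pstdev(col) <= std_tol:
            degenerate_cols.append(j)
        ranges.append((min(col), max(col)) if col else (None, None))

    return {
        "non_finite": non_finite,
        "degenerate_cols": degenerate_cols,
        "ranges": ranges,
    }


if __name__ == "__main__":
    sample = [
        [0.0, 1.0, 2.0],
        [1.0, 1.0, float("nan")],
        [2.0, 1.0, 4.0],
    ]
    print(dataset_diagnostics(sample))
```

A non-zero `non_finite` count or a non-empty `degenerate_cols` list would be the kind of detail worth including when reopening the issue. This check says nothing about the third cause, LBFGS misbehaving at large grid sizes.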

KindXiaoming closed this as completed Jul 14, 2024
