GrokFast: Accelerated Grokking by Amplifying Slow Gradients #244

redknightlois · 2024-06-21T15:32:35Z

GrokFast: Accelerated Grokking by Amplifying Slow Gradients

One puzzling artifact in machine learning, dubbed grokking, is where generalization arrives only after tens of times more iterations than it takes to nearly perfectly overfit the training data. Focusing on the long delay itself on behalf of machine learning practitioners, our goal is to accelerate generalization of a model under the grokking phenomenon. By regarding a series of gradients of a parameter over training iterations as a random signal over time, we can spectrally decompose the parameter trajectories under gradient descent into two components: the fast-varying, overfitting-yielding component and the slow-varying, generalization-inducing component. This analysis allows us to accelerate the grokking phenomenon by more than ×50 with only a few lines of code that amplify the slow-varying components of gradients. The experiments show that our algorithm applies to diverse tasks involving images, languages, and graphs, enabling practical availability of this peculiar artifact of sudden generalization.
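To make the "amplify the slow-varying components" idea concrete for the feature request, here is a minimal sketch in plain Python. It assumes the low-pass filter is an exponential moving average (EMA) of past gradients and that the filtered signal is added back to the raw gradient with a gain. The function name `amplify_slow_gradients`, the parameter names `alpha` and `lamb`, and their default values are illustrative assumptions, not the API of either linked repository.

```python
from typing import Dict, List, Optional, Tuple

Grads = Dict[str, List[float]]


def amplify_slow_gradients(
    grads: Grads,
    ema: Optional[Grads] = None,
    alpha: float = 0.98,  # assumed EMA momentum; higher keeps only slower components
    lamb: float = 2.0,    # assumed gain applied to the slow component
) -> Tuple[Grads, Grads]:
    """Return (amplified gradients, updated EMA state) for one training step.

    Hypothetical helper: `grads` maps a parameter name to its flat gradient.
    """
    if ema is None:
        # First step: start the running average at the current gradient.
        new_ema = {name: list(g) for name, g in grads.items()}
    else:
        # Low-pass filter: blend the previous average with the new gradient.
        new_ema = {
            name: [alpha * e + (1.0 - alpha) * g for e, g in zip(ema[name], grads[name])]
            for name in grads
        }
    # Add the slow-varying (filtered) part back on top of the raw gradient.
    amplified = {
        name: [g + lamb * e for g, e in zip(grads[name], new_ema[name])]
        for name in grads
    }
    return amplified, new_ema


if __name__ == "__main__":
    state = None
    for step_grad in ([1.0], [-1.0], [1.0]):
        out, state = amplify_slow_gradients({"w": step_grad}, state, alpha=0.5)
        print(out["w"], state["w"])
```

In a real training loop, a step like this would sit between the backward pass and the optimizer update, with the EMA state carried from one iteration to the next. A gradient that keeps its sign across steps is boosted, while one that flips sign every step is left close to its raw value.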

Twitter thread: https://x.com/_ironjr_/status/1798733867303772607
Paper: https://arxiv.org/abs/2405.20233
Code:

https://github.com/ironjr/grokfast (as an add-on)
https://github.com/lucidrains/grokfast-pytorch (integrated in the optimizer)

redknightlois added the feature request label Jun 21, 2024

redknightlois assigned kozistr Jun 21, 2024

kozistr mentioned this issue Jun 22, 2024

[Feature] Implement GrokFast optimizer #245

Merged

1 task

kozistr closed this as completed in #245 Jun 22, 2024
