This repo includes the official code for the paper "Critical Data Size of Language Models from a Grokking Perspective".
torch >= 2.0
transformers
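The requirements above (torch 2.0 or later, plus transformers) can be checked before launching the scripts. The following is a minimal sketch, not part of the repo: the helper names `version_tuple`, `meets_minimum`, and `check_requirements` are hypothetical, and it assumes the packages are installed under the distribution names `torch` and `transformers`.

```python
import re
from importlib import metadata

# Assumed from the requirements list: torch needs >= 2.0, transformers has no stated minimum.
REQUIREMENTS = {"torch": "2.0", "transformers": None}


def version_tuple(version: str) -> tuple:
    """Turn '2.1.0+cu118' into (2, 1, 0); local and pre-release suffixes are ignored."""
    numbers = re.match(r"\d+(?:\.\d+)*", version)
    if numbers is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in numbers.group().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two versions numerically, padding the shorter one with zeros."""
    have, need = version_tuple(installed), version_tuple(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_requirements(requirements=REQUIREMENTS) -> dict:
    """Return a short status string for each required package."""
    report = {}
    for package, minimum in requirements.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "missing"
            continue
        ok = minimum is None or meets_minimum(installed, minimum)
        report[package] = f"{installed} (ok)" if ok else f"{installed} (need >= {minimum})"
    return report


if __name__ == "__main__":
    for package, status in check_requirements().items():
        print(f"{package}: {status}")
```

Reading versions through `importlib.metadata` avoids importing torch itself, so the check stays fast and works even when no GPU is available.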
Run the following commands to reproduce our results:
sh run_grokking_on_imdb.sh
sh run_grokking_on_yelp.sh
@article{zhu2024critical,
title={Critical data size of language models from a grokking perspective},
author={Zhu, Xuekai and Fu, Yao and Zhou, Bowen and Lin, Zhouhan},
journal={arXiv preprint arXiv:2401.10463},
year={2024}
}