Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR aims to add replay to GPT-NeoX. I had implemented this for the paper Simple and Scalable Strategies to Continually Pre-train Large Language Models that shows simple ways to efficiently continue to pretrain by improving adaptation to new data while mitigating forgetting of previous data. Note that this PR can serve as a basis to add the ability to resume training from a certain index in a dataset, based on how I implemented this feature for replay datasets.
How to use
I tried to make the descriptions of the replay args informative enough to serve as documentation. An example of a config using replay is also provided in
tests/config/example_replay_config.yml
.Unsupported/untested features:
replay_label_data
arg that would specify the prefix to the idx and data path of replay label data, then generate the specific replay label data path from the prefix, and treat it in a similar way as the training data in the blockPending tests
Currently, the tests required are:
The tests can follow the procedure described in
tests/model/test_batch_replicability.py
. Tests 1 and 3 were passed with the Summit version of NeoX, but I'll need to run them again on the replay implementation based on the current main branch of NeoX. I'll probably need someone else to test that label data support (test 2) did not break as I'm unfamiliar with this feature of NeoX and am currently too busy to take that on.