Pythia Checkpoint Loading #4
Current Solution: Number 3. Rename the weights from `attention` to `attention.attn_block` and from `mlp` to `mlp.attn_block`, store the checkpoint again, and use the new checkpoint. We just need to run the convert-checkpoint script and load from its output. Additionally, we set `strict=False` so that the image prefix and adapter weights are ignored. I have manually checked whether any other weights have mismatched names, and everything looks correct. Lastly, this requires another change in the DeeperSpeed code; use the following branch: https://github.com/EleutherAI/DeeperSpeed/tree/robin_summit
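As a rough illustration of what the conversion step does, here is a minimal sketch. The function name, file paths, and flat state-dict layout are placeholders; a real DeepSpeed/DeeperSpeed checkpoint may nest the weights differently.

```python
import torch

def convert_checkpoint(in_path: str, out_path: str) -> None:
    """Rename Pythia weight keys to MAGMA's wrapped-module names."""
    state_dict = torch.load(in_path, map_location="cpu")
    converted = {}
    for name, tensor in state_dict.items():
        # e.g. "2.attention.query_key_value.weight"
        #   -> "2.attention.attn_block.query_key_value.weight"
        new_name = name.replace(".attention.", ".attention.attn_block.")
        new_name = new_name.replace(".mlp.", ".mlp.attn_block.")
        converted[new_name] = tensor
    torch.save(converted, out_path)

convert_checkpoint("pythia.pt", "pythia_converted.pt")
```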
We need to load Pythia checkpoints for MAGMA training.

Main issue: mismatch between the weights in the checkpoint and in the MAGMA model.

Sources of mismatch:

Mismatch Source 1: Renamed weights, e.g. `2.attention.query_key_value.weight` in the checkpoint needs to become `2.attention.attn_block.query_key_value.weight` in the MAGMA model.
Proposed solutions:
- Without changing the names in the Pythia checkpoint: e.g. remap the key names in memory at load time (see the sketch after this list).
- Changing the names in the Pythia checkpoint: rename the mismatched weights and store a converted checkpoint (the current solution above falls in this category).
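A minimal sketch of the load-time remapping variant, assuming a plain `torch` state dict and an already-instantiated MAGMA model (`model` and the file path are assumptions for illustration):

```python
import torch

def remap_keys(state_dict):
    """Return a copy of the state dict using MAGMA's wrapped-module names."""
    return {
        name.replace(".attention.", ".attention.attn_block.")
            .replace(".mlp.", ".mlp.attn_block."): tensor
        for name, tensor in state_dict.items()
    }

# `model` is assumed to be the instantiated MAGMA model.
state_dict = torch.load("pythia.pt", map_location="cpu")
model.load_state_dict(remap_keys(state_dict), strict=False)
```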
Mismatch Source 2: Additional weights in MAGMA, due to the image prefix and adapters.
Proposed Solution: Can be resolved by setting `strict=False` when loading the checkpoint. Not the best solution, and it can be risky, but the plan is to quickly verify that all the weights that don't match are due only to the image prefix and adapters, and then be able to train; once the first mismatch source has been fixed, set `strict=False`. We can find a better solution once everyone is able to use the code to port their changes and do test runs.
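A minimal sketch of that verification, reusing the remapped state dict from the sketch above and assuming MAGMA's image-prefix and adapter parameters contain `image_prefix` or `adapter` in their names (both substrings are guesses, not confirmed module names):

```python
# `model` and `remap_keys` are assumed from the sketches above.
result = model.load_state_dict(remap_keys(state_dict), strict=False)

# Weights present in the model but absent from the checkpoint should all
# belong to the image prefix or the adapters.
allowed = ("image_prefix", "adapter")
stray = [k for k in result.missing_keys if not any(s in k for s in allowed)]
assert not stray, f"unexpectedly missing weights: {stray}"

# After the renaming, the checkpoint should contain no keys the model lacks.
assert not result.unexpected_keys, f"unmatched checkpoint keys: {result.unexpected_keys}"
```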