RuntimeError: stack expects each tensor to be equal size #929
Comments
There are several strange things in your config file. Can you tell us a bit more about what you’re trying to do? Specifically:
It seems plausible that we have a data loader bug that only appears above a micro-batch size (MBS) of 100, because we’ve probably never tested an MBS that large. However, our data loader is largely the same as Megatron-DeepSpeed’s. Have you tried that code base to see whether it works with an MBS of 100?
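For context on the error itself: torch.stack only succeeds when every tensor in the batch has the same shape, so a loader that yields even one sample of a different length breaks the whole micro-batch. The minimal pure-Python sketch below mimics that precondition for batches of token-ID lists and reports which samples differ. The helper names find_mismatched_samples and stack_like are hypothetical debugging aids, not part of GPT-NeoX, Megatron-DeepSpeed, or PyTorch, and the error text only imitates the wording of the PyTorch message.

```python
from typing import List, Sequence


def find_mismatched_samples(batch: Sequence[Sequence[int]]) -> List[int]:
    """Return indices of samples whose length differs from the first sample.

    Hypothetical helper: torch.stack needs every tensor in a batch to have
    the same shape, and this check mirrors that rule for token-ID lists.
    """
    if not batch:
        return []
    expected = len(batch[0])
    return [i for i, sample in enumerate(batch) if len(sample) != expected]


def stack_like(batch: Sequence[Sequence[int]]) -> List[List[int]]:
    """Hypothetical stand-in for stacking that fails on ragged input."""
    bad = find_mismatched_samples(batch)
    if bad:
        # Collect the distinct lengths involved to make the failure easy to read.
        sizes = sorted({len(batch[i]) for i in bad} | {len(batch[0])})
        raise RuntimeError(
            f"stack expects each tensor to be equal size; lengths seen: {sizes}, "
            f"mismatched indices: {bad}"
        )
    # All samples share one length, so the result is a rectangular 2-D list.
    return [list(sample) for sample in batch]


if __name__ == "__main__":
    ragged = [[1, 2, 3], [4, 5, 6], [7, 8]]
    print(find_mismatched_samples(ragged))  # the third sample is short
```

Running a check like this on the samples a loader yields for a micro-batch of 100 would show whether one short or empty sample is what triggers the error.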
This issue should have been fixed in #835, @cateto. Following up on @StellaAthena's questions, please try setting
@cateto hey, any updates on this?
Solved it! Thank you.
Describe the bug
When I train this model with the config below (train_micro_batch_size_per_gpu=100), it raises a RuntimeError.
If I set train_micro_batch_size_per_gpu to a value below 100, it works, but I want to use the full GPU memory.
Please let me know how to fix this.
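If the underlying goal is a large effective batch rather than a large micro-batch specifically, DeepSpeed derives train_batch_size as train_micro_batch_size_per_gpu × gradient_accumulation_steps × number of GPUs. The sketch below does that arithmetic. The function names are hypothetical, and the 8-GPU figure in the usage lines is an assumption, since the reporter's environment details are not shown. Note that raising gradient accumulation keeps the effective batch size constant at a smaller micro-batch; it does not by itself increase per-step GPU memory use.

```python
def effective_batch_size(micro_batch_per_gpu: int, grad_accum_steps: int, num_gpus: int) -> int:
    """train_batch_size = micro batch per GPU * gradient accumulation steps * GPUs."""
    for name, value in (
        ("micro_batch_per_gpu", micro_batch_per_gpu),
        ("grad_accum_steps", grad_accum_steps),
        ("num_gpus", num_gpus),
    ):
        if value < 1:
            raise ValueError(f"{name} must be >= 1, got {value}")
    return micro_batch_per_gpu * grad_accum_steps * num_gpus


def grad_accum_for_target(target_batch: int, micro_batch_per_gpu: int, num_gpus: int) -> int:
    """Gradient accumulation steps needed to reach target_batch exactly (hypothetical helper)."""
    per_step = micro_batch_per_gpu * num_gpus
    if per_step < 1 or target_batch % per_step != 0:
        raise ValueError(f"{target_batch} is not a multiple of {per_step}")
    return target_batch // per_step


if __name__ == "__main__":
    # Assumed 8-GPU run: an MBS of 100 with no accumulation gives 800 samples per step.
    target = effective_batch_size(100, 1, 8)
    # The same effective batch at an MBS of 50 needs 2 accumulation steps.
    print(target, grad_accum_for_target(target, 50, 8))
```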
Environment (please complete the following information):
Configs