Issues: EleutherAI/gpt-neox
Fine-tuning 20B model doesn't seem to work
bug
Something isn't working
deprioritized
Issues that are not closed, but are low priority and unlikely to be solved soon
#767
opened Jan 10, 2023 by
abar-75
Add support for sequence parallelism
feature request
New feature or request
help wanted
This issue needs assistance
#812
opened Mar 7, 2023 by
Quentin-Anthony
The plot from muP coord_check does not look horizontal, which may indicate a bug in the muP implementation?
bug
Something isn't working
#956
opened May 28, 2023 by
BaoYu0721
Fine-tuning loss explodes when not loading DeepSpeed ZeRO optimizer states
bug
Something isn't working
#843
opened Mar 19, 2023 by
sxthunder
RuntimeError: Error(s) in loading state_dict for EmbeddingPipe: size mismatch for word_embeddings.weight
bug
Something isn't working
good first issue
Good for newcomers
help wanted
This issue needs assistance
#645
opened Jul 7, 2022 by
mcao516
Investigate DeepSpeed Inference
feature request
New feature or request
good first issue
Good for newcomers
#845
opened Mar 21, 2023 by
Quentin-Anthony
Migrate tensor parallelism code to use OSLO
feature request
New feature or request
oslo
issues relating to refactoring NeoX to use OSLO
#578
opened Mar 1, 2022 by
sdtblck
3 tasks
Integrate TransformerEngine
feature request
New feature or request
#1098
opened Dec 21, 2023 by
Quentin-Anthony
Fine-tuning gpt-neox on 8 A100s
feature request
New feature or request
#892
opened Apr 20, 2023 by
rajhans
Introduce improvements from OSLO
feature request
New feature or request
#571
opened Feb 23, 2022 by
hyunwoongko
[BUG] Inconsistent loss between overlap_comm=true and overlap_comm=false
bug
Something isn't working
#1004
opened Jul 27, 2023 by
0x6b64
Unable to load model checkpoint with model parallelism
feature request
New feature or request
#773
opened Jan 20, 2023 by
RaoNikitha
Hosted GitHub Runners for CI
feature request
New feature or request
#531
opened Feb 9, 2022 by
Mistobaan
2 tasks
[BUG?] Higher "gradient_accumulation_steps" still increases memory usage a lot
bug
Something isn't working
#1123
opened Jan 15, 2024 by
exnx
AssertionError: zero stage 1 requires an optimizer
bug
Something isn't working
good first issue
Good for newcomers
help wanted
This issue needs assistance
#987
opened Jul 4, 2023 by
yonglianglan
How to preserve Pythia's sampling order with a different batch size
bug
Something isn't working
#984
opened Jul 3, 2023 by
lintangsutawika
20B pretrained model inference OOM on 8xA100 40GB
bug
Something isn't working
good first issue
Good for newcomers
#901
opened Apr 23, 2023 by
Mutinifni
Add StableLM as an example to the README
documentation
Improvements or additions to documentation
#896
opened Apr 22, 2023 by
StellaAthena
block-sparse flash attention support
feature request
New feature or request
good first issue
Good for newcomers
#851
opened Mar 22, 2023 by
jordiclive
The results of running eval show only 1 digit after decimal point for acc on all tested tasks
bug
Something isn't working
#1227
opened May 22, 2024 by
lernerjenny
My servers used for multi-node training do not have ssh. How can I launch multi-node training using the torchrun command?
feature request
New feature or request
#1203
opened Apr 23, 2024 by
dingning97