Pull requests: EleutherAI/gpt-neox

Pull requests list

Disable row-parallelism for now
#915 by Quentin-Anthony was merged May 2, 2023
Turn remaining hyphens into underscores in merge20b.py (label: good first issue)
#986 by wunderalbert was merged Aug 12, 2023
Draft PR Adding mistral 0.1
#1131 by AIproj was merged Feb 23, 2024
Docker build action test
#105 by joshlk was closed Jan 31, 2021
Delete old checkpoints
#159 by sdtblck was merged Mar 5, 2021
ZeRO-3 Goes Brrrrr
#199 by StellaAthena was closed May 9, 2021
Rewrite data download and tokenization pipeline
#224 by leogao2 was merged Apr 10, 2021
Refactor args
#249 by sweinbach was merged Apr 26, 2021
Run eval harness during training
#367 by sdtblck was merged Aug 31, 2021
Updating default configs to be less bad
#665 by StellaAthena was merged Nov 18, 2022
Add DeepSpeed bf16 configuration
#787 by dashstander was merged May 16, 2023
fix rope precision bug
#1016 by fecet was closed Sep 27, 2023
fix python version and pytest install
#1234 by jahatef was merged Jun 19, 2024
Adding jsonl chunked dataset
#52 by glebshevchukk was closed Jan 23, 2021
Added MPU from Sid's MegatronPipeline
#88 by glebshevchukk was closed Jan 28, 2021
Better deployment
#107 by joshlk was merged Feb 1, 2021
Update Dockerfile
#448 by sdtblck was closed Feb 21, 2022
add log_grad_pct_zeros to neox_args
#477 by CoEich was merged Feb 24, 2022
Latest DeepSpeed Support
#663 by Quentin-Anthony was merged Mar 9, 2023 Draft
3 of 5 tasks
Milestone: Release V2
[WIP] Adding AxoNN's two-dimensional tensor parallelism to GPT-NeoX (label: feature request)
#978 by siddharth9820 was closed Nov 28, 2023
5 tasks
Hacky FlashAttn 2 support
#1000 by haileyschoelkopf was closed Sep 22, 2023 Draft
Fixed AnnealingLR Class and Cosine Decay Schedule
#1008 by kshitijkg was merged Aug 7, 2023
Lion Optimizer
#1062 by andylolu2 was merged Oct 20, 2023
Add Mamba Architecture
#1157 by haileyschoelkopf was merged Mar 10, 2024
Refactor how we configure sparsity
#284 by sdtblck was merged May 2, 2021