[pull] main from EleutherAI:main #2

Merged: 105 commits merged into main from EleutherAI:main on Jul 29, 2024
Changes from 4 commits
105 commits
e277bc7
fix lion optimizer documentation (#1067)
jahatef Oct 31, 2023
f574f22
Fix preprocess_data.py link (#1064)
Quentin-Anthony Oct 31, 2023
fcc5af5
Edge-casing for multi-GPU HF-to-NeoX conversion (#1065)
haileyschoelkopf Nov 1, 2023
8c9fc00
Create tools __init__.py for import (#1068)
Quentin-Anthony Nov 1, 2023
a10f69c
Pin version of `lm_eval` (#1070)
haileyschoelkopf Nov 1, 2023
41f019e
fixed case when ntasks_per_node is used instead (#1069)
AIproj Nov 1, 2023
90aa131
Update README.md
StellaAthena Nov 5, 2023
04dc2ba
When processing mlp.dense_4h_to_h.bias and attention.dense.bias, tp_r…
kyuheejang Nov 7, 2023
f214358
Merge pull request #1072 from kyuheejang/Fixing-neox-to-huggingface
StellaAthena Nov 7, 2023
d8028f8
Resolve error in the `test_neoxargs_usage` unit test (#1074)
mkerin Nov 8, 2023
10bf788
Update neox_args.py (#1081)
jahatef Nov 16, 2023
f48d3a6
Update README.md (#1082)
StellaAthena Nov 22, 2023
efea81f
Update README.md
StellaAthena Nov 30, 2023
3be59a4
Extend ci suite (#1080)
mkerin Dec 4, 2023
a2b2020
Patch coverity scan (#1090)
jaimemcc-intel Dec 4, 2023
050f560
Corrects FLOPs formula as per 1093 (#1094)
StellaAthena Dec 6, 2023
f19b2ec
Update CODEOWNERS
StellaAthena Dec 19, 2023
07166da
Bump transformers from 4.30.2 to 4.36.0 in /requirements (#1097)
dependabot[bot] Dec 20, 2023
9283eff
Pins old DeeperSpeed until bug is fixed (#1095)
StellaAthena Dec 20, 2023
9eef954
Update README.md
StellaAthena Dec 22, 2023
a48e09e
Update README.md
StellaAthena Dec 22, 2023
613e5a6
Update NeoXArgs docs automatically
invalid-email-address Dec 22, 2023
be7eeda
Update README.md
StellaAthena Dec 22, 2023
2117afc
Update README.md
StellaAthena Dec 22, 2023
8dba5b6
Update NeoXArgs docs automatically
invalid-email-address Dec 22, 2023
f161245
Add QK Normalization (#1100)
lintangsutawika Dec 22, 2023
7fb3b3c
Update README.md
StellaAthena Dec 22, 2023
a7509f0
Update README.md
StellaAthena Dec 22, 2023
8eaac4e
Merge branch 'main' into StellaAthena-patch-4-1
StellaAthena Dec 22, 2023
4d5a811
Update NeoXArgs docs automatically
invalid-email-address Dec 22, 2023
05cc29c
Merge pull request #1099 from EleutherAI/StellaAthena-patch-4-1
StellaAthena Dec 22, 2023
e25446e
Merge branch 'main' into StellaAthena-patch-4
StellaAthena Dec 22, 2023
287f9f7
Merge pull request #1102 from EleutherAI/StellaAthena-patch-4
StellaAthena Dec 22, 2023
b27e409
Lm eval 0.4.0 support (#1101)
haileyschoelkopf Dec 23, 2023
1148a0f
Update README.md
StellaAthena Dec 23, 2023
e5a7ea7
Update neox_args.py (#1107)
StellaAthena Dec 26, 2023
eca6b1a
Fix repo for CI (#1106)
yang Jan 4, 2024
98716eb
Fix install, Dockerfile, CI (#1104)
yang Jan 4, 2024
77605ca
Fused Rotary Embeddings (fixed) (#1108)
yang Jan 5, 2024
f14782a
Add pythia 14M and 31M configs (#1111)
segyges Jan 5, 2024
e6e944a
Add docker compose and change containerized setup instructions to use…
segyges Jan 9, 2024
92b1b6f
Fix openwebtext2 downloader, backport improvements to DataDownloader …
segyges Jan 11, 2024
90f70ff
Bump jinja2 from 3.1.2 to 3.1.3 in /requirements (#1120)
dependabot[bot] Jan 13, 2024
6399155
Enable passing of `--account` to `srun` / SlurmLauncher (#1126)
haileyschoelkopf Jan 19, 2024
7a8fa2f
update copyrights (#1128)
jahatef Jan 24, 2024
3d8fec0
fused layernorm (#1105)
yang Jan 26, 2024
e5602c3
Contributing Guide (#1138)
jahatef Jan 29, 2024
1c133bf
moved eval import and added to docs (#1139)
R0n12 Jan 30, 2024
032ec8c
Update lm_eval v0.4 to PyPI dependencies (#1141)
haileyschoelkopf Feb 1, 2024
91c44bc
Remove gas (beano) (#1144)
segyges Feb 5, 2024
f7373f8
Improve Conversion Utilities (#1124)
haileyschoelkopf Feb 8, 2024
412cf6e
Fixes distributed tests, and skips tests that are broken. (#1149)
jahatef Feb 21, 2024
46d179c
Memory profiling (#1153)
jahatef Feb 21, 2024
eee03b2
add profiling to readme (#1154)
jahatef Feb 23, 2024
a7638a8
Python version update (#1122)
segyges Feb 23, 2024
72d1803
Minor changes (#1125)
segyges Feb 23, 2024
f36aed7
Draft PR Adding mistral 0.1 (#1131)
AIproj Feb 23, 2024
9663802
[Bug?] Fix profiling argument names (#1155)
haileyschoelkopf Feb 26, 2024
3c03fc7
Update cpu_ci.yml (#1159)
jaimemcc-intel Feb 29, 2024
19596b0
Improve argument validation for Flash-attn + SWA (#1162)
haileyschoelkopf Mar 2, 2024
119950c
Single node Pythia 14M training on ngc pytorch 24.02 container (#1170)
tf-nv Mar 4, 2024
7b8187a
Remove unnecessary fp32/bf16 conversion (#1169)
DayOfThePenguin Mar 4, 2024
31cfe52
Ignore markdown for pre-commit (#1171)
Quentin-Anthony Mar 4, 2024
e109bf5
Make rotary freqs buffer non-persistent (#1168)
haileyschoelkopf Mar 4, 2024
df8cf24
Support Lion with Zero Optimizer (#1166)
DayOfThePenguin Mar 4, 2024
86758c3
Add MoE (#1129)
yang Mar 7, 2024
63b9fa1
remove `best_download` as dependency (#1179)
haileyschoelkopf Mar 8, 2024
90d4cb3
Fix documentation for --jsonl-keys argument of preprocess_data script…
KeitaW Mar 8, 2024
8c13642
clean up dockerfile: (#1175)
tf-nv Mar 8, 2024
c1fa994
When using kv cache and flash attention in conjunction, it's crucial …
chaochen99 Mar 8, 2024
1e7abe7
Remove gas from Pythia configs (#1181)
yang Mar 8, 2024
82ddc66
Fix moe_loss in gpt_j_residual path (#1180)
yang Mar 8, 2024
6809bbc
Add Mamba Architecture (#1157)
haileyschoelkopf Mar 10, 2024
03186de
Switch to using Cuda Flash Attn for Alibi (#1183)
haileyschoelkopf Mar 13, 2024
277141e
Mamba + Tensor Parallel Support (#1184)
haileyschoelkopf Mar 15, 2024
7267a74
[ZeRO-3] Partitioned init with `deepspeed.zero.Init()` (#1190)
R0n12 Mar 19, 2024
e6b5261
Small typo in the README
Mar 26, 2024
4085302
Merge pull request #1196 from edouardoyallon/typo_readme
StellaAthena Mar 26, 2024
1960b66
Added more papers
StellaAthena Mar 26, 2024
3616658
Update README.md
StellaAthena Mar 26, 2024
977448e
making PR triggered CPU test for changes to megatron (#1195)
jaimemcc-intel Apr 1, 2024
51a7de9
[AMD] Supporting fused kernels build using JIT (#1188)
R0n12 Apr 1, 2024
01657aa
[ZeRO-3] Ensured passing neox deepspeed_config when using partitioned…
R0n12 Apr 1, 2024
703d02f
Fix flash config for llama2/70B.yml config (#1206)
Quentin-Anthony Apr 24, 2024
838d5bf
Fixes a weird typo (#1207)
StellaAthena Apr 25, 2024
9d9d7c8
Bump transformers from 4.36.0 to 4.38.0 in /requirements (#1199)
dependabot[bot] May 4, 2024
06e5f0c
Jaimemcc intel/ci composite cpu tests (#1205)
jaimemcc-intel May 4, 2024
916c883
Add megablocks dropless MoE (#1192)
yang May 4, 2024
c814959
Fix bug in tools/ckpts/convert_neox_to_hf.py for setting intermediate…
jvendrow May 4, 2024
4bc6670
add rwkv support (#1198)
jahatef May 6, 2024
49cd41f
Bump jinja2 from 3.1.3 to 3.1.4 in /requirements (#1211)
dependabot[bot] May 13, 2024
d037756
Run document update again (#1216)
jahatef May 16, 2024
153e732
Rwkv pipeline parallelism (#1221)
jahatef May 21, 2024
2746d43
Add Torch Profiler Support (#1226)
DayOfThePenguin May 21, 2024
1d55708
fixed fused_rope naming in JIT + added readme for amd support (#1224)
R0n12 May 21, 2024
d3d59f2
Small tidying (#1222)
yang May 21, 2024
dfc6722
Fix markdown formatting error (#1217)
StellaAthena May 26, 2024
b5c0afe
add workflow_dispatch to gh actions pr so we can run on command (#1233)
jahatef Jun 4, 2024
4a34e0a
init changes to README (#1232)
jaimemcc-intel Jun 5, 2024
90a6cdb
fix summed biases not being divided by mp size (#1220)
dmahan93 Jun 7, 2024
2382bd4
Fix changed behavior of pipe_parallel (#1219)
yang Jun 7, 2024
4c426da
Conversion script bugfixes (#1218)
haileyschoelkopf Jun 7, 2024
2608972
fix python version and pytest install (#1234)
jahatef Jun 19, 2024
0e5f6db
Add a chat data preprocessing script (#1239)
dmahan93 Jun 25, 2024
1cee5b7
Fix paper reference in init_functions.py (#1241)
rasbt Jun 28, 2024
README.md: 1 change (1 addition, 0 deletions)
@@ -506,6 +506,7 @@ The following publications by other research groups use this library:
- Eghbal A. Hosseini, Martin A. Schrimpf, Yian Zhang, Samuel Bowman, Noga Zaslavsky, and Evelina Fedorenko. "[Artificial neural network language models align neurally and behaviorally with humans even after a developmentally realistic amount of training.](https://www.biorxiv.org/content/10.1101/2022.10.04.510681)" _bioRxiv_, 2022.
- Byung-Doh Oh and William Schuler. "[Transformer-Based LM Surprisal Predicts Human Reading Times Best with About Two Billion Training Tokens](https://arxiv.org/abs/2304.11389)." In *Findings of the Association for Computational Linguistics*, 2023.
- Ta-Chung Chi, Ting-Han Fan, Alexander Rudnicky, and Peter Ramadge. "[Dissecting Transformer Length Extrapolation via the Lens of Receptive Field Analysis](https://aclanthology.org/2023.acl-long.756/)." In _Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)_, pp. 13522-13537, 2023.
- Ta-Chung Chi, Ting-Han Fan, Li-Wei Chen, Alexander Rudnicky, and Peter Ramadge. "[Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings](https://aclanthology.org/2023.acl-short.102/)." In _Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)_, pp. 1183-1199, 2023.
- Xidong Feng, Yicheng Luo, Ziyan Wang, Hongrui Tang, Mengyue Yang, Kun Shao, David Mguni, Yali Du, and Jun Wang. "[ChessGPT: Bridging Policy Learning and Language Modeling.](https://arxiv.org/abs/2306.09200)" _arXiv preprint arXiv:2306.09200_, 2023.
- Orion Walker Dollar, Sameera Horawalavithana, Scott Vasquez, W. James Pfaendtner, and Svitlana Volkova. "[MolJET: Multimodal Joint Embedding Transformer for Conditional de novo Molecular Design and Multi-Property Optimization.](https://openreview.net/pdf?id=7UudBVsIrr)" _preprint under review_, 2023.
- Jean Kaddour and Qi Liu. "[Text Data Augmentation in Low-Resource Settings via Fine-Tuning of Large Language Models](https://arxiv.org/abs/2310.01119)." _arXiv preprint arXiv:2310.01119_, 2023.