[pull] main from EleutherAI:main #2

Merged: 105 commits from EleutherAI:main, Jul 29, 2024
Changes from 1 commit
Commits (105)
e277bc7
fix lion optimizer documentation (#1067)
jahatef Oct 31, 2023
f574f22
Fix preprocess_data.py link (#1064)
Quentin-Anthony Oct 31, 2023
fcc5af5
Edge-casing for multi-GPU HF-to-NeoX conversion (#1065)
haileyschoelkopf Nov 1, 2023
8c9fc00
Create tools __init__.py for import (#1068)
Quentin-Anthony Nov 1, 2023
a10f69c
Pin version of `lm_eval` (#1070)
haileyschoelkopf Nov 1, 2023
41f019e
fixed case when ntasks_per_node is used instead (#1069)
AIproj Nov 1, 2023
90aa131
Update README.md
StellaAthena Nov 5, 2023
04dc2ba
When processing mlp.dense_4h_to_h.bias and attention.dense.bias, tp_r…
kyuheejang Nov 7, 2023
f214358
Merge pull request #1072 from kyuheejang/Fixing-neox-to-huggingface
StellaAthena Nov 7, 2023
d8028f8
Resolve error in the `test_neoxargs_usage` unit test (#1074)
mkerin Nov 8, 2023
10bf788
Update neox_args.py (#1081)
jahatef Nov 16, 2023
f48d3a6
Update README.md (#1082)
StellaAthena Nov 22, 2023
efea81f
Update README.md
StellaAthena Nov 30, 2023
3be59a4
Extend ci suite (#1080)
mkerin Dec 4, 2023
a2b2020
Patch coverity scan (#1090)
jaimemcc-intel Dec 4, 2023
050f560
Corrects FLOPs formula as per 1093 (#1094)
StellaAthena Dec 6, 2023
f19b2ec
Update CODEOWNERS
StellaAthena Dec 19, 2023
07166da
Bump transformers from 4.30.2 to 4.36.0 in /requirements (#1097)
dependabot[bot] Dec 20, 2023
9283eff
Pins old DeeperSpeed until bug is fixed (#1095)
StellaAthena Dec 20, 2023
9eef954
Update README.md
StellaAthena Dec 22, 2023
a48e09e
Update README.md
StellaAthena Dec 22, 2023
613e5a6
Update NeoXArgs docs automatically
invalid-email-address Dec 22, 2023
be7eeda
Update README.md
StellaAthena Dec 22, 2023
2117afc
Update README.md
StellaAthena Dec 22, 2023
8dba5b6
Update NeoXArgs docs automatically
invalid-email-address Dec 22, 2023
f161245
Add QK Normalization (#1100)
lintangsutawika Dec 22, 2023
7fb3b3c
Update README.md
StellaAthena Dec 22, 2023
a7509f0
Update README.md
StellaAthena Dec 22, 2023
8eaac4e
Merge branch 'main' into StellaAthena-patch-4-1
StellaAthena Dec 22, 2023
4d5a811
Update NeoXArgs docs automatically
invalid-email-address Dec 22, 2023
05cc29c
Merge pull request #1099 from EleutherAI/StellaAthena-patch-4-1
StellaAthena Dec 22, 2023
e25446e
Merge branch 'main' into StellaAthena-patch-4
StellaAthena Dec 22, 2023
287f9f7
Merge pull request #1102 from EleutherAI/StellaAthena-patch-4
StellaAthena Dec 22, 2023
b27e409
Lm eval 0.4.0 support (#1101)
haileyschoelkopf Dec 23, 2023
1148a0f
Update README.md
StellaAthena Dec 23, 2023
e5a7ea7
Update neox_args.py (#1107)
StellaAthena Dec 26, 2023
eca6b1a
Fix repo for CI (#1106)
yang Jan 4, 2024
98716eb
Fix install, Dockerfile, CI (#1104)
yang Jan 4, 2024
77605ca
Fused Rotary Embeddings (fixed) (#1108)
yang Jan 5, 2024
f14782a
Add pythia 14M and 31M configs (#1111)
segyges Jan 5, 2024
e6e944a
Add docker compose and change containerized setup instructions to use…
segyges Jan 9, 2024
92b1b6f
Fix openwebtext2 downloader, backport improvements to DataDownloader …
segyges Jan 11, 2024
90f70ff
Bump jinja2 from 3.1.2 to 3.1.3 in /requirements (#1120)
dependabot[bot] Jan 13, 2024
6399155
Enable passing of `--account` to `srun` / SlurmLauncher (#1126)
haileyschoelkopf Jan 19, 2024
7a8fa2f
update copyrights (#1128)
jahatef Jan 24, 2024
3d8fec0
fused layernorm (#1105)
yang Jan 26, 2024
e5602c3
Contributing Guide (#1138)
jahatef Jan 29, 2024
1c133bf
moved eval import and added to docs (#1139)
R0n12 Jan 30, 2024
032ec8c
Update lm_eval v0.4 to PyPI dependencies (#1141)
haileyschoelkopf Feb 1, 2024
91c44bc
Remove gas (beano) (#1144)
segyges Feb 5, 2024
f7373f8
Improve Conversion Utilities (#1124)
haileyschoelkopf Feb 8, 2024
412cf6e
Fixes distributed tests, and skips tests that are broken. (#1149)
jahatef Feb 21, 2024
46d179c
Memory profiling (#1153)
jahatef Feb 21, 2024
eee03b2
add profiling to readme (#1154)
jahatef Feb 23, 2024
a7638a8
Python version update (#1122)
segyges Feb 23, 2024
72d1803
Minor changes (#1125)
segyges Feb 23, 2024
f36aed7
Draft PR Adding mistral 0.1 (#1131)
AIproj Feb 23, 2024
9663802
[Bug?] Fix profiling argument names (#1155)
haileyschoelkopf Feb 26, 2024
3c03fc7
Update cpu_ci.yml (#1159)
jaimemcc-intel Feb 29, 2024
19596b0
Improve argument validation for Flash-attn + SWA (#1162)
haileyschoelkopf Mar 2, 2024
119950c
Single node Pythia 14M training on ngc pytorch 24.02 container (#1170)
tf-nv Mar 4, 2024
7b8187a
Remove unnecessary fp32/bf16 conversion (#1169)
DayOfThePenguin Mar 4, 2024
31cfe52
Ignore markdown for pre-commit (#1171)
Quentin-Anthony Mar 4, 2024
e109bf5
Make rotary freqs buffer non-persistent (#1168)
haileyschoelkopf Mar 4, 2024
df8cf24
Support Lion with Zero Optimizer (#1166)
DayOfThePenguin Mar 4, 2024
86758c3
Add MoE (#1129)
yang Mar 7, 2024
63b9fa1
remove `best_download` as dependency (#1179)
haileyschoelkopf Mar 8, 2024
90d4cb3
Fix documentation for --jsonl-keys argument of preprocess_data script…
KeitaW Mar 8, 2024
8c13642
clean up dockerfile: (#1175)
tf-nv Mar 8, 2024
c1fa994
When using kv cache and flash attention in conjunction, it's crucial …
chaochen99 Mar 8, 2024
1e7abe7
Remove gas from Pythia configs (#1181)
yang Mar 8, 2024
82ddc66
Fix moe_loss in gpt_j_residual path (#1180)
yang Mar 8, 2024
6809bbc
Add Mamba Architecture (#1157)
haileyschoelkopf Mar 10, 2024
03186de
Switch to using Cuda Flash Attn for Alibi (#1183)
haileyschoelkopf Mar 13, 2024
277141e
Mamba + Tensor Parallel Support (#1184)
haileyschoelkopf Mar 15, 2024
7267a74
[ZeRO-3] Partitioned init with `deepspeed.zero.Init()` (#1190)
R0n12 Mar 19, 2024
e6b5261
Small typo in the README
Mar 26, 2024
4085302
Merge pull request #1196 from edouardoyallon/typo_readme
StellaAthena Mar 26, 2024
1960b66
Added more papers
StellaAthena Mar 26, 2024
3616658
Update README.md
StellaAthena Mar 26, 2024
977448e
making PR triggered CPU test for changes to megatron (#1195)
jaimemcc-intel Apr 1, 2024
51a7de9
[AMD] Supporting fused kernels build using JIT (#1188)
R0n12 Apr 1, 2024
01657aa
[ZeRO-3] Ensured passing neox deepspeed_config when using partitioned…
R0n12 Apr 1, 2024
703d02f
Fix flash config for llama2/70B.yml config (#1206)
Quentin-Anthony Apr 24, 2024
838d5bf
Fixes a weird typo (#1207)
StellaAthena Apr 25, 2024
9d9d7c8
Bump transformers from 4.36.0 to 4.38.0 in /requirements (#1199)
dependabot[bot] May 4, 2024
06e5f0c
Jaimemcc intel/ci composite cpu tests (#1205)
jaimemcc-intel May 4, 2024
916c883
Add megablocks dropless MoE (#1192)
yang May 4, 2024
c814959
Fix bug in tools/ckpts/convert_neox_to_hf.py for setting intermediate…
jvendrow May 4, 2024
4bc6670
add rwkv support (#1198)
jahatef May 6, 2024
49cd41f
Bump jinja2 from 3.1.3 to 3.1.4 in /requirements (#1211)
dependabot[bot] May 13, 2024
d037756
Run document update again (#1216)
jahatef May 16, 2024
153e732
Rwkv pipeline parallelism (#1221)
jahatef May 21, 2024
2746d43
Add Torch Profiler Support (#1226)
DayOfThePenguin May 21, 2024
1d55708
fixed fused_rope naming in JIT + added readme for amd support (#1224)
R0n12 May 21, 2024
d3d59f2
Small tidying (#1222)
yang May 21, 2024
dfc6722
Fix markdown formatting error (#1217)
StellaAthena May 26, 2024
b5c0afe
add workflow_dispatch to gh actions pr so we can run on command (#1233)
jahatef Jun 4, 2024
4a34e0a
init changes to README (#1232)
jaimemcc-intel Jun 5, 2024
90a6cdb
fix summed biases not being divided by mp size (#1220)
dmahan93 Jun 7, 2024
2382bd4
Fix changed behavior of pipe_parallel (#1219)
yang Jun 7, 2024
4c426da
Conversion script bugfixes (#1218)
haileyschoelkopf Jun 7, 2024
2608972
fix python version and pytest install (#1234)
jahatef Jun 19, 2024
0e5f6db
Add a chat data preprocessing script (#1239)
dmahan93 Jun 25, 2024
1cee5b7
Fix paper reference in init_functions.py (#1241)
rasbt Jun 28, 2024
Improve Conversion Utilities (EleutherAI#1124)
* draft: unify sequential + PPModule conversion scripts

* Update NeoXArgs docs automatically

* draft: pull out model param names / model definition

* Update NeoXArgs docs automatically

* tested: neox models with TP = 1, PipelineModule, work

* Update NeoXArgs docs automatically

* draft: Llama + GQA QKV resharding

* Update NeoXArgs docs automatically

* update Llama conversion script to support Mistral and GQA

* Update NeoXArgs docs automatically

* test Mistral-7B conversion

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

* push documentation on imports / Llama loading

* push further readme updates (Mistral included)

* Prevent conversions for unsupported features, disclaim in README

* Update NeoXArgs docs automatically

* revert PR#1072 RowParallel bias conversion error

* remove sequential_to_hf and module_to_hf scripts, deprecated in favor of convert_neox_to_hf.py

* Update NeoXArgs docs automatically

* pre-commit

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
3 people committed Feb 8, 2024
commit f7373f806689cb270677dd48bffddf4a32bfadce
40 changes: 31 additions & 9 deletions README.md
@@ -501,18 +501,20 @@ where `--eval_tasks` is a list of evaluation tasks followed by spaces, e.g `--ev

 # Exporting to Hugging Face

-GPT-NeoX is optimized heavily for training only, and GPT-NeoX model checkpoints are not compatible out of the box with other deep learning libraries. To make models easily loadable and shareable with end users, and for further exporting to various other frameworks, GPT-NeoX supports checkpoint conversion to the [Hugging Face Transformers](https://arxiv.org/abs/1910.03771) GPTNeoXModel format.
+GPT-NeoX is optimized heavily for training only, and GPT-NeoX model checkpoints are not compatible out of the box with other deep learning libraries. To make models easily loadable and shareable with end users, and for further exporting to various other frameworks, GPT-NeoX supports checkpoint conversion to the [Hugging Face Transformers](https://arxiv.org/abs/1910.03771) format.

-To convert a NeoX checkpoint (with pipeline-parallel-size>=1) to Hugging Face-loadable format, run:
-```bash
-python ./tools/ckpts/convert_module_to_hf.py --input_dir /path/to/model/global_stepXXX --config_file your_config.yml --output_dir hf_model/save/location
-```
+Though NeoX supports a number of different architectural configurations, including AliBi positional embeddings, not all of these configurations map cleanly onto the supported configurations within Hugging Face Transformers.
+
+NeoX supports export of compatible models into the following architectures:
+- GPTNeoXForCausalLM
+- LlamaForCausalLM (GQA Support Coming Soon -- all Llama 1 models and Llama 2 / Codellama up to size 13B supported)
+
+Training a model which does not fit into one of these Hugging Face Transformers architectures cleanly will require writing custom modeling code for the exported model.

-To convert a sequential model to Hugging Face format, run:
+To convert a GPT-NeoX library checkpoint to Hugging Face-loadable format, run:
 ```bash
-python ./tools/ckpts/convert_sequential_to_hf.py --input_dir /path/to/model/global_stepXXX --config_file your_config.yml --output_dir hf_model/save/location
+python ./tools/ckpts/convert_neox_to_hf.py --input_dir /path/to/model/global_stepXXX --config_file your_config.yml --output_dir hf_model/save/location --precision {auto,fp16,bf16,fp32} --architecture {neox,llama}
 ```
-(Note: this script should be used for v2.0 checkpoints saved on a v2.0 commit prior to https://github.com/EleutherAI/gpt-neox/pull/866 and which used `pipe-parallel-size=1`. Using `pipe-parallel-size=0` will also save models in this format.)

 Then to upload a model to [the Hugging Face Hub](https://huggingface.co/), run:
 ```bash
@@ -521,7 +523,27 @@ python ./tools/ckpts/upload.py
 ```
 and input the requested information, including HF hub user token.

-Note, however, that this compatibility is not one-to-one, and only certain configurations from GPT-NeoX are supported in the Hugging Face GPTNeoXModel class. Advanced features such as alternative positional embeddings may require new Transformers modeling code and new conversion script tweaks.
+### Importing Models Into GPT-NeoX
+
+NeoX supplies several utilities for converting a pretrained model checkpoint into a format that can be trained within the library.
+
+The following models can be loaded in GPT-NeoX:
+- Llama 1
+- Llama 2 (Up to size 13B)
+- CodeLlama (Up to size 13B)
+- Mistral-7b-v0.1 (Coming Soon!)
+
+We provide two utilities for converting from two different checkpoint formats into a format compatible with GPT-NeoX.
+
+To convert a Llama 1 or Llama 2 checkpoint distributed by Meta AI from its original file format (downloadable [here](https://github.com/facebookresearch/llama) or [here](https://huggingface.co/meta-llama/Llama-2-7b)) into the GPT-NeoX library, run
+
+```
+python tools/ckpts/convert_raw_llama_weights_to_neox.py --input_dir /path/to/model/parent/dir/7B --model_size 7B --output_dir /path/to/save/ckpt --num_output_shards <TENSOR_PARALLEL_SIZE> (--pipeline_parallel if pipeline-parallel-size >= 1)
+```
+
+To convert from a Hugging Face model into a NeoX-loadable, run `tools/ckpts/convert_hf_to_sequential.py`. See documentation within that file for further options.
+

 # Monitoring
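Once `convert_neox_to_hf.py` has written its `--output_dir`, the exported checkpoint should load like any other Transformers model. A minimal sketch of that round trip, assuming a tokenizer was saved alongside the weights; the path is the README's placeholder and the prompt is made up:

```python
# Minimal sketch: load a checkpoint exported by convert_neox_to_hf.py.
# "hf_model/save/location" is the --output_dir used in the README example;
# AutoModelForCausalLM resolves to GPTNeoXForCausalLM or LlamaForCausalLM,
# matching the --architecture flag the checkpoint was exported with.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("hf_model/save/location")
tokenizer = AutoTokenizer.from_pretrained("hf_model/save/location")

inputs = tokenizer("GPT-NeoX is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```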
9 changes: 5 additions & 4 deletions configs/neox_arguments.md
@@ -111,7 +111,7 @@ Logging Arguments

 - **git_hash**: str

-Default = 78b8466
+Default = 6a8a829

 current git hash of repository

@@ -976,7 +976,7 @@ Text Generation arguments

 - **prompt_end**: str

-Default = 
+Default =


 a single prompt's end. Defaults to newline
@@ -1018,7 +1018,7 @@ Text Generation arguments

 - **eval_results_prefix**: str

-Default = 
+Default =

 prefix to which to save evaluation results - final fp will be {eval_results_prefix}_eval_results_yy-mm-dd-HH-MM.json

@@ -1762,7 +1762,7 @@ Args for deepspeed config

 Default = None

-    
+


@@ -2062,3 +2062,4 @@ Args for deepspeed runner (deepspeed.launcher.runner).
 Default = None

 Adds a `--account` to the DeepSpeed launch command. In DeeperSpeed this is passed on to the SlurmLauncher as well. Sometimes necessary for cluster rules, or so I've heard.
+
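The `eval_results_prefix` entry above pins the results path to `{eval_results_prefix}_eval_results_yy-mm-dd-HH-MM.json`. A minimal sketch of that naming scheme, not the library's actual code, with a hypothetical prefix value:

```python
# Sketch of the documented results path,
# {eval_results_prefix}_eval_results_yy-mm-dd-HH-MM.json.
from datetime import datetime

eval_results_prefix = "runs/pythia-14m"  # hypothetical config value
stamp = datetime.now().strftime("%y-%m-%d-%H-%M")
print(f"{eval_results_prefix}_eval_results_{stamp}.json")
# e.g. runs/pythia-14m_eval_results_24-07-29-14-30.json
```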