[pull] main from EleutherAI:main #2
Commits on Oct 31, 2023
- e277bc7 fix lion optimizer documentation (#1067)
- f574f22 Fix preprocess_data.py link (#1064)
Commits on Nov 1, 2023
- fcc5af5 Edge-casing for multi-GPU HF-to-NeoX conversion (#1065): edge-casing for the multi-GPU HF-to-sequential case; clean up whitespace.
- 8c9fc00
- a10f69c Pin version of `lm_eval` (#1070)
- 41f019e
Commits on Nov 5, 2023
- 90aa131
Commits on Nov 7, 2023
- 04dc2ba When processing mlp.dense_4h_to_h.bias and attention.dense.bias, tp_ranks are not reflected, so incorrect results always appear when tp_ranks is greater than 1.
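The bug class here is worth illustrating. Under the common Megatron convention, row-parallel layers (`mlp.dense_4h_to_h`, `attention.dense`) shard their weights across tensor-parallel ranks but replicate their biases, so a converter that sums or concatenates every key across tp_ranks corrupts those biases. A hedged sketch with a hypothetical `merge_tp_param` helper, not the repo's actual conversion code:

```python
# Hedged illustration: merging one parameter's shards from all TP ranks.
# (Some checkpoint formats store bias/tp fractions instead, where summing
# is correct; the choice must match how the checkpoint was written.)
import torch

def merge_tp_param(key: str, shards: list[torch.Tensor]) -> torch.Tensor:
    """Merge one parameter's shards gathered from all tensor-parallel ranks."""
    if key.endswith(("mlp.dense_4h_to_h.bias", "attention.dense.bias")):
        return shards[0]  # replicated: every rank holds an identical copy
    if key.endswith(("mlp.dense_4h_to_h.weight", "attention.dense.weight")):
        return torch.cat(shards, dim=1)  # row-parallel: sharded on input dim
    return torch.cat(shards, dim=0)  # column-parallel: sharded on output dim
```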
- f214358 Merge pull request #1072 from kyuheejang/Fixing-neox-to-huggingface: Fixing convert neox to huggingface bug
Commits on Nov 8, 2023
- d8028f8
Commits on Nov 16, 2023
- 10bf788 Update neox_args.py: these attention configuration options were missing from the docs; this fixes that.
Commits on Nov 22, 2023
- f48d3a6 Update README.md
Commits on Nov 30, 2023
- efea81f
Commits on Dec 4, 2023
- 3be59a4
  * Use `.yml` extensions in README to reflect extensions used in the `configs/` folder
  * Rename `save_interval` -> `checkpoint_factor`
  * Mark expected failures in existing tests
  * Fix minor typos
  * Allow creation of a checkpoint at iteration 0 when `do_train=False` (helpful for unit tests because it allows use of a randomly initialised model)
  * Delete duplicated `test_fused_kernels.py` (primary version lives in `tests/model/test_fused_kernels.py`)
  * Avoid initializing CUDA whenever `megatron` is imported (resolves the `Cannot re-initialize CUDA in forked subprocess` error when running distributed unit tests)
  * Extend suite of unit tests
- a2b2020 Iterative updates to coverity_scan.yml: update the build command to avert an empty cwd in build metrics; add verbose output to debug curl; add a debug print trace to examine the build metrics xml.
Commits on Dec 6, 2023
- 050f560 Corrects FLOPs formula as per #1093 (#1094): Update logging.py
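For reference, a hedged sketch of the Megatron-style per-iteration FLOPs estimate that such logging code typically uses; the exact constants in the repo's logging.py may differ:

```python
# Hedged sketch of the standard Megatron-LM FLOPs estimate for a
# decoder-only transformer (forward + backward per iteration).
def flops_per_iteration(batch_size: int, seq_len: int, num_layers: int,
                        hidden_size: int, vocab_size: int,
                        checkpoint_activations: bool) -> float:
    # Backward is ~2x forward; activation checkpointing re-runs the
    # forward pass, so the multiplier is 4 instead of 3.
    coeff = 4 if checkpoint_activations else 3
    return (24 * coeff * batch_size * seq_len * num_layers * hidden_size**2
            * (1.0
               + seq_len / (6.0 * hidden_size)                    # attention scores
               + vocab_size / (16.0 * num_layers * hidden_size)))  # LM head
```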
Commits on Dec 19, 2023
- f19b2ec Remove myself as a code owner as I shouldn't be approving PRs.
Commits on Dec 20, 2023
- 07166da Bump transformers from 4.30.2 to 4.36.0 in /requirements (#1097) (dependabot)
- 9283eff Pins old DeeperSpeed until bug is fixed (#1095): there is a bug in upstream DeepSpeed (microsoft/DeepSpeed#4781) that we didn't catch before syncing with main; this pins the prior commit so the bug doesn't impact users.
Commits on Dec 22, 2023
- 9eef954
- a48e09e
- 613e5a6 Update NeoXArgs docs automatically (github-actions)
- be7eeda
- 2117afc
- 8dba5b6 Update NeoXArgs docs automatically (github-actions)
- f161245 Add QK normalization
- 7fb3b3c
- a7509f0
- 8eaac4e
- 4d5a811 Update NeoXArgs docs automatically (github-actions)
- 05cc29c Merge pull request #1099 from EleutherAI/StellaAthena-patch-4-1: Update README.md
- e25446e
- 287f9f7 Merge pull request #1102 from EleutherAI/StellaAthena-patch-4: More readme updates
Commits on Dec 23, 2023
- b27e409 Add lm-eval v0.4.0: rename evaluate.py to avoid shadowing the HF evaluate library (and document the new filename); handle the new results format; update hanging evaluate.py scripts; add triviaqa to the default eval_tasks.
- 1148a0f
Commits on Dec 26, 2023
- e5a7ea7 Update neox_args.py: changed some default values to correspond to values that we generally recommend people use.
Commits on Jan 4, 2024
- eca6b1a Fix syntax errors; make pre-commit fixes across the repo; ensure the correct version of clang-format in CI.
- 98716eb Fix install, Dockerfile, CI (#1104): add the missing jinja2 dep (a transitive dep of lm_eval); fix the Dockerfile (only devel images have nvcc, which is needed to build packages, and don't rebuild fused kernels when nothing relevant changed); ensure the Dockerfile builds in CI, which also ensures the install actually works.
Commits on Jan 5, 2024
- 77605ca Fused Rotary Embeddings (fixed) (#1108): add fused rotary positional embedding kernels (cpp/header/CUDA sources, porting the fix from NVIDIA/apex#1750); wire a `rope_fusion` arg through neox_args, setup.py, initialize.py, and `ParallelSelfAttention`; confirm bf16 works; add a 125M_fused_rope.yml config and the `rope_fusion` arg to all ymls; update test_fused_kernels.py.
- f14782a Add pythia 14M and 31M configs (#1111)
Commits on Jan 9, 2024
- e6e944a Add docker compose and change containerized setup instructions to use it (#1113): add logging limits to the docker-compose files; change the data mount from /gpt-neox/data to /data/ to prevent errors if the user already has a /data/ directory in their /gpt-neox/ folder; make the README's changed parts proper code blocks; tidy the docker-compose spinup; avoid config bloat by only providing the updated paths.
Commits on Jan 11, 2024
-
Configuration menu - View commit details
-
Copy full SHA for 92b1b6f - Browse repository at this point
Copy the full SHA 92b1b6fView commit details
Commits on Jan 13, 2024
- 90f70ff Bump jinja2 from 3.1.2 to 3.1.3 in /requirements (#1120) (dependabot)
Commits on Jan 19, 2024
- 6399155 Enable passing of `--account` to `srun` / SlurmLauncher (#1126): add `account` to the DeepSpeed args and handle `account` when `deepspeed_slurm` is set.
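For context, `--account` is the Slurm flag that charges a job to a project allocation. A hypothetical sketch of how a launcher might thread it into the command line, not the repo's actual SlurmLauncher code:

```python
# Hypothetical helper: build an srun invocation, adding --account only
# when the user configured one.
def build_srun_cmd(num_nodes: int, account: str | None = None) -> list[str]:
    cmd = ["srun", f"--nodes={num_nodes}"]
    if account is not None:
        cmd.append(f"--account={account}")  # charge the job to a project
    return cmd

print(build_srun_cmd(2, account="my_project"))
```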
Commits on Jan 24, 2024
- 7a8fa2f Update copyrights (including NVIDIA copyright years).
Commits on Jan 26, 2024
- 3d8fec0 Add a simple util for CUDA timings; add the fused layernorm kernel from Megatron (closes #952); change the default fused layernorm to false; update test_setup.yml and test_train_base.yml.
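A CUDA timing util has to account for asynchronous kernel launches; a hedged sketch of the usual event-based pattern, illustrative rather than the repo's exact util:

```python
# CUDA launches are asynchronous, so wall-clock timing needs CUDA events
# plus a synchronize before reading the elapsed time.
import torch

def cuda_time_ms(fn, warmup: int = 3, iters: int = 10) -> float:
    for _ in range(warmup):
        fn()  # warm up caches, JIT, and cuDNN autotuning
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()  # wait until both events have completed
    return start.elapsed_time(end) / iters
```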
Commits on Jan 29, 2024
- e5602c3 Add a contributing guide (CONTRIBUTING.md); remove Microsoft references and link on the main README.
Commits on Jan 30, 2024
-
Configuration menu - View commit details
-
Copy full SHA for 1c133bf - Browse repository at this point
Copy the full SHA 1c133bfView commit details
Commits on Feb 1, 2024
- 032ec8c Update lm_eval v0.4 to PyPI dependencies (#1141): update requirements.txt and add a note to neox_args.py.
Commits on Feb 5, 2024
- 91c44bc Remove the 'gas' configuration variable; remove gas from configs and config documentation; update training.py.
Commits on Feb 8, 2024
- f7373f8 Improve Conversion Utilities (#1124): unify the sequential + PipelineModule conversion scripts; pull out model param names / model definition; confirm NeoX models work with TP=1 and PipelineModule; add Llama + GQA QKV resharding and update the Llama conversion script to support Mistral and GQA (Mistral-7B conversion tested); document imports / Llama loading (Mistral included); prevent conversions for unsupported features and disclaim them in the README; revert PR #1072's RowParallel bias conversion error; remove the sequential_to_hf and module_to_hf scripts, deprecated in favor of convert_neox_to_hf.py.
Commits on Feb 21, 2024
- 412cf6e Fixes distributed tests, and skips tests that are broken (#1149); improve pytest messages and remove commented code.
- 46d179c Add memory profiling for gpt-neox (works for pp=0; pp=1+ needs DeepSpeed commits), plus memory profiling for pipeline parallel; add nsight systems support; add tests and an assertion message; fix spacing.
Commits on Feb 23, 2024
- eee03b2 add profiling to readme (#1154)
- a7638a8 Switch the default command for the docker image; rename the pythia paths docker file for clarity; update the docker build to Python 3.10 and the GitHub workflows to Ubuntu 22.04 / Python 3.10; bump pytorch library patch versions; add pytest-html for reasonably formatted test reports; fix the build after the torch and CUDA version bump; fix the apex install for the newer version and pin pip to keep apex compilation working; fix the unit test for evaluate; switch to downloading a prebuilt apex wheel, with the docker build conditionally getting binaries or building apex; clean up docker copy commands.
- 72d1803 Fix the unit test for evaluate; add pytest-html to requirements-dev.txt; gitignore the data/ directory and the .bin/.idx files in tests/data so git doesn't try to commit data files; add a utility script for doing token counts on processed datasets.
- f36aed7 Draft PR adding Mistral 0.1 (#1131): add flash attention 2 support (including sliding window, with a flash version check); chinchilla-style cosine decay with warmup_iters settable; s3 checkpoint syncing; rotary embedding in fp32; GQA (working beyond the kv_heads==query_heads case, with a non-flash codepath and argument checking that prevents num KV heads < TP world size); Mistral, Llama 2, codellama, and Llemma configs; an HF-to-sequential conversion script that stacks q, k, v appropriately for mp ranks; documentation of intermediate_size behavior; persistent inv_freq buffer for Mistral; updated conversion-script tests.
Commits on Feb 26, 2024
- 9663802 [Bug?] Fix profiling argument names (#1155): rename profiling flags (`profiling` -> `profile`; `profile_backward` already exists) and update the places using `neox_args.profile`; update gen_docs, since `__name__` only returns the Literal for string args with Python 3.10, preserving non-literals via a Union.
Commits on Feb 29, 2024
- 3c03fc7 Update cpu_ci.yml: point the CPU workflow at a self-hosted runner rather than GitHub-provided runners.
Commits on Mar 2, 2024
- 19596b0 Improve argument validation for Flash-attn + SWA (#1162): don't pass window_size if not necessary; update 7B.yml.
Commits on Mar 4, 2024
- 119950c Single node Pythia 14M training on ngc pytorch 24.02 container (#1170)
- 7b8187a Remove unnecessary fp32/bf16 conversion (#1169): no collective op is performed, so the bf16 conversions are unneeded.
- 31cfe52 Ignore markdown for pre-commit (#1171): only ignore the end-of-file and trailing-whitespace hooks.
- e109bf5 Make rotary freqs buffer non-persistent (#1168): make inv_freq non-persistent by default.
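The mechanism here is PyTorch's `persistent` flag on registered buffers; a minimal sketch of the pattern, not the repo's exact rotary module:

```python
# A non-persistent buffer is recomputed at init and never written to (or
# read from) state_dict, so checkpoints stay loadable if the buffer's
# shape or dtype conventions change between code versions.
import torch

class RotaryEmbedding(torch.nn.Module):
    def __init__(self, dim: int, base: float = 10000.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        self.register_buffer("inv_freq", inv_freq, persistent=False)
```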
- df8cf24 Support Lion with Zero Optimizer (#1166): add DeepSpeed ZeRO Lion support; bump DeeperSpeed to a version that includes DeepSpeed FusedLion and a pipeline logging fix.
Commits on Mar 7, 2024
- 86758c3 Add DeepSpeed MoE (closes #479); thanks to dayofthepenguin for extensive testing.
Commits on Mar 8, 2024
- 63b9fa1 Remove `best_download` as a dependency (#1179): update requirements.txt.
- 90d4cb3
- 8c13642 Eliminate already-installed apt packages; the sparse attn requirement led to a triton downgrade, and flash attn is already part of the ngc container (in a version compatible with TE).
- c1fa994 When using the kv cache and flash attention in conjunction, it is crucial to set the causal parameter of flash_varlen_qkv_fn to False; failing to do so leads to inaccurate results (#1178).
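The reasoning: during cached decoding the query is a single token at the last position, so attending over all cached keys is already causal. A hedged sketch; the repo's `flash_varlen_qkv_fn` wrapper is assumed to forward `causal`, and `flash_attn_func` is shown for brevity:

```python
from flash_attn import flash_attn_func

def decode_step_attention(q_new, k_cache, v_cache):
    # q_new: [batch, 1, heads, head_dim], the newest token's query only.
    # k_cache, v_cache: [batch, past_len + 1, heads, head_dim].
    # The single query sits at the last position, so attending to every
    # cached key is already causal. In older flash-attn releases the
    # causal mask for q_len != k_len is aligned top-left, so causal=True
    # would silently mask out valid cached positions.
    return flash_attn_func(q_new, k_cache, v_cache, causal=False)
```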
- 1e7abe7 Remove gas from Pythia configs (#1181); fixes #1165.
- 82ddc66 Fix moe_loss in gpt_j_residual path (#1180); fixes #1174.
Commits on Mar 10, 2024
- 6809bbc Add Mamba Architecture (#1157): initial Mamba support, then fused selective-scan and conv1d kernels plus mamba_inner_fn; keep A_log and D out of weight decay and stored in fp32 (flag-controlled); add a conversion script (tested with TP=1); update parallelism checks (partition activations works); add mamba requirements, init_methods support, demo configs, and separate flags for the conv and (in,out)_proj biases, with the x_proj bias also flag-controlled.
Commits on Mar 13, 2024
- 03186de Switch to using Cuda Flash Attn for Alibi (#1183): add CUDA support for flash attn with ALiBi and warn of the triton path's deprecation.
Commits on Mar 15, 2024
- 277141e Mamba + Tensor Parallel Support (#1184): TP works (confirmed after merging the TP changes with the current MambaLayer); shapes with TP>1 work with conversion; PP tested and works, so the assert blocking it in arguments was removed.
Commits on Mar 19, 2024
- 7267a74 [ZeRO-3] Partitioned init with `deepspeed.zero.Init()` (#1190): add ds zero.Init() to get_model and clean up the conditional with-block.
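A minimal sketch of the partitioned-init pattern, assuming a hypothetical `build_model` constructor and a `zero_optimization_stage` attribute in place of the repo's actual get_model internals; passing the config is what the follow-up commit (#1191, below) ensures:

```python
import deepspeed

def get_model(neox_args, ds_config: dict):
    if neox_args.zero_optimization_stage == 3:  # hypothetical attribute name
        # Parameters are partitioned across ranks as they are allocated,
        # so no single rank ever materializes the full model in memory.
        with deepspeed.zero.Init(config_dict_or_path=ds_config):
            return build_model(neox_args)  # hypothetical constructor
    return build_model(neox_args)
```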
Commits on Mar 26, 2024
- e6b5261 (committed by edouardoyallon)
- 4085302 Merge pull request #1196 from edouardoyallon/typo_readme: ENH Small typo in the README
- 1960b66
- 3616658
Commits on Apr 1, 2024
- 977448e Make a PR-triggered CPU test for changes to megatron (#1195).
- 51a7de9 [AMD] Supporting fused kernels build using JIT (#1188): initial JIT load functions; pass neox_args to load() as optional for easy testing; modify headers for correct copyright statements.
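JIT building via `torch.utils.cpp_extension.load` compiles kernels on first use rather than at install time, which also targets ROCm/HIP on AMD automatically. A hedged sketch with illustrative source paths and extension name, not the repo's actual sources:

```python
from torch.utils.cpp_extension import load

# Compiles the extension the first time this runs, then caches the build.
fused_kernels = load(
    name="fused_kernels",
    sources=["fused_kernels/fused_op.cpp", "fused_kernels/fused_op_cuda.cu"],
    extra_cflags=["-O3"],
    verbose=True,  # print compiler output during the first build
)
```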
- 01657aa [ZeRO-3] Ensured passing neox deepspeed_config when using partitioned init (#1191): ensure the DeepSpeed configs are passed to zero.Init().
Commits on Apr 24, 2024
- 703d02f
Commits on Apr 25, 2024
- 838d5bf Fixes a weird typo.
Commits on May 4, 2024
- 9d9d7c8 Bump transformers from 4.36.0 to 4.38.0 in /requirements (#1199) (dependabot)
- 06e5f0c Jaimemcc intel/ci composite cpu tests (#1205): split PR and CPU tests into separate work; adjust references to env variables in the workflow; pull the compose file from the CPU test dir; add post-cleanup for portability and workflow_dispatch for manual runs; make sure all steps run even if the first CPU tests fail; remove httpserver.
- 916c883 Add megablocks dropless MoE (#1192)
- c814959 Fix bug in tools/ckpts/convert_neox_to_hf.py for setting intermediate_size (#1209): for the neox architecture, the 'intermediate_size' argument was not explicitly set, so it defaulted to 24576 (from HF's GPTNeoXConfig). Proposed solution: set intermediate_size to 4 * hidden_size.
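A minimal sketch of the described fix; `make_hf_config` is a hypothetical helper, and the conversion script's real structure differs:

```python
from transformers import GPTNeoXConfig

def make_hf_config(hidden_size: int, **kwargs) -> GPTNeoXConfig:
    return GPTNeoXConfig(
        hidden_size=hidden_size,
        # GPT-NeoX MLPs use a 4x expansion; derive the value instead of
        # inheriting GPTNeoXConfig's 24576 default (the 20B model's size).
        intermediate_size=4 * hidden_size,
        **kwargs,
    )
```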
Commits on May 6, 2024
- 4bc6670 Add RWKV support: model files, configs, kernels, and init_functions.py updates; remove preffn and mishglu; add RWKV parallelism assertions, logging, and a version dirname; make hdim 3.5x; fix a bug and set batch size to 32.
Commits on May 13, 2024
- 49cd41f Bump jinja2 from 3.1.3 to 3.1.4 in /requirements (#1211) (dependabot)
Commits on May 16, 2024
- d037756 Run document update again (#1216): misc changes to neox_args.
Commits on May 21, 2024
- 153e732 Rwkv pipeline parallelism (#1221): misc changes to neox_args; allow RWKV pp.
- 2746d43 Add Torch Profiler Support (#1226): add pytorch profiling and fix a pre-commit formatting flag.
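Torch profiler support typically wraps the standard `torch.profiler` scheduling API; a hedged sketch with illustrative schedule values and a stand-in `train_step`, not the repo's actual integration:

```python
import torch
from torch.profiler import ProfilerActivity, profile, schedule

def train_step():
    # Stand-in for one training iteration.
    x = torch.randn(1024, 1024)
    (x @ x).sum()

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    schedule=schedule(wait=1, warmup=1, active=3),
    on_trace_ready=torch.profiler.tensorboard_trace_handler("./profiler_logs"),
) as prof:
    for _ in range(10):
        train_step()
        prof.step()  # advance the profiler's wait/warmup/active schedule
```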
- 1d55708
- d3d59f2 Tolerate no fused kernels; fix requirements file syntax.
Commits on May 26, 2024
- dfc6722 Fix markdown formatting error (#1217): Update README.md
Commits on Jun 4, 2024
- b5c0afe add workflow_dispatch to gh actions pr so we can run on command (#1233)
Commits on Jun 5, 2024
- 4a34e0a init changes to README (#1232)
Commits on Jun 7, 2024
- 90a6cdb
- 2382bd4 Fix changed behavior of pipe_parallel (#1219)
- 4c426da Conversion script bugfixes (#1218): update the is_pipe_parallel logic; handle the tied-embeddings case correctly; revert PP to be consistent.
Commits on Jun 19, 2024
- 2608972 fix python version and pytest install (#1234): use python3 -m pip; add docker setup to the workflow; iterate on the python setup; add the hash back to the DeepSpeed version.
Commits on Jun 25, 2024
- 0e5f6db Add a chat data preprocessing script (#1239): add EOT at the end of a chat; update README.md.
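The EOT detail matters because GPT-style pretraining separates documents with an end-of-text token. A hedged sketch of the idea with illustrative field names and marker, not the script's actual format:

```python
EOT = "<|endoftext|>"  # assumption: the tokenizer's end-of-text marker

def format_chat(messages: list[dict]) -> str:
    """Flatten one conversation to text and terminate it with EOT."""
    lines = [f"{m['role']}: {m['content']}" for m in messages]
    return "\n".join(lines) + EOT

print(format_chat([{"role": "user", "content": "hi"},
                   {"role": "assistant", "content": "hello"}]))
```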
Commits on Jun 28, 2024
- 1cee5b7