Merge inference into BertMLM_fix #2

Open
wants to merge 399 commits into xinhao_candle

Conversation

@xinhaoc (Owner) commented May 10, 2024

Description of changes:

Related Issues:

Linked Issues:

  • Issue #

Issues closed by this PR:

  • Closes #

Before merging:

  • Did you update the flexflow-third-party repo, if modifying any of the CMake files, the build configs, or the submodules?

goliaro and others added 30 commits May 19, 2023 13:41
* fix hip_rocm build with sentencepiece

* shellcheck 1

* shellcheck 2

* shellcheck 3

* fix install script

* .github/workflows/helpers/install_dependencies.sh

* fix

* shellcheck

* restore unnecessary changes

* fix build

* removed outdated test from c++ tests

* update link in readme
* implemented file-based configs, remove spec_pipeline folder

* fix

* add inference test, script to download weights

* update readme

* update ci scripts

* newlines

* fix gpu-ci

* fix

* fix

* update test file

* added incr decoding program, moved LLAMA folder from examples

* linting

* add incremental decoding to test

* update readme

* add script to download opt weights

* fix support for opt, move code to root inference folder

* linting

* update test file

* fix

* bug fix

* update test
…exflow#736)

* making TreeIncMultiHeadSelfAttentionMeta a subclass of IncMultiHeadSelfAttentionMeta

* make BeamSearchIncMultiHeadAttentionMeta a subclass of IncMultiHeadAttentionMeta

* format

* merging kernel functions

* merge more functions

* merge compute_qkv_kernel

* format

* fix config

---------

Co-authored-by: xinhaoc <[email protected]>
* fix alignment bugs (part 1)

* add missing file
…ttention (flexflow#737)

* making TreeIncMultiHeadSelfAttentionMeta a subclass of IncMultiHeadSelfAttentionMeta

* make BeamSearchIncMultiHeadAttentionMeta a subclass of IncMultiHeadAttentionMeta

---------

Co-authored-by: xinhaoc <[email protected]>
* save output to file

* add alignment tests

* fix

* change conflicting name, add comments

* fix typo

* formatting

* more comments and clean dead code

* formatting

* fixed issue with length mismatch

* fix ci skip

* update inf test

* add precision selection support in incr decoding
* Update README.md

* update readme

* fix
…d tests (flexflow#749)

* add support for downloading mixed precision llama/opt weights

* fix

* update test script to also run half precision tests

* disable workflow for inference PRs

* add verbose option

* linting

* copy opt weights in download weights script

* add alignment tests with huggingface (llama)

* fix, add diff to test script

* fix

* add opt tests

* comment out tests not passing

* add e2e latency to output files

* add speed tests

* shellcheck

* shellcheck

* fix

* fix

* linting

* fix
* Add support for login information with multiple ssms.

* Update prepare_next_batch_verify.

* Add dedup tree merge.

* Format.

* Fix bugs.

* Runs with multiple models.

* Fix.

* Format

* Fix.

* Fix incremental decoding.

* fix use_full_precision issue.
* Fix bug in elementwise multiplication with broadcasting (flexflow#764)

* Fix multinode test (flexflow#766)

* Fix UCX multinode test (flexflow#768)

* fix

* fix 2

* Prevent format.sh from formatting triton (flexflow#756)

* [CI] - Increase timeout in multinode test (UCX & MPI) (flexflow#773)

* fix

* fix 2

* increase timeout

* Fix docker builds in CI (flexflow#774)

---------

Co-authored-by: Soumya Chatterjee <[email protected]>
Co-authored-by: Colin Unger <[email protected]>
* init

* add mlc tokenizer.

* .

* fix

* fix pipeline, fix name

* .

* format

* ci

* .

* add rust

* fix

* .

* inf test fix

* .

* fix

* .

* fix

* optimize

* move rust to conda env

* .

* .

* fix

* fix

* fix

* update git ignore

* fix rust install

* Update config.linux

---------

Co-authored-by: Gabriele Oliaro <[email protected]>
* fix gpu-ci

* add check for rust in cmake
* decomp

* initial implementation

* add missing file

* checkpoint

* more bug fixes

* update default offload size

* fix non-offload

* undo changes to spec_inc_mha

* fix a parallel tensor reuse bug

* prepare_next_batch for offload (inc_decode)

* format

* int4&int8 offload

* fix merge issue

* fix build

* spec_infer offload&quantize

* fix, update readme.

* remove redundant

* hip build

* hip

* model param

---------

Co-authored-by: xinhaoc <[email protected]>
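
The offload and quantization commits above (int4/int8 offload, spec_infer offload & quantize) revolve around keeping weights in a low-precision format while they are offloaded. As a rough sketch of the int8 half of that idea only, with hypothetical function names and no connection to FlexFlow's actual kernels, symmetric per-tensor quantization looks like this:

```python
# Hypothetical illustration of symmetric int8 weight quantization of the kind
# used when offloading quantized weights; not FlexFlow's implementation.
import numpy as np

def quantize_int8(weight: np.ndarray):
    """Quantize a float tensor to int8 plus a single per-tensor scale."""
    max_abs = float(np.abs(weight).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weight / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Approximately recover the float weights when the layer runs."""
    return q.astype(np.float32) * scale

# Offloaded weights can be kept in int8 (a quarter of fp32 memory) off the GPU
# and dequantized, or consumed directly by an int8 kernel, only when needed.
w = np.random.randn(1024, 1024).astype(np.float32)
q, s = quantize_int8(w)
w_approx = dequantize_int8(q, s)
```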
* add parallel operators

* add cmd line param

* setting machine views

* move bias blocks

* comment out print of partitions

* add unimplemented methods

* add impl of inference functions to replicate and reduce ops

* replicate bias in file loader

* fixes, now works

* only add bias once

* load and use weights according to partition

* fix wout weight

* cleanup

* add support for mixed precision in parallel ops

* cleanup

* rocm build fix

* hip rocm fix 2

* fix machine views

* fix rocm build

* adjust number of pipeline stages

* add model parallelism to opt linear layers

* fix

* fix multi gpu test

* fix

* add tensor parallelism tests to inference test script

* enable tensor parallelism for dense layers in llama

* fix

* fix set_tensor-related issues

* fix and linting
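
The tensor-parallelism commits above partition the dense layers' weights across devices, replicate the bias, add the bias only once, and load weights according to the partition. A minimal NumPy sketch of the general row-parallel pattern behind that, under assumed shapes and with the summation standing in for the all-reduce (an illustration, not FlexFlow's parallel operators):

```python
# Generic sketch of a row-parallel dense layer: each "device" holds a slice of
# the weight along the input dimension, computes a partial output, and the
# partial outputs are summed (the all-reduce). Bias is replicated and added once.
import numpy as np

def row_parallel_dense(x, weight, bias, num_parts):
    in_dim = weight.shape[0]
    part = in_dim // num_parts
    partials = []
    for p in range(num_parts):
        x_p = x[:, p * part:(p + 1) * part]       # input slice for this partition
        w_p = weight[p * part:(p + 1) * part, :]  # weight slice for this partition
        partials.append(x_p @ w_p)                # partial matmul on one "device"
    out = np.sum(partials, axis=0)                # stands in for the all-reduce
    return out + bias                             # bias added once, not per partition

x = np.random.randn(2, 8).astype(np.float32)
w = np.random.randn(8, 4).astype(np.float32)
b = np.random.randn(4).astype(np.float32)
assert np.allclose(row_parallel_dense(x, w, b, num_parts=4), x @ w + b, atol=1e-5)
```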
* Docker-build and Publish Modification
**Description of changes:**
Add code to docker-build.yml so that Docker images are automatically built and published when a push happens to the inference branch. Also modify publish.sh so that the published image is named using both the image name and the branch name, to distinguish these images from the ones created from the master branch.

**Related Issues:**

Linked Issues:
- Issue #

Issues closed by this PR:
- Closes #

**Before merging:**

- [ ] Did you update the [flexflow-third-party](https://github.com/flexflow/flexflow-third-party) repo, if modifying any of the CMake files, the build configs, or the submodules?

* update container name

* specinfer env publish

* tag specinfer

* add spaces

* newline

* fix

* fix gpu ci workflow

---------

Co-authored-by: Gabriele Oliaro <[email protected]>
* fix linear region requirement

* fix set tensor issue
goliaro and others added 30 commits January 20, 2024 04:21
* only stop server if rm is initialized

* fix

* better logging

* pass layer names to ops

* add debugging functionality to hf script

* fix

* fixes

* fix

* fix

---------

Co-authored-by: Ubuntu <[email protected]>
* bug fixes and update Legion version

* fix

* bug fix

* update legion

* fix arithmetic error due to num_devices uninitialized

* update legion version

* update ci

* fix

* debugging ci

* Revert "debugging ci"

This reverts commit 0b3148e.

---------

Co-authored-by: Gabriele Oliaro <[email protected]>
…w#1246)

* add a background server for RequestManager

* .

* make incr_decoding work

* make spec_infer work

* format

* update python inference

* fix python issues

* bug fix

* add a Legion future to capture the termination of the background server

* gradio finished

* chatbot gradio version 2

* chainlit1

* chainlit2

* fastapi done

* fastapi incr_decoding

* langchain example & wrapper class

* langchain example & wrapper class1

* added documentation

* entrypoint

* del apikey

* delete extra files

* rag search fixed some bugs

* fixed rag search issues

* updates before rebase

* minor changes

* reorganize files

* Add thread safety for background server.

* Simplify backend server design.

* resolve conflict.

* specinfer usecases with issues labeled

* specinfer usecases with issues labeled 2

* fixed issues with prompt template

* fix issues with rag specinfer

* Add server task timeout.

* register callbacks to terminate background worker at exit or termination

* [Python] enable decoding multiple requests

* update README.md and default configuration

* fix issues with gradio and prompt template

* fix issues with rag

* adjusted fastapi entrypoint

* update documentation

* resolve conflicts

* issues fix

* adjustments on usecases and api entrypoints

* remove redundant changes

* testing CI

* Enable backtrace

* restore newlines

* version

* add back misdeleted line

* legion version

---------

Co-authored-by: Zhihao Jia <[email protected]>
Co-authored-by: Gabriele Oliaro <[email protected]>
Co-authored-by: zwang86 <[email protected]>
Co-authored-by: Zeyu Wang <[email protected]>
Co-authored-by: xinhaoc <[email protected]>
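
The background-server commits above describe the RequestManager running in a background worker that is made thread safe, exposes a termination future, and is shut down by callbacks registered at exit or termination. A hypothetical Python sketch of that shape, using only standard-library pieces and invented names rather than the FlexFlow API:

```python
# Hypothetical sketch of a background request server: a worker thread drains a
# thread-safe queue, and a callback registered with atexit stops it cleanly.
# This illustrates the design only; it is not the FlexFlow RequestManager API.
import atexit
import queue
import threading

class BackgroundServer:
    def __init__(self):
        self._requests = queue.Queue()      # thread-safe handoff of requests
        self._stop = threading.Event()      # plays the role of the termination future
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()
        atexit.register(self.shutdown)      # terminate the worker at process exit

    def _run(self):
        while not self._stop.is_set():
            try:
                prompt = self._requests.get(timeout=0.1)
            except queue.Empty:
                continue
            print(f"decoding: {prompt!r}")  # stand-in for running the model

    def submit(self, prompt: str):
        self._requests.put(prompt)

    def shutdown(self):
        if not self._stop.is_set():
            self._stop.set()
            self._worker.join()

server = BackgroundServer()
server.submit("hello")
server.shutdown()
```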
* bug fixes and update Legion version

* fix

* bug fix

* update legion

* fix arithmetic error due to num_devices uninitialized

* update legion version

* update ci

* fix

* debugging ci

* Revert "debugging ci"

This reverts commit 0b3148e.

* update mapper interface

* add ncclFinalize

* Only delete nccl communications for training jobs

---------

Co-authored-by: Zhihao Jia <[email protected]>
* modify README

* fix link issues

* update legion version

---------

Co-authored-by: Zhihao Jia <[email protected]>
* .

* remove deadcode

* add benchmarking mode, initializing weights randomly

* better logging when running out of memory

* update

---------

Co-authored-by: Gabriele Oliaro <[email protected]>