merge inference into BertMLM_fix #3

Open
wants to merge 381 commits into base: bert_fix1
Conversation

xinhaoc (Owner) commented May 17, 2024

Description of changes:

Related Issues:

Linked Issues:

  • Issue #

Issues closed by this PR:

  • Closes #

Before merging:

  • Did you update the flexflow-third-party repo if you modified any of the CMake files, build configs, or submodules?

jiazhihao and others added 30 commits May 10, 2023 17:57
* Support multiple FFModels in a single top_level_task

* [TreeVerifyMHA] bug fixes
* init

* fix

* code

* clean up

* fix

* fix, add md

* format

* hip_roc

* add comment
* Support multiple FFModels in a single top_level_task

* [TreeVerifyMHA] bug fixes

* bug fixes

* TreeIncMHA and SpecIncMHA bug fixes

* format.

---------

Co-authored-by: xinhaoc <[email protected]>
* serving opt pipeline

* format
Co-authored-by: Zhihao Jia <[email protected]>
* complex into metadata

* topk

* format

---------

Co-authored-by: Zhihao Jia <[email protected]>
* Support multiple FFModels in a single top_level_task

* [TreeVerifyMHA] bug fixes

* bug fixes

* TreeIncMHA and SpecIncMHA bug fixes

* format.

* .

* add sentence piece tokenizer

* format

* prepare spec_infer demo

* prettier prints

* make the llama model work

* add small model config

* enable speculative inference for spec_infer

* fix

* rename

* fix one of the bugs

* fix

* del

* attempt to fix ci

* integrated gpt/opt tokenizer

* integrate opt tokenizer with pipeline

* .

* format

* move files

* Update README.md

* add an overview figure

* update images

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* add tokenizer in readme

* fix

* fix

* fix

* Update README.md

* Update README.md

* add gif

* add weights to readme, clean some print

* Update README.md

* update demo

* Update README.md

* Update README.md

* remove outdated file

* Update README.md

* Update README.md

* .

---------

Co-authored-by: xinhaoc <[email protected]>
Co-authored-by: Gabriele Oliaro <[email protected]>
Co-authored-by: xinhaoc <[email protected]>
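The commits above enable speculative inference for spec_infer. As a hedged illustration only (toy deterministic models, not FlexFlow's actual API), the control flow is: a small draft model proposes a run of tokens, the large target model verifies them in order, and on the first mismatch the target's own token is substituted.

```python
# Illustrative sketch of speculative inference with hypothetical toy models.
# Here draft and target happen to use the same rule, so every proposed token
# is accepted; the point is the propose-then-verify control flow.

def draft_propose(prefix, k):
    # Toy "small model": propose k tokens deterministically from the prefix.
    out, state = [], sum(prefix) if prefix else 0
    for _ in range(k):
        tok = (state * 31 + 7) % 100
        out.append(tok)
        state += tok
    return out

def target_next(prefix):
    # Toy "large model": its own deterministic next-token rule.
    state = sum(prefix) if prefix else 0
    return (state * 31 + 7) % 100

def speculative_step(prefix, k=4):
    proposed = draft_propose(prefix, k)
    accepted, cur = [], list(prefix)
    for tok in proposed:
        if target_next(cur) == tok:        # verification pass: keep the token
            accepted.append(tok)
            cur.append(tok)
        else:                              # first mismatch: take target's token
            accepted.append(target_next(cur))
            return accepted
    return accepted
```

Accepting a whole run of draft tokens per target-model pass is what makes speculative decoding cheaper than generating one token per large-model step.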
* Support multiple FFModels in a single top_level_task

* [TreeVerifyMHA] bug fixes

* bug fixes

* TreeIncMHA and SpecIncMHA bug fixes

* format.

* .

* add sentence piece tokenizer

* format

* prepare spec_infer demo

* prettier prints

* make the llama model work

* add small model config

* enable speculative inference for spec_infer

* fix

* rename

* fix one of the bugs

* fix

* del

* attempt to fix ci

* integrated gpt/opt tokenizer

* integrate opt tokenizer with pipeline

* .

* format

* move files

* Update README.md

* add an overview figure

* update images

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* add tokenizer in readme

* fix

* fix

* fix

* Update README.md

* Update README.md

* add gif

* add weights to readme, clean some print

* Update README.md

* update demo

* Update README.md

* Update README.md

* remove outdated file

* Update README.md

* Update README.md

* .

* use data parallel by default

---------

Co-authored-by: xinhaoc <[email protected]>
Co-authored-by: Gabriele Oliaro <[email protected]>
Co-authored-by: xinhaoc <[email protected]>
* file path adapt

* fix

* fix

* fix
* fix hip_rocm build with sentencepiece

* shellcheck 1

* shellcheck 2

* shellcheck 3

* fix install script

* .github/workflows/helpers/install_dependencies.sh

* fix

* shellcheck

* restore unnecessary changes

* fix build

* removed outdated test from c++ tests

* update link in readme
* implemented file-based configs, remove spec_pipeline folder

* fix

* add inference test, script to download weights

* update readme

* update ci scripts

* newlines

* fix gpu-ci

* fix

* fix

* update test file

* added incr decoding program, moved LLAMA folder from examples

* linting

* add incremental decoding to test

* update readme

* add script to download opt weights

* fix support for opt, move code to root inference folder

* linting

* update test file

* fix

* bug fix

* update test
…exflow#736)

* making TreeIncMultiHeadSelfAttentionMeta a subclass of IncMultiHeadSelfAttentionMeta

* make BeamSearchIncMultiHeadAttentionMeta a subclass of IncMultiHeadAttentionMeta

* format

* merging kernel functions

* merge more functions

* merge compute_qkv_kernel

* format

* fix config

---------

Co-authored-by: xinhaoc <[email protected]>
* fix alignment bugs (part 1)

* add missing file
…ttention (flexflow#737)

* making TreeIncMultiHeadSelfAttentionMeta a subclass of IncMultiHeadSelfAttentionMeta

* make BeamSearchIncMultiHeadAttentionMeta a subclass of IncMultiHeadAttentionMeta

---------

Co-authored-by: xinhaoc <[email protected]>
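The refactor described in the commits above (making TreeIncMultiHeadSelfAttentionMeta and BeamSearchIncMultiHeadAttentionMeta subclasses of IncMultiHeadSelfAttentionMeta) can be sketched as follows. This is a minimal Python illustration of the class hierarchy only; the field names are assumptions, and the real classes are C++ kernel-metadata structs.

```python
# Sketch of the Meta-class hierarchy: shared attention metadata lives in one
# base class, and each decoding variant only adds its own extra state instead
# of duplicating every field (field names are illustrative).

class IncMultiHeadSelfAttentionMeta:
    def __init__(self, num_heads, head_dim, kv_cache_size):
        # Fields common to all incremental-attention variants.
        self.num_heads = num_heads
        self.head_dim = head_dim
        self.kv_cache_size = kv_cache_size

class TreeIncMultiHeadSelfAttentionMeta(IncMultiHeadSelfAttentionMeta):
    def __init__(self, num_heads, head_dim, kv_cache_size, num_tree_branches):
        super().__init__(num_heads, head_dim, kv_cache_size)
        self.num_tree_branches = num_tree_branches  # tree-verify-specific state

class BeamSearchIncMultiHeadAttentionMeta(IncMultiHeadSelfAttentionMeta):
    def __init__(self, num_heads, head_dim, kv_cache_size, beam_width):
        super().__init__(num_heads, head_dim, kv_cache_size)
        self.beam_width = beam_width  # beam-search-specific state
```

Sharing the base also lets the merged kernel functions mentioned above (e.g. a common compute_qkv_kernel) operate on any variant through the base-class fields.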
* save output to file

* add alignment tests

* fix

* change conflicting name, add comments

* fix typo

* formatting

* more comments and clean dead code

* formatting

* fixed issue with length mismatch

* fix ci skip

* update inf test

* add precision selection support in incr decoding
* Update README.md

* update readme

* fix
…d tests (flexflow#749)

* add support for downloading mixed precision llama/opt weights

* fix

* update test script to also run half precision tests

* disable workflow for inference PRs

* add verbose option

* linting

* copy opt weights in download weights script

* add alignment tests with huggingface (llama)

* fix, add diff to test script

* fix

* add opt tests

* comment out tests not passing

* add e2e latency to output files

* add speed tests

* shellcheck

* shellcheck

* fix

* fix

* linting

* fix
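The alignment tests with Hugging Face described above boil down to comparing the token streams produced by two backends and reporting where they first diverge. A minimal sketch of that comparison, with the function name and list format as assumptions for illustration:

```python
# Hedged sketch of an alignment check between two backends' outputs
# (e.g. FlexFlow vs. Hugging Face); inputs are plain token-id lists.

def first_divergence(tokens_a, tokens_b):
    """Return the index of the first mismatching token, or -1 if aligned."""
    for i, (a, b) in enumerate(zip(tokens_a, tokens_b)):
        if a != b:
            return i
    if len(tokens_a) != len(tokens_b):
        # One stream is a strict prefix of the other: diverges at the shorter end.
        return min(len(tokens_a), len(tokens_b))
    return -1
```

Reporting the first divergence index, rather than just pass/fail, makes half-precision mismatches much easier to localize in the per-request output files.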
goliaro and others added 28 commits January 9, 2024 06:56
This reverts commit 197e308.
* add a background server for RequestManager

* .

* make incr_decoding work

* make spec_infer work

* format

* update python inference

* fix python issues

* bug fix

* add a Legion future to capture the termination of the background server

* Add thread safety for background server.

* Simplify backend server design.

* resolve conflict.

* Add server task timeout.

* register callbacks to terminate background worker at exit or termination

* [Python] enable decoding multiple requests

* update README.md and default configuration

* [Python] no need to use the llm context environment to start/stop the background server

* require at least four cpu cores

* [Python] add back explicit start_server()/stop_server().

* fix

* fix python chatgpt.json

---------

Co-authored-by: Gabriele Oliaro <[email protected]>
Co-authored-by: zwang86 <[email protected]>
Co-authored-by: Zeyu Wang <[email protected]>
Co-authored-by: xinhaoc <[email protected]>
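The background-server pattern these commits describe can be sketched in a few lines. This is an illustration only, not FlexFlow code: a threading.Event stands in for the Legion future that captures termination, and an atexit callback stands in for the registered shutdown hooks.

```python
# Minimal sketch of a background request-serving loop with explicit
# start_server()/stop_server() and a shutdown callback registered at exit.

import atexit
import threading

class BackgroundServer:
    def __init__(self):
        self._stop = threading.Event()   # stand-in for the termination future
        self._thread = None
        self.processed = 0

    def start_server(self):
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        # Guarantee the worker is terminated even on abnormal interpreter exit.
        atexit.register(self.stop_server)

    def _run(self):
        while not self._stop.is_set():
            self.processed += 1          # stand-in for serving one request
            self._stop.wait(0.001)

    def stop_server(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join(timeout=1.0)
```

Usage mirrors the explicit API mentioned above: call start_server() before submitting requests and stop_server() when done; the atexit hook makes stop_server() idempotent cleanup rather than the only shutdown path.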
* only stop server if rm is initialized

* fix

* better logging

* pass layer names to ops

* add debugging functionality to hf script

* fix

* fixes

* fix

* fix

---------

Co-authored-by: Ubuntu <[email protected]>
* bug fixes and update Legion version

* fix

* bug fix

* update legion

* fix arithmetic error due to num_devices uninitialized

* update legion version

* update ci

* fix

* debugging ci

* Revert "debugging ci"

This reverts commit 0b3148e.

---------

Co-authored-by: Gabriele Oliaro <[email protected]>
…w#1246)

* add a background server for RequestManager

* .

* make incr_decoding work

* make spec_infer work

* format

* update python inference

* fix python issues

* bug fix

* add a Legion future to capture the termination of the background server

* gradio finished

* chatbot gradio version 2

* chainlit1

* chainlit2

* fastapi done

* fastapi incr_decoding

* langchain example & wrapper class

* langchain example & wrapper class1

* added documentation

* entrypoint

* del apikey

* delete extra files

* rag search fixed some bugs

* fixed rag search issues

* updates before rebase

* minor changes

* reorganize files

* Add thread safety for background server.

* Simplify backend server design.

* resolve conflict.

* specinfer usecases with issues labeled

* specinfer usecases with issues labeled 2

* fixed issues with prompt template

* fix issues with rag specinfer

* Add server task timeout.

* register callbacks to terminate background worker at exit or termination

* [Python] enable decoding multiple requests

* update README.md and default configuration

* fix issues with gradio and prompt template

* fix issues with rag

* adjusted fastapi entrypoint

* update documentation

* resolve conflicts

* issues fix

* adjustments on usecases and api entrypoints

* remove redundant changes

* testing CI

* Enable backtrace

* restore newlines

* version

* add back misdeleted line

* legion version

---------

Co-authored-by: Zhihao Jia <[email protected]>
Co-authored-by: Gabriele Oliaro <[email protected]>
Co-authored-by: zwang86 <[email protected]>
Co-authored-by: Zeyu Wang <[email protected]>
Co-authored-by: xinhaoc <[email protected]>
* bug fixes and update Legion version

* fix

* bug fix

* update legion

* fix arithmetic error due to num_devices uninitialized

* update legion version

* update ci

* fix

* debugging ci

* Revert "debugging ci"

This reverts commit 0b3148e.

* update mapper interface

* add ncclFinalize

* Only delete nccl communications for training jobs

---------

Co-authored-by: Zhihao Jia <[email protected]>
* modify README

* fix link issues

* update legion version

---------

Co-authored-by: Zhihao Jia <[email protected]>
* .

* remove deadcode

* add benchmarking mode, initializing weights randomly

* better logging when running out of memory

* update

---------

Co-authored-by: Gabriele Oliaro <[email protected]>
@xinhaoc xinhaoc changed the title Xinhao inference merge inference into BertMLM_fix May 17, 2024