[serve, vllm] add vllm-example that we can reference to #36617
Conversation
Nice work so far! I left some comments.
@edoakes this doc uses a GPU. Is there anything special we need to do to give it access to one in the CI?
if __name__ == "__main__": | ||
deployment = VLLMPredictDeployment.bind(model="facebook/opt-125m") | ||
serve.run(deployment) | ||
send_request() |
Do you need to call print on send_request()?
Can you assert what the user should see when they run send_request()?
prompt = final_output.prompt
text_outputs = [
    prompt + output.text
    for output in final_output.outputs
To confirm, does final_output.outputs accumulate all the generated outputs? It doesn't just contain the latest output, correct?
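As a point of reference, in the vLLM async engine API this example uses, each yielded RequestOutput is a cumulative snapshot, so the last one carries the complete text. A minimal sketch, assuming an AsyncLLMEngine constructed as in the example:

from vllm import SamplingParams
from vllm.engine.async_llm_engine import AsyncLLMEngine

async def collect_final_output(engine: AsyncLLMEngine, prompt: str, request_id: str):
    final_output = None
    async for request_output in engine.generate(prompt, SamplingParams(), request_id):
        # Each snapshot's .outputs holds one CompletionOutput per sampled
        # sequence, and each .text is the full generation so far, not just
        # the newest tokens.
        final_output = request_output
    return final_output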
    prompt + output.text
    for output in request_output.outputs
]
ret = {"text": text_outputs}
Can we return only the generated output instead of prepending the prompt each time? That way, the user can see a single, coherent piece of text streamed back to them.
    for output in request_output.outputs
]
ret = {"text": text_outputs}
yield (json.dumps(ret) + "\0").encode("utf-8")
This null byte is not necessary, and we should remove it.
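A sketch combining both suggestions — stream only the newly generated suffix per sequence, and delimit chunks with a newline instead of the null byte. The function name and delimiter choice are illustrative, not the merged code:

import json

async def stream_results(results_generator):
    # Track how many characters have already been sent per sampled sequence,
    # so each chunk carries only the newly generated suffix.
    num_sent = []
    async for request_output in results_generator:
        if not num_sent:
            num_sent = [0] * len(request_output.outputs)
        deltas = []
        for i, output in enumerate(request_output.outputs):
            deltas.append(output.text[num_sent[i]:])
            num_sent[i] = len(output.text)
        # Newline-delimited JSON is easy to parse client-side; no null byte.
        yield (json.dumps({"text": deltas}) + "\n").encode("utf-8")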
# GPU: compute capability 7.0 or higher (e.g., V100, T4, RTX20xx, A100, L4, etc.)
# see https://vllm.readthedocs.io/en/latest/getting_started/installation.html
# for more details.
deployment = VLLMPredictDeployment.bind(model="facebook/opt-125m")
Add a port arg?
Or maybe a README/doc update that gives a usage example?
We can revisit this when we write docs in #36650.
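Until then, a hedged usage sketch; the explicit http_options port and the client call are assumptions about one way to run the example, not documented behavior of the merged code:

import requests
from ray import serve

# Start Serve with an explicit HTTP port (8000 is also the default).
serve.start(http_options={"host": "0.0.0.0", "port": 8000})

# VLLMPredictDeployment is the deployment class defined in the example script.
serve.run(VLLMPredictDeployment.bind(model="facebook/opt-125m"))

# Query the deployment; serve.run returns without blocking.
response = requests.post(
    "http://localhost:8000/", json={"prompt": "San Francisco is a"}
)
print(response.json())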
Co-authored-by: shrekris-anyscale <[email protected]> Signed-off-by: Chen Shen <[email protected]>
Nice work, this change looks good to me!
This PR is time-sensitive and the CI is slow. I'll merge it and fix CI later.
…)" This reverts commit cc983fc.
Called ray.init twice when vllm tensor_parallel_size > 1
…36617) This adds a vllm example on serve that we can refer to. --------- Signed-off-by: Chen Shen <[email protected]> Co-authored-by: shrekris-anyscale <[email protected]>
Why are these changes needed?

This adds a vllm example on serve that we can reference.

Related issue number

Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.