
[serve, vllm] add vllm-example that we can reference to #36617

Merged
merged 13 commits into master from vllm on Jun 21, 2023

Conversation

@scv119 (Contributor) commented Jun 20, 2023

Why are these changes needed?

This adds a vLLM example on Serve that we can reference.
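
For orientation, here is a rough sketch of the shape the example can take, reconstructed from the review snippets below and the public vLLM and Serve APIs at the time; the merged doc/source/serve/doc_code/vllm_example.py may differ in details such as argument handling and response format.

import json
from typing import AsyncGenerator

from starlette.requests import Request
from starlette.responses import Response, StreamingResponse

from ray import serve

from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.sampling_params import SamplingParams
from vllm.utils import random_uuid


@serve.deployment(ray_actor_options={"num_gpus": 1})
class VLLMPredictDeployment:
    def __init__(self, **kwargs):
        # kwargs are forwarded to vLLM's engine args, e.g. model="facebook/opt-125m".
        self.engine = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(**kwargs))

    async def stream_results(self, results_generator) -> AsyncGenerator[bytes, None]:
        # Each RequestOutput carries the text generated so far for every sampled sequence.
        async for request_output in results_generator:
            prompt = request_output.prompt
            text_outputs = [prompt + output.text for output in request_output.outputs]
            yield (json.dumps({"text": text_outputs}) + "\n").encode("utf-8")

    async def __call__(self, request: Request) -> Response:
        request_dict = await request.json()
        prompt = request_dict.pop("prompt")
        stream = request_dict.pop("stream", False)
        sampling_params = SamplingParams(**request_dict)
        request_id = random_uuid()
        results_generator = self.engine.generate(prompt, sampling_params, request_id)

        if stream:
            return StreamingResponse(self.stream_results(results_generator))

        # Non-streaming path: wait for the final output and return it in one response.
        final_output = None
        async for request_output in results_generator:
            final_output = request_output
        prompt = final_output.prompt
        text_outputs = [prompt + output.text for output in final_output.outputs]
        return Response(content=json.dumps({"text": text_outputs}))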

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@scv119 scv119 marked this pull request as ready for review June 20, 2023 21:41
@shrekris-anyscale (Contributor) left a comment

Nice work so far! I left some comments.

@edoakes this doc uses a GPU. Is there anything special we need to do to give it access to one in the CI?

Review threads on doc/source/serve/doc_code/vllm_example.py (several resolved):

if __name__ == "__main__":
    deployment = VLLMPredictDeployment.bind(model="facebook/opt-125m")
    serve.run(deployment)
    send_request()

Do you need to call print on send_request()?


Can you assert what the user should see when they run send_request()?
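
Responding to the two comments above, a hypothetical send_request helper that prints the response and asserts on its expected shape could look like the following; the URL, port, and payload keys are assumptions based on the deployment's request handling, not necessarily what the merged example uses.

import requests

def send_request():
    prompt = "San Francisco is a"
    # Assumes the deployment is exposed at Serve's default HTTP address and port.
    response = requests.post(
        "http://localhost:8000/", json={"prompt": prompt, "stream": False}
    )
    response.raise_for_status()
    output = response.json()
    # The deployment prepends the prompt, so the returned text should start with it.
    assert output["text"][0].startswith(prompt)
    print(output)
    return output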

prompt = final_output.prompt
text_outputs = [
    prompt + output.text
    for output in final_output.outputs

To confirm, does final_output.outputs accumulate all the generated outputs? It doesn't just contain the latest output, correct?

    prompt + output.text
    for output in request_output.outputs
]
ret = {"text": text_outputs}

Can we return only the generated output instead of prepending the prompt each time? That way, the user can see a single, coherent piece of text streamed back to them.
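
A hypothetical sketch of that suggestion: stream only the newly generated text (the delta since the previous chunk) instead of re-sending the prompt plus the full text each time. Variable names mirror the snippet above; this is not the merged code.

import json

async def stream_results(results_generator):
    num_returned = 0
    async for request_output in results_generator:
        # vLLM's CompletionOutput.text holds the full text generated so far,
        # so slice off the part that has already been sent.
        text_output = request_output.outputs[0].text
        delta = text_output[num_returned:]
        num_returned = len(text_output)
        yield (json.dumps({"text": delta}) + "\n").encode("utf-8")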

    for output in request_output.outputs
]
ret = {"text": text_outputs}
yield (json.dumps(ret) + "\0").encode("utf-8")

This null byte isn't necessary; we should remove it.

# GPU: compute capability 7.0 or higher (e.g., V100, T4, RTX20xx, A100, L4, etc.)
# see https://vllm.readthedocs.io/en/latest/getting_started/installation.html
# for more details.
deployment = VLLMPredictDeployment.bind(model="facebook/opt-125m")

add port arg?


Or maybe a README/doc update that gives a usage example?


we can revisit this when we write docs in #36650
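
For the port question above, a minimal usage sketch under the assumption that the HTTP port is configured through Serve's HTTP options rather than a deployment argument; this is illustrative, not what the merged example does.

from ray import serve

# Configure the HTTP proxy before running the deployment (8000 is Serve's default port).
serve.start(http_options={"host": "0.0.0.0", "port": 8000})
deployment = VLLMPredictDeployment.bind(model="facebook/opt-125m")
serve.run(deployment)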

scv119 and others added 2 commits June 20, 2023 18:06
Co-authored-by: shrekris-anyscale <[email protected]>
Signed-off-by: Chen Shen <[email protected]>
Co-authored-by: shrekris-anyscale <[email protected]>
Signed-off-by: Chen Shen <[email protected]>
@shrekris-anyscale (Contributor) left a comment

Nice work, this change looks good to me!

@scv119 (Contributor, Author) commented Jun 21, 2023

This PR is time-sensitive and the CI is slow. I'll merge it and fix the CI later.

@scv119 scv119 merged commit cc983fc into master Jun 21, 2023
1 of 2 checks passed
@scv119 scv119 deleted the vllm branch June 21, 2023 18:19
aslonnie added a commit that referenced this pull request Jun 21, 2023
@LosSherl commented Jul 3, 2023

ray.init is called twice when vLLM's tensor_parallel_size > 1.
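
A hypothetical workaround for the double initialization, not a confirmed fix: guard the second call so Ray is only initialized when it is not already running.

import ray

# Avoid calling ray.init() a second time, e.g. before vLLM sets up tensor parallelism.
if not ray.is_initialized():
    ray.init()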

SongGuyang pushed a commit to alipay/ant-ray that referenced this pull request Jul 12, 2023
harborn pushed a commit to harborn/ray that referenced this pull request Aug 17, 2023
harborn pushed a commit to harborn/ray that referenced this pull request Aug 17, 2023
arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023