
Batch Inference using Ray and vLLM #680

Open
ratnopamc opened this issue Oct 23, 2024 · 2 comments

@ratnopamc
Collaborator

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

What is the outcome that you are trying to reach?

Current examples mostly showcase online inference.
We need to add an example of batch inference on Ray with vLLM.

Describe the solution you would like

Add an example of batch inference using a RayJob under the JARK stack blueprint (for GPUs).
Refer to the Ray documentation for an example; a rough sketch of the pattern is below.
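
For reference, a minimal sketch of what such an example might look like, following the offline batch inference pattern from the Ray docs (Ray Data `map_batches` with a vLLM actor class). The model id, column names, S3 paths, and tuning values below are placeholders, not part of this repo:

```python
# Minimal sketch: offline batch inference with Ray Data + vLLM.
# Assumes a recent Ray (>= 2.9, for the `concurrency` arg) and a GPU worker group.
import ray
from vllm import LLM, SamplingParams


class LLMPredictor:
    def __init__(self):
        # Load the model once per actor; model id is a placeholder.
        self.llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")
        self.params = SamplingParams(temperature=0.8, max_tokens=256)

    def __call__(self, batch):
        # `batch` is a dict of columns; "text" is the column read_text produces.
        outputs = self.llm.generate(list(batch["text"]), self.params)
        batch["generated"] = [o.outputs[0].text for o in outputs]
        return batch


# Hypothetical input/output locations for illustration only.
ds = ray.data.read_text("s3://my-bucket/prompts.txt")
ds = ds.map_batches(
    LLMPredictor,
    concurrency=2,   # number of model replicas (actors)
    num_gpus=1,      # GPUs reserved per replica
    batch_size=32,
)
ds.write_json("s3://my-bucket/outputs/")
```

This script would be submitted as the entrypoint of a RayJob custom resource so KubeRay provisions the cluster, runs the job, and tears it down.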

Describe alternatives you have considered

Additional context

@askulkarni2 askulkarni2 added the enhancement New feature or request label Oct 23, 2024
@thangalv

Interested in working on this issue

@ratnopamc
Collaborator Author

Thanks @thangalv, assigned to you. Once you're done testing your changes, please raise a PR.
