
Comparison and Quantization Possibilities between vLLM and TensorRT for OpenChat 3.5 #106

Open
tsathya98 opened this issue Dec 1, 2023 · 0 comments

Comments

@tsathya98

Hello OpenChat Team,

First and foremost, I would like to express my sincere appreciation for your work on OpenChat 3.5. It's been a go-to model for my projects, and I'm truly impressed by its functionality and performance.

I'm reaching out with a couple of queries related to model optimization for OpenChat 3.5, particularly in the context of vLLM and TensorRT. The README.md notes the use of vLLM for optimizing the API server, which sparked my interest in a deeper comparison.

My primary question is:

  • Has there been any detailed performance comparison between vLLM and TensorRT for the OpenChat 3.5 model? I'm keen to understand their relative throughput and latency in practical serving scenarios. (A sketch of how the vLLM side of such a measurement might look follows below.)
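
For reference, here is a minimal sketch of how I would time the vLLM side of such a comparison. It assumes the `openchat/openchat_3.5` Hugging Face model id and the "GPT4 Correct" prompt template from the README; the TensorRT side would need an equivalent harness, so please treat this as a starting point rather than a definitive benchmark:

```python
import time

from vllm import LLM, SamplingParams

# Load OpenChat 3.5 with vLLM (assuming the Hugging Face hub model id).
llm = LLM(model="openchat/openchat_3.5", dtype="float16")

params = SamplingParams(temperature=0.7, max_tokens=256)
# OpenChat 3.5 conversation template, batched to exercise continuous batching.
prompts = ["GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant:"] * 32

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

# Count generated tokens across all requests and report output throughput.
generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated / elapsed:.1f} output tokens/s")
```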

Additionally, I'm exploring the possibility of model quantization:

  • Is there a method to quantize (or cast) the OpenChat 3.5 model to FP16 or BF16 and then serve it with vLLM? If so, has anyone undertaken this process or can provide guidance on how to approach it? (See the sketch after this list.)
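
From what I can tell, vLLM exposes a `dtype` argument on its `LLM` class that loads the weights directly in half precision, so no separate offline quantization step should be needed. A minimal sketch, again assuming the `openchat/openchat_3.5` model id and prompt template:

```python
from vllm import LLM, SamplingParams

# dtype="bfloat16" (or "float16") asks vLLM to load/cast the weights to that
# half-precision format at load time; no offline conversion step is required.
llm = LLM(model="openchat/openchat_3.5", dtype="bfloat16")

params = SamplingParams(temperature=0.7, max_tokens=128)
out = llm.generate(
    ["GPT4 Correct User: Hi<|end_of_turn|>GPT4 Correct Assistant:"], params
)
print(out[0].outputs[0].text)
```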

Any pointers to relevant benchmarks, studies, or documentation would be immensely helpful. As someone still exploring LLMs and their optimization techniques, this information would be valuable for my ongoing projects and my understanding of these technologies.

Thank you for your time and the remarkable effort put into this project.

Best regards,
Sathya
