guide for implementing multimodal models? #501

amosyou · 2024-06-04T08:44:36Z

I read this comment about how to adapt vLLM implementations for SGLang. Are there any pointers on how to implement vision-language models for SGLang? I've been reading through llava.py, and it's not clear to me what I need to change from a typical Hugging Face implementation.

Also, why is there no check for ForwardMode.PREFILL in the forward function of LlavaLlamaForCausalLM?
