Does llava-next-video deploy only focus on first frames? #510

Open
LetheRiver0 opened this issue Jun 7, 2024 · 2 comments
Comments

LetheRiver0 commented Jun 7, 2024

I'm trying to deploy llava-next-video with sglang, and it runs successfully. However, the model seems to focus only on the first frame of the input: if I pass in 10 frames and ask the model to describe them, the generated text contains information from the first frame only. Does anyone know what is happening? Thanks~
Also, where can I print the input tokens passed to the model? I want to check whether all frames are actually being fed to the model.

AmazDeng commented Jun 8, 2024

I'm having a similar problem.

  1. I deployed sglang and loaded the llava-next-image model, but sglang can only handle a single inference request. If I run batch inference with, for example, batch_size=10, sglang only returns results for the first 5 inputs; the last 5 get stuck and never complete.
  2. I'm also trying to load the llava-next-model model for inference, but sglang does not produce a result.

Luodian (Contributor) commented Jul 23, 2024

Indeed, the first version of our code patch has the issue described above. We will send a new PR, along with our new models, to fix these issues. Sorry to keep you waiting.
