What is the conv-mode for LLaVA-NeXT-Video-7B-32K #54

Closed
rebuttalpapers opened this issue Jun 6, 2024 · 5 comments

Comments

@rebuttalpapers

I tried the following --conv-mode values:

vicuna_v1
mistral_direct
llava_llama_2
llama_2
mistral_instruct
and encountered the error below:
AttributeError: 'LlavaMistralConfig' object has no attribute 'attention_bias'

@ZhangYuanhan-AI
Collaborator

Hi, what is the version of your transformers?

@rebuttalpapers
Author

rebuttalpapers commented Jun 7, 2024

Thanks!

import transformers
print(transformers.__version__)

4.40.0.dev0

@ZhangYuanhan-AI
Collaborator

ZhangYuanhan-AI commented Jun 9, 2024

My transformers version is 4.39.0, and the conv-mode should be mistral_instruct for LLaVA-NeXT-Video-7B-32K.

BTW, there is an "attention_dropout" entry in https://huggingface.co/lmms-lab/LLaVA-NeXT-Video-7B-32K/blob/main/config.json
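Note the direction of the version mismatch here: 4.40.0.dev0 is newer than 4.39.0, so matching the collaborator's setup means downgrading. A minimal, dependency-free sketch of comparing dotted version strings like these (the helper name is hypothetical, not from the repo):

```python
def version_tuple(v):
    # Parse a dotted version like "4.40.0.dev0" into comparable ints,
    # stopping at pre-release suffixes such as "dev0".
    parts = []
    for p in v.split("."):
        if not p.isdigit():
            break
        parts.append(int(p))
    return tuple(parts)

print(version_tuple("4.40.0.dev0") > version_tuple("4.39.0"))  # True: 4.40 is newer
```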

@rebuttalpapers
Author

rebuttalpapers commented Jun 10, 2024

Thanks @ZhangYuanhan-AI !

  1. What is "attention_dropout", and how does it solve the error above ("AttributeError: 'LlavaMistralConfig' object has no attribute 'attention_bias'")?

  2. Additionally, I downgraded transformers from 4.40.0.dev0 to 4.39.0, and the same problem is still there.

  3. What is the command you use to call the LLaVA-NeXT-Video-7B-32K model?
    (for others it is: bash scripts/video/demo/video_demo.sh lmms-lab/LLaVA-NeXT-Video-7B-DPO vicuna_v1 32 2 True ./data/llava_video/video-chatgpt/evaluation/Test_Videos/v_Lf_7RurLgp0.mp4)

  4. I added the following 3 lines to the config, though I'm not sure whether they are correct:
    setattr(cfg_pretrained, 'attention_bias', 0)
    setattr(cfg_pretrained, 'rope_scaling', {"factor": 8.0, "type": "linear"})
    setattr(cfg_pretrained, 'pretraining_tp', 1)
    However, the model does not give any response:

Time taken for inference: 2.013814687728882 seconds
Question: [INST]
Please provide a detailed description of the video, focusing on the main subjects, their actions, and the background scenes [/INST]

Response:
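For anyone else hitting the same AttributeError, a defensive variant of step 4 above — filling in only the attributes the config is missing, never overwriting existing ones — might look like this. The attribute names and defaults are assumptions based on LlamaConfig (where attention_bias defaults to False, not 0), and the SimpleNamespace is just a stand-in for the real cfg_pretrained:

```python
from types import SimpleNamespace

def ensure_config_defaults(cfg, defaults):
    # Only set attributes the config lacks; never overwrite existing values.
    for name, value in defaults.items():
        if not hasattr(cfg, name):
            setattr(cfg, name, value)
    return cfg

# Stand-in for cfg_pretrained (LlavaMistralConfig in the real script).
cfg = SimpleNamespace(hidden_size=4096)
ensure_config_defaults(cfg, {"attention_bias": False, "pretraining_tp": 1})
print(cfg.attention_bias)  # False
```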

@ZhangYuanhan-AI
Copy link
Collaborator

output_ids = model.generate(inputs=input_ids, images=video, attention_mask=attention_masks, modalities="video", do_sample=True, temperature=0.2, max_new_tokens=1024, use_cache=True, stopping_criteria=[stopping_criteria])

try to change this line to

output_ids = model.generate(inputs=input_ids, images=video, attention_mask=attention_masks, modalities="video", do_sample=True, temperature=0.2, max_new_tokens=1024, use_cache=True)
