I found `@torch.no_grad()` on `CLIPVisionTower.forward()`, so no gradient flows back into CLIP during training:

LLaVA/llava/model/multimodal_encoder/clip_encoder.py, lines 45 to 57 at c121f04
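For context, here is a minimal sketch of what that decorator implies; the class and method names follow the file above, but the body is illustrative, not the exact repository code:

```python
import torch
import torch.nn as nn

class CLIPVisionTower(nn.Module):
    # Illustrative stand-in for the class in clip_encoder.py;
    # the real one wraps a HuggingFace CLIPVisionModel.
    def __init__(self, vision_tower):
        super().__init__()
        self.vision_tower = vision_tower

    @torch.no_grad()  # disables autograd for everything inside forward()
    def forward(self, images):
        # Because of the decorator, the returned features carry no grad
        # history, so no gradient can propagate back into the CLIP
        # weights, whatever learning rate the optimizer assigns them.
        return self.vision_tower(images)
```

With this in place, calling `loss.backward()` downstream leaves every `vision_tower` parameter with `grad is None`.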
However, there is a key `"mm_vision_tower_lr": 2e-06,` in the model's config.json, and according to the LLaVA-NeXT blog post of May 25th, the vision tower is trained during stage 2 with lr = 2e-6. Were the previous models trained with this strategy? Is training CLIP better when training for a downstream task?
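For reference, a per-module learning rate like this is typically wired up with PyTorch optimizer parameter groups. A minimal sketch of what a key like `mm_vision_tower_lr` suggests; the `build_optimizer` helper and its name-matching logic are assumptions for illustration, not LLaVA's actual trainer code:

```python
import torch

def build_optimizer(model, base_lr=2e-5, mm_vision_tower_lr=2e-6):
    # Hypothetical helper: split parameters into the vision tower vs. the
    # rest, mirroring what "mm_vision_tower_lr" in config.json suggests.
    vision_params, other_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        if "vision_tower" in name:
            vision_params.append(param)
        else:
            other_params.append(param)
    return torch.optim.AdamW([
        {"params": other_params, "lr": base_lr},
        {"params": vision_params, "lr": mm_vision_tower_lr},  # e.g. 2e-6
    ])
```

Note that such a parameter group only matters if gradients actually reach the vision tower; with `@torch.no_grad()` still on `forward()`, the 2e-6 group would never receive an update.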