
Is the vision tower trained during stage 2 (Visual Instruction Tuning)? #1537

GoGoJoestar opened this issue Jun 3, 2024 · 2 comments

@GoGoJoestar

I found @torch.no_grad() on CLIPVisionTower.forward(), so gradients won't flow into CLIP during training.

@torch.no_grad()
def forward(self, images):
    if type(images) is list:
        image_features = []
        for image in images:
            image_forward_out = self.vision_tower(image.to(device=self.device, dtype=self.dtype).unsqueeze(0), output_hidden_states=True)
            image_feature = self.feature_select(image_forward_out).to(image.dtype)
            image_features.append(image_feature)
    else:
        image_forward_outs = self.vision_tower(images.to(device=self.device, dtype=self.dtype), output_hidden_states=True)
        image_features = self.feature_select(image_forward_outs).to(images.dtype)
    return image_features

However, the model's config.json contains the key "mm_vision_tower_lr": 2e-06, and according to the LLaVA-NeXT blog from May 25th, the vision tower is trained during stage 2 with lr=2e-6.

Were the previous models trained with this strategy? Does also training CLIP give better results when fine-tuning for a downstream task?
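For context, a per-module learning rate like mm_vision_tower_lr is typically implemented with optimizer parameter groups. Below is a minimal sketch of that idea, not the actual LLaVA training code; the "vision_tower" name filter and the function name are assumptions for illustration:

import torch

def build_optimizer(model, base_lr=2e-5, mm_vision_tower_lr=2e-6):
    # Separate the vision-tower parameters from the rest so each group
    # can use its own learning rate (the "vision_tower" prefix is assumed).
    vision_params, other_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        (vision_params if "vision_tower" in name else other_params).append(param)

    return torch.optim.AdamW([
        {"params": other_params, "lr": base_lr},
        {"params": vision_params, "lr": mm_vision_tower_lr},
    ])

Note that a separate parameter group alone is not enough: as long as forward() runs under @torch.no_grad(), no gradients reach the vision tower, so the decorator would also have to be removed for that learning rate to have any effect.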

@2U1

2U1 commented Jun 18, 2024

I think the training code for LLaVA-NeXT hasn't been released yet.

@PangziZhang523

I printed out the gradients: requires_grad=True, but parameter.grad is None. Why?
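That is the expected behavior when the forward pass runs under @torch.no_grad(): autograd records no graph for those operations, so .grad stays None after backward() even though the parameters have requires_grad=True. A small standalone sketch of the effect (not LLaVA code):

import torch

layer = torch.nn.Linear(4, 4)          # weights have requires_grad=True
x = torch.randn(2, 4)

# Forward under no_grad: no graph is built, so nothing can ever
# backpropagate into the layer through this output.
with torch.no_grad():
    feats = layer(x)
print(feats.requires_grad)              # False
print(layer.weight.grad)                # None

# Normal forward: gradients reach the layer.
loss = layer(x).sum()
loss.backward()
print(layer.weight.grad is None)        # False -- grad is now populated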
