Inquiry on the "gated cross-modality interaction" #27

Open
Masaaki-75 opened this issue Jan 19, 2024 · 1 comment

Comments

@Masaaki-75

Hi! Thanks for open-sourcing APE, it is fantastic! 👍

I am new to open-vocabulary vision foundation models, and while going through your paper I had some questions about the "gated cross-modality interaction". I would appreciate your insights on a few points.

I understand that the interaction between image features and text features in GLIP is computationally expensive, but I couldn't follow the part about the "all-zero token", quoted below:

Instead, an all-zero token Pzero serves as a special text embedding and inputs to the fusion module for all given vocabularies. In this situation, the fusion process is “static”, as no language information is injected into vision features. The Pzero could provide explicit instructions to recognize primitive concepts and slightly tune vision feature Vvoc and retain original language feature Pvoc.

@shenyunhang
Owner

shenyunhang commented Feb 1, 2024

Sorry for this late response.

  1. Because the all-zero token differs from other text tokens and does not provide any text information, the model may be made aware that it should perform OVD and OVS tasks.
  2. We only use this token for vocabulary prompts. It can also be used with sentence prompts, but doing so has no effect.
  3. deformable_detr_segm.py is the no-fusion model; the fusion model is deformable_detr_segm_vl.py.
    The all-zero token is self.name_prompt_fusion_feature. The corresponding code is here: https://github.com/shenyunhang/APE/blob/main/ape/modeling/ape_deta/deformable_detr_segm_vl.py#L158
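
For readers who want to see why feeding an all-zero token makes the fusion "static", here is a minimal toy sketch in plain Python. It is not the APE implementation: the single-head cross-attention, the `gate` scalar, the bias terms, and every function name below are assumptions made for illustration only. The only idea taken from the thread is that, for vocabulary prompts, the fusion module receives one all-zero token instead of the real text embeddings.

```python
import math
import random


def linear(x, weight, bias):
    """Compute y = W x + b for a single vector x (weight is a list of rows)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_params(dim, seed=0):
    """Random toy parameters (hypothetical, not APE's trained weights)."""
    rng = random.Random(seed)

    def mat():
        return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

    def vec():
        return [rng.uniform(-1, 1) for _ in range(dim)]

    return {"wq": mat(), "bq": vec(), "wk": mat(), "bk": vec(),
            "wv": mat(), "bv": vec(), "gate": 0.1}


def cross_attend(query, text_tokens, params):
    """Single-head cross-attention from one vision vector to the text tokens."""
    q = linear(query, params["wq"], params["bq"])
    keys = [linear(t, params["wk"], params["bk"]) for t in text_tokens]
    values = [linear(t, params["wv"], params["bv"]) for t in text_tokens]
    scale = math.sqrt(len(q))
    weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(q))]


def gated_fusion(vision_feats, text_tokens, params, use_zero_token):
    """Toy gated fusion that updates only the vision side.

    With use_zero_token=True the real text tokens are ignored and a single
    all-zero token is attended to instead, so no language information
    reaches the vision features.
    """
    dim = len(vision_feats[0])
    fusion_input = [[0.0] * dim] if use_zero_token else text_tokens
    fused = []
    for v in vision_feats:
        update = cross_attend(v, fusion_input, params)
        fused.append([vi + params["gate"] * ui for vi, ui in zip(v, update)])
    return fused


# Two different "vocabularies" (random stand-ins for text embeddings).
params = make_params(dim=4)
rng = random.Random(1)
vision = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
vocab_a = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
vocab_b = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]

static_a = gated_fusion(vision, vocab_a, params, use_zero_token=True)
static_b = gated_fusion(vision, vocab_b, params, use_zero_token=True)
dynamic_a = gated_fusion(vision, vocab_a, params, use_zero_token=False)
dynamic_b = gated_fusion(vision, vocab_b, params, use_zero_token=False)

print("static fusion depends on vocabulary:", static_a != static_b)
print("dynamic fusion depends on vocabulary:", dynamic_a != dynamic_b)
```

In this toy setup the zero-token path gives the same vision features for any vocabulary: attention over a single token has weight 1, and the value projection of a zero vector is just its bias, so each vision vector is shifted by the constant `gate * bv`. That is one possible reading of "static" and "slightly tune" in the quoted passage; whether APE's fusion layers behave exactly this way would need to be checked against deformable_detr_segm_vl.py.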
