[Question] Why has the performance of d3 improved so much? #1
Thanks for your interest in our work. I think the main reason is that we constructed some negative queries when building the visual grounding training data, as described in the last paragraph of Sec. 3.2.
We also construct
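A minimal sketch of what constructing such negative queries could look like (this is an assumption for illustration, not the authors' actual pipeline): for each image, the positive queries describe objects present in the image, and negatives are sampled from descriptions of objects that are absent, then the two are concatenated into one prompt list.

```python
# Hypothetical sketch: building positive + negative language queries
# for one visual-grounding training sample. `description_pool` and
# `build_queries` are illustrative names, not from the paper.
import random

def build_queries(image_labels, description_pool, n_neg=2, seed=0):
    """Concatenate positive queries (objects in the image) with
    randomly sampled negative queries (objects absent from it)."""
    rng = random.Random(seed)
    negatives = [d for d in description_pool if d not in image_labels]
    sampled = rng.sample(negatives, min(n_neg, len(negatives)))
    return list(image_labels) + sampled

pool = ["a red car", "a dog", "a purple elephant", "a traffic light"]
queries = build_queries(["a red car"], pool, n_neg=2)
# queries[0] is the positive query; the remaining entries describe
# objects the model must learn to reject for this image.
```

The key point is that the negatives are ordinary text prompts concatenated with the positives; nothing special happens at the input side.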
By mentioning "reject" and "negative", do you mean that techniques like contrastive learning are used? If not, then I am a bit confused. Intuitively, concatenating the positive language queries (describing objects in the images) with the negative ones (describing objects that don't exist) and then letting them interact with visual features is like introducing noise into the features, right? Without a contrastive loss or other manipulation, how could the model explicitly learn to reject irrelevant prompts and achieve higher performance? Please correct me if I am misunderstanding.
We believe the model will learn to denoise, since we use noisy tokens for fusion and supervise the output with ground-truth signals. Because we formulate grounding as detection, all prompts can be seen as object classes.
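The mechanism described above can be sketched without any contrastive loss (a toy illustration under assumed names, not the paper's actual loss): each prompt acts as an object "class" with a predicted objectness logit, positives are supervised toward 1 via their ground-truth matches, and negatives toward 0, so an ordinary binary cross-entropy already teaches rejection.

```python
# Toy sketch: why plain detection supervision can teach rejection.
# Prompts are treated as object classes; negative prompts get the
# all-zero (no-object) target. Names here are illustrative.
import math

def bce(logit, target):
    """Binary cross-entropy on a single objectness logit."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

def grounding_loss(prompt_logits, positive_prompts):
    """prompt_logits: {prompt: predicted objectness logit}.
    Prompts absent from positive_prompts are negatives (target 0),
    so the model is explicitly pushed to reject them."""
    loss = 0.0
    for prompt, logit in prompt_logits.items():
        target = 1.0 if prompt in positive_prompts else 0.0
        loss += bce(logit, target)
    return loss / len(prompt_logits)

# Model correctly fires on the positive and rejects the negative:
loss_good = grounding_loss({"a red car": 2.0, "a purple elephant": -2.0},
                           {"a red car"})
# Model wrongly fires on the negative prompt: the loss rises sharply.
loss_bad = grounding_loss({"a red car": 2.0, "a purple elephant": 2.0},
                          {"a red car"})
```

So the "noise" in the fused features is exactly what the ground-truth targets penalize, which is why no separate contrastive objective is needed in this formulation.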
This is fantastic work, and I have a question: why has the performance on the d3 dataset improved so much? The improvements on the other datasets seem relatively reasonable. I look forward to your response.