
【Feature】MMDetection supports Grounding-DINO inference and fine-tuning #228

Open
hhaAndroid opened this issue Sep 26, 2023 · 5 comments
Labels: enhancement (New feature or request)

@hhaAndroid commented Sep 26, 2023

Hi all,
MMDetection now supports Grounding-DINO inference and fine-tuning. The mAP we achieved in our reproduction is higher than the official results. We also provide results from retraining the R50 model from scratch, which performs significantly better than the official implementation.

Installation

cd $MMDETROOT

# source installation
pip install -r requirements/multimodal.txt

# or mim installation
mim install mmdet[multimodal]

NOTE

Grounding DINO uses BERT as its language model, which requires access to https://huggingface.co/. If you encounter connection errors due to network restrictions, you can download the required files on a machine with internet access and save them locally, then modify the lang_model_name field in the config to point to the local path. Please refer to the following code:

from transformers import BertConfig, BertModel
from transformers import AutoTokenizer

# Download the config, weights, and tokenizer from Hugging Face.
config = BertConfig.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", add_pooling_layer=False, config=config)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Save them locally so that lang_model_name can point to this directory.
config.save_pretrained("your path/bert-base-uncased")
model.save_pretrained("your path/bert-base-uncased")
tokenizer.save_pretrained("your path/bert-base-uncased")
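
The corresponding config change is then a one-line edit. A minimal sketch, assuming the stock config layout in which lang_model_name is a top-level variable that is passed into model.language_model (verify this against the config you actually use; the path below is a placeholder):

# In configs/grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py,
# replace the Hugging Face hub name with the local directory saved above.
# When editing the config file in place, the variable feeds model.language_model,
# so no other change should be needed.
lang_model_name = 'your path/bert-base-uncased'  # was 'bert-base-uncased'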

Inference

cd $MMDETROOT

wget https://download.openmmlab.com/mmdetection/v3.0/grounding_dino/groundingdino_swint_ogc_mmdet-822d7e9d.pth

python demo/image_demo.py \
	demo/demo.jpg \
	configs/grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py \
	--weights groundingdino_swint_ogc_mmdet-822d7e9d.pth \
	--texts 'bench . car .'
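
The same inference can be run from Python via DetInferencer in mmdet.apis, which is what demo/image_demo.py wraps. The sketch below is an assumption based on the demo script (in particular the texts keyword and the shape of the returned dict) and should be checked against your mmdet version:

from mmdet.apis import DetInferencer

# Build the inferencer from the config and the downloaded checkpoint.
inferencer = DetInferencer(
    model='configs/grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py',
    weights='groundingdino_swint_ogc_mmdet-822d7e9d.pth',
    device='cuda:0')

# 'texts' holds the phrases to ground, separated by ' . ' as in the demo command.
results = inferencer('demo/demo.jpg', texts='bench . car .', out_dir='outputs')

# Predictions come back as plain dicts with 'labels', 'scores' and 'bboxes'.
print(results['predictions'][0]['scores'][:5])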

Results and Models

| Model | Backbone | Style | COCO mAP | Official COCO mAP | Pre-Train Data |
| --- | --- | --- | --- | --- | --- |
| Grounding DINO-T | Swin-T | Zero-shot | 48.5 | 48.4 | O365, GoldG, Cap4M |
| Grounding DINO-T | Swin-T | Finetune | 58.1 (+0.9) | 57.2 | O365, GoldG, Cap4M |
| Grounding DINO-B | Swin-B | Zero-shot | 56.9 | 56.7 | COCO, O365, GoldG, Cap4M, OpenImage, ODinW-35, RefCOCO |
| Grounding DINO-B | Swin-B | Finetune | 59.7 | - | COCO, O365, GoldG, Cap4M, OpenImage, ODinW-35, RefCOCO |
| Grounding DINO-R50 | R50 | Scratch | 48.9 (+0.8) | 48.1 | - |

For details, see https://github.com/open-mmlab/mmdetection/blob/dev-3.x/configs/grounding_dino/README.md

We also support GLIP inference and fine-tuning.

If you encounter any issues while using it, please feel free to create an issue.

@hhaAndroid hhaAndroid changed the title MMDetection supports Grounding-DINO inference and fine-tuning 【Feature】MMDetection supports Grounding-DINO inference and fine-tuning Sep 26, 2023
@SlongLiu SlongLiu added the enhancement New feature or request label Sep 26, 2023

PawaritL commented Oct 8, 2023

@hhaAndroid thank you very much for supporting Grounding DINO fine-tuning! I just have a few questions.

My goal is to keep Grounding DINO's versatility in open-set detection while adding a few custom classes.

  1. In the fine-tuning procedure from the MMDetection docs, it looks like we have to explicitly set the number of classes. Does this mean the fine-tuned model can no longer do open-set detection, or am I misunderstanding something?
  2. Will the fine-tuned model still be able to handle Referring Expression Comprehension (REC)? For example, can I still prompt the fine-tuned model with "the left lion"?
  3. Could you please share any script or code snippets showing how you did the fine-tuning?

Many thanks!


FengheTan9 commented Oct 8, 2023

> @hhaAndroid thank you very much for supporting Grounding DINO fine-tuning! I just have a few questions.
>
> My goal is to keep Grounding DINO's versatility in open-set detection while adding a few custom classes.
>
>   1. In the fine-tuning procedure from the MMDetection docs, it looks like we have to explicitly set the number of classes. Does this mean the fine-tuned model can no longer do open-set detection, or am I misunderstanding something?
>   2. Will the fine-tuned model still be able to handle Referring Expression Comprehension (REC)? For example, can I still prompt the fine-tuned model with "the left lion"?
>   3. Could you please share any script or code snippets showing how you did the fine-tuning?
>
> Many thanks!

Maybe the text input of Grounding DINO in mmdet is fixed to the category list (not free-form text) 😥

@Liquidmasl

> If you encounter any issues while using it, please feel free to create an issue.

This is amazing, thank you!

Can these models be used with the original groundingdino implementation? The configs look quite different, so I guess not? It would be a shame to have to switch implementations at this point.

@25icecreamflavors

Can I fine-tune Grounding DINO on a prompt? These objects should already appear in the pre-training data, but I would like to add some additional information to get better predictions. Let's say I only want to detect "black cats". The problem is that I have only a few data samples, so I would like to tune the model a little with a prompt while reusing the pre-trained knowledge.

@SoulProficiency

Hi, what are the minimum hardware requirements for fine-tuning Grounding DINO on the COCO dataset (default batch size = 32)?
