
Predicted labels #46

Open
Hasanmog opened this issue Dec 20, 2023 · 18 comments

Hasanmog commented Dec 20, 2023

Hello,

@aghand0ur and I used your code to train on a custom dataset (20 classes), and everything went fine.
I modified the evaluate function to suit this specific task. When testing on my test dataset (converted to COCO format), the COCO results are really low, although visualizing samples showed impressive results.
I printed out the labels predicted during evaluation; they never match the ground truth, while the bounding boxes are quite good.
I placed a label_list containing the categories in cfg_odvg.py.
Any idea/tips on where the source of the problem could be?
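
A minimal sanity check for this kind of symptom is sketched below: it verifies that the category ids in the COCO-format test annotations actually line up with the order of label_list. The file path and class names are placeholders, not values from this repository.

```python
import json

# Hypothetical paths/values -- adjust to your own setup.
ANN_FILE = "annotations/test_coco.json"
label_list = ["class_0", "class_1", "class_2"]  # the list placed in cfg_odvg.py

with open(ANN_FILE) as f:
    coco = json.load(f)

# COCO category ids are arbitrary integers and are not required to be
# contiguous or zero-based, so check how they map onto label_list indices.
cats = sorted(coco["categories"], key=lambda c: c["id"])
for idx, cat in enumerate(cats):
    name = label_list[idx] if idx < len(label_list) else "<missing>"
    marker = "OK" if name == cat["name"] else "MISMATCH"
    print(f"{marker}: annotation id={cat['id']} name={cat['name']!r} "
          f"vs label_list[{idx}]={name!r}")
```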

BIGBALLON (Collaborator) commented Dec 23, 2023

I'm not sure, but if everything is configured correctly, there may be something wrong with the code. My two suggestions:

1. Check whether all parameters in the configuration file are correct.
2. Take a closer look at the evaluation code; there may be a problem there, but I am not sure.

If you have any new findings or logs, please provide them so we can analyze the specific problem.

Qia98 commented Dec 26, 2023

> @aghand0ur and I used your code to train on a custom dataset (20 classes), and everything went fine. I modified the evaluate function to suit this specific task. When testing on my test dataset (converted to COCO format), the COCO results are really low, although visualizing samples showed impressive results. I printed out the labels predicted during evaluation; they never match the ground truth, while the bounding boxes are quite good. I placed a label_list containing the categories in cfg_odvg.py. Any idea/tips on where the source of the problem could be?

I encountered the same problem.
The mAP I got is low but the bounding boxes are quite good. Have you solved the problem?

Hasanmog (Author) commented Dec 27, 2023

@Qia98, when evaluating, are the predicted labels correct compared to the ground truth, or at least plausible?
I printed out the labels; they never match the GT labels.

Are you encountering the same issue too?

BIGBALLON (Collaborator) commented Dec 27, 2023

@longzw1997 Any suggestions? It looks like there may be a small problem, but I don't know where it is.

  • The visual results are great, indicating that training is effective.
  • But the evaluation mAP (including the predicted labels) is low; it may be a problem with label_list or with the evaluation code.

longzw1997 (Owner) commented:

It looks like the code did not import the correct class names during evaluation. Have 'label_list' and 'use_coco_eval = False' in cfg_odvg.py been modified?
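
For reference, the relevant part of such a config might look like the sketch below. This is an illustrative fragment only, not the repository's actual cfg_odvg.py; the class names are placeholders.

```python
# cfg_odvg.py (illustrative fragment only)
use_coco_eval = False  # evaluate against label_list instead of the built-in COCO classes
label_list = [
    "class_0", "class_1", "class_2",
    # ... one entry per category, in the same order as the annotation category ids
]
```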

Hasanmog (Author) commented:

> It looks like the code did not import the correct class names during evaluation. Have 'label_list' and 'use_coco_eval = False' in cfg_odvg.py been modified?

yes.

BIGBALLON (Collaborator) commented:

> It looks like the code did not import the correct class names during evaluation. Have 'label_list' and 'use_coco_eval = False' in cfg_odvg.py been modified?
>
> yes.

So are the evaluation results normal now?

Hasanmog (Author) commented Dec 27, 2023

No, I modified them from the beginning, but I still have the same issue.
I don't know whether the problem is on my side or there is actually a bug in the code.
Training works normally; I compared the visualizations with the vanilla model. But during evaluation the scores are really low, and when I print the gt_label and the _res_label from the evaluation function they never match, while the bounding boxes are good.
So this might be one of the reasons why the scores are this low, which is why I asked @Qia98 if he is also getting mismatched labels.
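
A minimal sketch of that kind of label dump is shown below, assuming COCO-style ground-truth and results structures. The variable names gt_anns and res are placeholders of mine, not the repository's own; the check ignores box matching and only compares category ids per image, which is enough to reveal a systematic label shift.

```python
from collections import Counter

def compare_labels(gt_anns, res):
    """Print GT vs. predicted category ids per image.

    gt_anns: {image_id: [{'category_id': int, 'bbox': [x, y, w, h]}, ...]}
    res:     [{'image_id': int, 'category_id': int, 'bbox': [...], 'score': float}, ...]
    """
    preds_by_img = {}
    for r in res:
        preds_by_img.setdefault(r["image_id"], []).append(r["category_id"])

    for img_id, anns in gt_anns.items():
        gt_ids = Counter(a["category_id"] for a in anns)
        pred_ids = Counter(preds_by_img.get(img_id, []))
        print(f"image {img_id}: gt={dict(gt_ids)} pred={dict(pred_ids)}")
```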

SamXiaosheng commented Jan 4, 2024

I find that the COCO evaluation result is the same whether I use groundingdino_swint_ogc.pth or groundingdino_swinb_cogcoor.pth.
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.552
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.709
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.610
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.401
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.591
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.692
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.407
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.706
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.784
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.638
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.826
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.920

Hasanmog (Author) commented Jan 6, 2024

@SamXiaosheng, I think it depends on the dataset you're using. If your dataset contains referring expressions, you will find that GroundingDINO-SwinB performs better because it was trained on RefCOCO, while the SwinT variant wasn't.

Check this table.

Qia98 commented Jan 8, 2024

> No, I modified them from the beginning, but I still have the same issue. I don't know whether the problem is on my side or there is actually a bug in the code. Training works normally; I compared the visualizations with the vanilla model. But during evaluation the scores are really low, and when I print the gt_label and the _res_label from the evaluation function they never match, while the bounding boxes are good. So this might be one of the reasons why the scores are this low, which is why I asked @Qia98 if he is also getting mismatched labels.

@Hasanmog @longzw1997 @BIGBALLON I suspected there was something wrong with the evaluation code used during training, so I rewrote an evaluation script in COCO format and ran it against the official code base (I used this repo's code to train and obtain the weights, then evaluated them with my own evaluate function based on the official code). The mAP was very high (about 0.90 mAP@0.5 and 0.70 mAP@0.5:0.95), but in the training log mAP@0.5 was no more than 0.1.
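
For anyone who wants to reproduce this kind of offline check, a minimal COCO-format evaluation with pycocotools looks roughly like the sketch below. The file names are placeholders; the detections JSON must use absolute xywh boxes and the same category ids as the annotations.

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Placeholder paths -- point these at your own files.
coco_gt = COCO("annotations/test_coco.json")
coco_dt = coco_gt.loadRes("predictions.json")  # list of {image_id, category_id, bbox, score}

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints the AP/AR table shown earlier in this thread
```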

Qia98 commented Jan 8, 2024

I also suspect that in this code base the output of the model is correct, but the _res_labels used to calculate mAP are incorrect, so the problem may arise when the model output is converted into the JSON-format results (_res_labels).
For example, the xywh conversion may be incorrect.
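
As an illustration of the kind of conversion that has to be right: detectors of this family typically predict normalized (cx, cy, w, h) boxes, while COCO results expect absolute (x, y, w, h). A hedged sketch of that conversion is below; the function name and arguments are mine, not the repository's.

```python
import torch

def cxcywh_norm_to_coco_xywh(boxes: torch.Tensor, img_w: int, img_h: int) -> torch.Tensor:
    """Convert normalized (cx, cy, w, h) boxes of shape (N, 4) to absolute COCO (x, y, w, h)."""
    cx, cy, w, h = boxes.unbind(-1)
    x = (cx - 0.5 * w) * img_w  # top-left x in pixels
    y = (cy - 0.5 * h) * img_h  # top-left y in pixels
    return torch.stack([x, y, w * img_w, h * img_h], dim=-1)
```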

BIGBALLON (Collaborator) commented:

> @Hasanmog @longzw1997 @BIGBALLON I suspected there was something wrong with the evaluation code used during training, so I rewrote an evaluation script in COCO format and ran it against the official code base (I used this repo's code to train and obtain the weights, then evaluated them with my own evaluate function based on the official code). The mAP was very high (about 0.90 mAP@0.5 and 0.70 mAP@0.5:0.95), but in the training log mAP@0.5 was no more than 0.1.

Hi @Qia98, I agree with your viewpoint, and if you find the time, please feel free to create a pull request to address this issue. 😄

EddieEduardo commented:

> @Qia98, when evaluating, are the predicted labels correct compared to the ground truth, or at least plausible? I printed out the labels; they never match the GT labels.
>
> Are you encountering the same issue too?

Hi, I also encountered the same problem. When I visualize the detection results, I find that the locations of the bounding boxes are correct, but the categories are usually incorrect. Is there something wrong with BERT, or is it due to other reasons?

junfengcao commented:

> @Qia98, when evaluating, are the predicted labels correct compared to the ground truth, or at least plausible? I printed out the labels; they never match the GT labels.
>
> Are you encountering the same issue too?

I debugged the evaluation function, and I found the issue may be due to post-processing; see models/GroundingDINO/groundingdino.py (class PostProcess).
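
For context on why post-processing is a plausible suspect: in open-vocabulary detectors of this kind, the classification logits are over the text tokens of the prompt, so the post-processor has to map the highest-scoring token positions back to category ids before COCO evaluation. The sketch below is a rough, hypothetical illustration of that mapping, not the actual PostProcess code.

```python
import torch

def tokens_to_category_ids(logits, positive_maps, category_ids):
    """Map per-query token logits to category ids.

    logits:        (num_queries, num_tokens) query-vs-prompt-token similarities.
    positive_maps: list of boolean tensors over tokens, one per category,
                   marking which prompt tokens belong to that category's phrase.
    category_ids:  list of COCO category ids, aligned with positive_maps.
    """
    probs = logits.sigmoid()
    # Score each category as the max probability over its own tokens.
    per_class = torch.stack(
        [probs[:, pm].max(dim=-1).values for pm in positive_maps], dim=-1
    )  # (num_queries, num_categories)
    scores, cls_idx = per_class.max(dim=-1)
    labels = torch.tensor(category_ids)[cls_idx]
    return scores, labels
```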

BIGBALLON (Collaborator) commented Mar 12, 2024

> I debugged the evaluation function, and I found the issue may be due to post-processing; see models/GroundingDINO/groundingdino.py (class PostProcess).

@junfengcao feel free to create a pull request 😄

jaychempan commented:

> I debugged the evaluation function, and I found the issue may be due to post-processing; see models/GroundingDINO/groundingdino.py (class PostProcess).

May I ask whether anyone has solved this problem? I've encountered the same issue and have troubleshot a number of possible causes, but have not solved it.

caicaisy commented:

> I debugged the evaluation function, and I found the issue may be due to post-processing; see models/GroundingDINO/groundingdino.py (class PostProcess).

This happened to me too, with an mAP of only 0.2%. When I changed the dataset, the accuracy was more than 40%, but why did the mAP decrease with training?
