
demo #1

Closed
trungpham2606 opened this issue Nov 14, 2021 · 66 comments
Labels
bug Something isn't working datasets Datasets and label generation good first issue Good for newcomers

Comments

@trungpham2606

Thank you @Shank2358 for sharing this great work.
I am trying to visualize the detections with your code, but when I ran the test.py file it raised an error about e2cnn. Can you point me to the reference for e2cnn?
Thanks in advance.

@Shank2358
Owner

Thanks. You only need to comment out all the E2CNN parts. I have updated GGHL.py; you can try the new version.

In addition, the E2CNN library (https://github.com/csuhan/e2cnn) is used by ReDet's backbone; I added it because I have recently been trying other backbones.
If there are other problems, I am happy to answer them.

@trungpham2606
Author

Thank you @Shank2358 for your quick response.
According to your paper, you also tested the model on the SKU dataset and the results were extremely good. Could you provide the pretrained weights for the SKU dataset as well?

@Shank2358
Owner

Of course. The weights for the SKU dataset are already available. You can download them from Baidu Disk (password: c3jv) or Google Drive.

The weights for the SSDD+ dataset will also be available soon.

Thank you.

@trungpham2606
Author

@Shank2358
Thanks to your weights, I was able to visualize the detections on images from DOTA and SKU.
image
image

I would really like to train on my own dataset. The VOC format is not a problem, but can you tell me how you define the rotation angle?

@Shank2358
Owner

Shank2358 commented Nov 15, 2021

Congratulations!

For oriented bounding boxes, you can use cv2.polylines(img, [points], ...) to draw them; the results in our paper were drawn this way. (I see that your visualization shows horizontal bounding boxes, so maybe try this.)

The format of the training dataset is like this:
image_path xmin,ymin,xmax,ymax,class_id,x1,y1,x2,y2,x3,y3,x4,y4,area_ratio,angle[0,-90)

We use the OpenCV definition of the angle, i.e., the angle range is [0, -90); a more specific explanation is as follows:
image

You can use the cv2.minAreaRect(points) function to calculate the angle.
For more specific calculation methods and explanations, refer to the official OpenCV documentation:
https://docs.opencv.org/3.4/de/d62/tutorial_bounding_rotated_ellipses.html

[2021-11-15-16:23] I have updated the description of the dataset format in README.md.

@Shank2358 Shank2358 added bug Something isn't working datasets Datasets and label generation good first issue Good for newcomers labels Nov 16, 2021
@trungpham2606
Author

Hello @Shank2358,
How can I calculate the area_ratio?

@Shank2358
Owner

Hi. I have added a dataset-generation script at ./datasets_tools/DOTA2Train.py; maybe you can try it.
Line 72 of DOTA2Train.py calculates the area_ratio, which is the ratio of the OBB's area to the HBB's area.
Thank you.
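Since the area_ratio is simply area(OBB) / area(HBB), it can also be computed directly from the four vertices. A small sketch using the shoelace formula (the function name is mine, not from DOTA2Train.py):

```python
import numpy as np

def area_ratio(quad):
    """quad: (4, 2) array of OBB vertices [x1,y1 ... x4,y4].
    Returns area(OBB) / area(HBB)."""
    x, y = quad[:, 0], quad[:, 1]
    # Shoelace formula for the polygon (OBB) area
    obb_area = 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))
    # Axis-aligned bounding box (HBB) area
    hbb_area = (x.max() - x.min()) * (y.max() - y.min())
    return obb_area / hbb_area

# A square rotated 45 degrees fills exactly half of its HBB
quad = np.array([[50, 0], [100, 50], [50, 100], [0, 50]], dtype=np.float32)
print(area_ratio(quad))  # → 0.5
```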

@trungpham2606
Author

@Shank2358 I see, I will try it.

@trungpham2606
Author

@Shank2358
I see that in the script the angle is taken directly from cv2, but you don't change its value as stated in the paper?
I mean: if angle in [pi/2, pi] -> angle = angle - pi/2?

@Shank2358
Owner

That angle transformation in the paper is already done by the OpenCV function, so the output is directly in (-pi/2, 0].

@trungpham2606
Author

@Shank2358
After converting my custom dataset to the format GGHL needs, I can train, but I ran into this issue:
image

@Shank2358
Owner

Hi. I suspect the following potential problems may lead to NaN:

  1. You'd better check whether the converted data are correct; visualizing them would be a good idea. The correct results should look like those in the paper.
    image

  2. The model initialization parameters may need to be reset. Maybe try our pre-trained weights (trained on ImageNet), which make convergence more stable. The links are as follows:
    Baidu_Disk (password: 0blv)
    Google_Drive

  3. If the pre-trained weights are not used, the parameter-initialization method may need to be adjusted. Our default is a Gaussian initialization with mean 0 and variance 0.01; maybe try Xavier or Kaiming initialization instead.

  4. Check whether any denominator or log argument is 0, which will also produce NaN.

  5. Training hyper-parameters such as the learning rate may also need readjusting when you train on your own dataset.
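Suggestion 3 can be sketched in PyTorch like this; the model here is a generic stand-in, not GGHL's actual network, and the helper name is mine.

```python
import torch
import torch.nn as nn

def init_weights(model, mode="kaiming"):
    """Apply either Kaiming init or a plain N(0, 0.01) init to conv layers."""
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            if mode == "kaiming":
                nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
            else:  # the default described above: mean 0, small std
                nn.init.normal_(m.weight, mean=0.0, std=0.01)
            if m.bias is not None:
                nn.init.zeros_(m.bias)

# Hypothetical stand-in model
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 16, 3))
init_weights(model, mode="kaiming")
```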

Since the only information available to me is this screenshot, I can only infer from experience that the NaN may be caused by the reasons above; data conversion and the pre-trained weights are the most likely. Try those first. If you still have problems, please leave me a message or an e-mail, and I will do my best to help you solve it.

Thank you.

@trungpham2606
Author

trungpham2606 commented Nov 17, 2021

@Shank2358
1. How can I draw the heatmap images like yours in the paper?
2. I downloaded the ImageNet pretrained weights, and now it trains (without NaN loss).
3. I will try the other approaches if the NaN loss reappears.
4. Thank you!

@Shank2358
Owner

Shank2358 commented Nov 17, 2021

  1. Just use Matplotlib to display (label_sbbox, label_mbbox, label_lbbox). For example, add the following code after Line 34 in datasets_obb.py:
        import matplotlib.pyplot as plt
        import numpy as np

        img = np.uint8(np.transpose(img, (1, 2, 0)) * 255)
        plt.figure("img")
        plt.imshow(img)

        # per-cell max over the class channels gives the heatmap
        mask_s = np.max(label_sbbox[:, :, 16:], -1, keepdims=True)
        plt.figure("mask_s")
        plt.imshow(mask_s.squeeze(-1), cmap='jet')  # imshow needs (H, W)

        mask_m = np.max(label_mbbox[:, :, 16:], -1, keepdims=True)
        plt.figure("mask_m")
        plt.imshow(mask_m.squeeze(-1), cmap='jet')

        mask_l = np.max(label_lbbox[:, :, 16:], -1, keepdims=True)
        plt.figure("mask_l")
        plt.imshow(mask_l.squeeze(-1), cmap='jet')

        plt.show()

By the way, datasets_obb.py can be run independently (I wrote a main function), so you can run it to check the data and the visualization.

  2. Congratulations! 🎉🎉🎉
  3. Please train a few more epochs to see whether NaN appears again.
  4. You are welcome.

@trungpham2606
Author

@Shank2358
Do you think my heatmaps look normal?
image
It's pretty weird that mask_s looks empty.

@Shank2358
Owner

Shank2358 commented Nov 17, 2021

It seems OK.
When there are no small objects, mask_s is indeed empty. The paper explains that objects of different sizes are assigned to different layers; when a layer has no objects assigned to it, it is empty.
image
The scale hyper-parameter tau can be adjusted according to your dataset.

@trungpham2606
Author

@Shank2358
So everything seems correct now. I will train on my custom dataset and come back with the results.
In the current training run (after 30 epochs) the classification loss is not stable and pretty large (sometimes > 100). I will train longer to see whether the issue persists.
Thank you so much!

@Shank2358
Owner

Shank2358 commented Nov 17, 2021

image

The loss_cls is not added to the total loss; we use loss_pos and loss_neg instead. Maybe you can add it for more stable training in the early stage. I have modified loss_jol.py; please update it.

@trungpham2606
Author

trungpham2606 commented Nov 18, 2021

@Shank2358
I think one problem when training with a custom dataset is that some rotated bounding boxes' coordinates fall outside the image dimensions (my custom dataset has masks, which I convert to the GGHL format). We have to either pad the image or simply ignore those boxes; I will try ignoring them first.

I have double-checked the data and the annotations are correct, but the model doesn't converge; the mAP is always zero :O

@Shank2358
Owner

Shank2358 commented Nov 19, 2021

  1. Have you checked the order of the vertices? The order p1-p2-p3-p4 is as in the paper.
    I think this problem is the most likely. 😥😥 I am currently rewriting the label-conversion code; the vertices in the labels used here are sorted, and maybe your order is inconsistent with our definition. I will write code that sorts automatically and update it within two days. 🤖
  2. Is the loss updated? I updated it yesterday.
  3. Does the assigned Gaussian heatmap correspond to the original image?
  4. Do all the losses fail to converge, or only some of them?
  5. I will run this code again on other datasets and then give you feedback.
  6. We start calculating the mAP only after 70 epochs, so the mAP is displayed as 0 before that. You can modify train.py to change this.
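For point 1, the general idea of sorting the four vertices into a fixed order can be sketched as below. This is only a generic clockwise convention (top-most vertex first), not necessarily the exact p1-p4 definition from the paper.

```python
import numpy as np

def sort_clockwise(quad):
    """Sort 4 vertices clockwise (in image coordinates, where y grows
    downward), starting from the top-most point."""
    quad = np.asarray(quad, dtype=np.float32)
    center = quad.mean(axis=0)
    # ascending angle around the center is clockwise when y points down
    ang = np.arctan2(quad[:, 1] - center[1], quad[:, 0] - center[0])
    quad = quad[np.argsort(ang)]
    # rotate so the vertex with the smallest y comes first
    start = int(np.argmin(quad[:, 1]))
    return np.roll(quad, -start, axis=0)

# shuffled square → (0,0), (10,0), (10,10), (0,10)
print(sort_clockwise([[10, 10], [0, 0], [0, 10], [10, 0]]))
```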

@Shank2358
Owner

Shank2358 commented Nov 19, 2021

I cloned this code and trained GGHL for 30 epochs on the HRSC2016 dataset. The model converges. I have uploaded the training log to the log folder.
image

The following is a visualized Gaussian heatmap.
image

I also tested the mAP and the visualized detection results, and everything is normal. Although the mAP is not very high because training is incomplete, it shows that the code works. I will finish the training and update the final results; this may take some time.
image

Therefore, I think there is no problem with the code; label conversion is the more likely culprit (see the previous reply for details). I will continue to check the code and help you solve this problem.

Thank you.

@trungpham2606
Author

@Shank2358
I have a small dataset (in COCO format, which I converted to the GGHL format). I think you could test with it; training will be fast and we can see the results sooner.
If you want to have a look, please give me your email and I will send it to you.
Thank you!

@Shank2358
Owner

Of course. It's my pleasure. My email is [email protected]

@Shank2358
Owner

Shank2358 commented Nov 19, 2021

These are the complete test results on the HRSC2016 dataset:
image

@trungpham2606
Author

@Shank2358
Can you show some of the detections as well? ^^
By the way, I sent you my dataset.

@Shank2358
Owner

This is the result with confidence threshold = 0.3:
image
image

@trungpham2606
Author

trungpham2606 commented Nov 19, 2021

@Shank2358 Looks great :O !!!!!

@Shank2358
Owner

@Shank2358 Can I see your loss during training my dataset ?

Sorry, the training log was deleted. Here is some data from a screenshot taken at the time:
Epoch:[ 21/51] Batch:[ 0/1] Img_size:[800] Loss:58.6973 Loss_fg:4.1182 | Loss_bg:3.2572 | Loss_pos:20.9150 | Loss_neg:4.6449 | Loss_iou:3.8027 | Loss_cls:14.4501 | Loss_s:1.7436 | Loss_r:3.1436 | Loss_l:2.6220 | LR:0.000134864
[2021-11-20 02:07:32,153]-[train_GGHL.py line:203]: Epoch:[ 46/51] Batch:[ 0/1] Img_size:[800] Loss:44.5443 Loss_fg:3.3079 | Loss_bg:1.6776 | Loss_pos:15.9689 | Loss_neg:3.5604 | Loss_iou:3.8235 | Loss_cls:9.9970 | Loss_s:1.8301 | Loss_r:2.0814 | Loss_l:2.2976 | LR:6.06895e-06
I think your dataset is too small to support updating the parameters of such a large model. Maybe you can replace the backbone with a lighter one, such as ResNet-18.

  1. Thank you for the info.
  2. I am training a new one, more challenging than the toy set I sent you.

Cool! Thanks for sharing.

@Fly-dream12

I cannot contact you at this email. Can you provide another?

[email protected] Please try it. Thank you.

Could you provide a different email address? Thanks.

@trungpham2606
Author

@Shank2358
After training for so many epochs, it can't detect anything during testing. Maybe the approach is not suitable for this kind of dataset.

@Shank2358
Owner

Shank2358 commented Nov 21, 2021

How many training samples do you have?


Can you show me the loss, please?
I didn't write a drawing function in evaluator.py; you need to draw from the test results in predictionR/voc. Could that be the reason?
Is nothing detected at all? The data I ran the day before yesterday could detect objects. In addition, has the object-category list in config been modified?

@trungpham2606
Author

@Shank2358
I have 185 images.

@Shank2358
Owner

Shank2358 commented Nov 21, 2021

This is the result-visualization code; I had commented it out. Did you uncomment it?
image

@trungpham2606
Author

@Shank2358
Everything was checked.
If I set the confidence threshold to 0.1 (small), NMS takes far too long (because there are so many ROIs); if I set a higher threshold, it detects nothing.
I didn't uncomment it.
Here is the loss:
[2021-11-21 13:10:57,104]-[train_GGHL.py line:169]: Epoch:[243/501] Batch:[ 75/92] Img_size:[736] Loss:12.4321 Loss_fg:0.8209 | Loss_bg:0.7159 | Loss_pos:6.2062 | Loss_neg:0.0000 | Loss_iou:1.2030 | Loss_cls:77.7122 | Loss_s:0.7767 | Loss_r:0.7240 | Loss_l:1.9852 | LR:5.33808e-05

@Shank2358
Owner

Shank2358 commented Nov 21, 2021

It seems to be a classification problem: cls_loss is very high, and the final scores = confidence * class_scores.
The bbox regression of the model seems to have converged.
Try removing the classification score in the evaluator and taking only the confidence score, and see whether you can draw anything:
image
Set scores = pred_conf and see if there is any result.

If that produces results, then it is a classification problem and we can continue to troubleshoot: maybe there is a bug in the classification loss, or the category IDs are not correct?
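The check above can be illustrated with dummy numbers: when the classifier head is broken, multiplying by class_scores suppresses every detection even though the confidence itself is fine. (The arrays here are made up.)

```python
import numpy as np

pred_conf = np.array([0.92, 0.85, 0.10])      # objectness per box
class_scores = np.array([0.01, 0.02, 0.01])   # a broken classifier output

scores = pred_conf * class_scores             # default scoring
scores_conf_only = pred_conf                  # the suggested check

print((scores > 0.3).sum())            # 0 boxes survive the threshold
print((scores_conf_only > 0.3).sum())  # 2 boxes survive
```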

There were two bugs in the loss function; I fixed them the day before yesterday. Have you updated it?
image
image

@Shank2358
Owner

In addition, 500 epochs seems like too many; the model may be over-fitting.

@trungpham2606
Author

trungpham2606 commented Nov 21, 2021

@Shank2358
Actually it trained to 243/500, not 500 (I stopped it).
I am ignoring the classes as you suggested, and it outputs some results. I will draw the rotated boxes to see the results clearly.

@trungpham2606
Author

Oh, I didn't update it :(( I will retrain and see.

@Shank2358
Owner

Shank2358 commented Nov 21, 2021

I'm sorry, this bug appeared in the version from a few days ago; I forgot to remind you.
It seems there is a problem with the classification. I have a few suggestions; maybe you can try them:

  1. Do the category IDs start from 0?
  2. The classification loss mainly consists of three parts: pos_loss is the loss of the positive category and neg_loss is the loss of the negative categories. Your neg_loss = 0, so I guess there is only one category? (If I guessed wrong, you need to check item 1 carefully.)
    If there are multiple categories, neg_loss will not always be 0.
    cls_loss is the classic BCE loss (not used in our paper); adding it may make convergence more stable.
    Or you can use BCE loss instead of pos_loss and neg_loss; it is the most common classification loss.
    Good luck.
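The plain-BCE alternative mentioned above might look like this in PyTorch; the tensor shapes are invented for illustration, not GGHL's actual head layout.

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()   # numerically stable sigmoid + BCE
logits = torch.randn(8, 100, 1)      # hypothetical per-cell class logits
targets = torch.zeros(8, 100, 1)     # single-class dataset: 0/1 labels
targets[:, :10] = 1.0                # pretend the first 10 cells are positive

loss_cls = criterion(logits, targets)
print(loss_cls)
```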

There are also some updates to details of the code. You'd better update the three files dataloader, augmentation, and loss. Thanks.

@trungpham2606
Author

@Shank2358

  1. It starts from 0.
  2. This dataset has only one class.
  3. When I tried to get the bboxes in the evaluator, the shape of coors_rota does not seem correct:
    image

It should be (8, *), right?

@trungpham2606
Author

Oh, I will pull the latest code and retrain.

@Shank2358
Owner

Shank2358 commented Nov 21, 2021

1 and 2 are OK.
3. coors = [xmin, ymin, xmax, ymax] and coors_rota = [s1, s2, s3, s4], where s1-s4 are in (0, 1).

If you want to get the bboxes, this position may be more appropriate; these are the decoded boxes:

image

Here is the result of decoding bbox = [x1,y1,x2,y2,x3,y3,x4,y4] from coors = [xmin,ymin,xmax,ymax] and coors_rota = [s1,s2,s3,s4].
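For reference, this kind of decoding (each OBB vertex lying on one side of the HBB at a fractional offset s_i in (0, 1)) can be sketched as below. The exact side-to-s_i correspondence is my assumption for illustration and may differ from the repository's implementation.

```python
import numpy as np

def decode_obb(coors, s):
    """coors = (xmin, ymin, xmax, ymax); s = (s1, s2, s3, s4) in (0, 1).
    The side order below (top, right, bottom, left) is assumed."""
    xmin, ymin, xmax, ymax = coors
    w, h = xmax - xmin, ymax - ymin
    s1, s2, s3, s4 = s
    return np.array([
        [xmin + s1 * w, ymin],   # vertex on the top edge
        [xmax, ymin + s2 * h],   # vertex on the right edge
        [xmax - s3 * w, ymax],   # vertex on the bottom edge
        [xmin, ymax - s4 * h],   # vertex on the left edge
    ])

obb = decode_obb((0, 0, 10, 10), (0.5, 0.5, 0.5, 0.5))
print(obb)  # a diamond: (5,0), (10,5), (5,10), (0,5)
```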

@Shank2358
Owner

If the localization is fine and the results can be displayed, you can try retraining and see. If there is still a problem with bbox localization, you may need to continue troubleshooting first.

@trungpham2606
Author

@Shank2358
Got it.
Thank you!!!

@trungpham2606
Author

@Shank2358
The loss_cls is still very high and doesn't decrease at all.

@Shank2358
Owner

Shank2358 commented Nov 21, 2021

Does neither loss_cls nor loss_pos decrease, or only one of them?
Have you tried these loss combinations:
loss_pos + loss_neg + loss_cls
loss_pos + loss_neg
loss_cls
Does none of them work?
That shouldn't happen for a single-class task, because in fact only the confidence is required to detect the target.
Can you show me the loss? Or please check whether your loss code is consistent with the screenshot above. Or shall I take a few similar pictures tomorrow and try training on them?

@Shank2358
Owner

Did you get results on the toy dataset yesterday? If you can get results similar to mine, it means our code is consistent.

@trungpham2606
Author

@Shank2358
The loss_cls doesn't decrease, while the other losses (loss_l, loss_m, loss_s) decrease, but not to values as small as what I showed you.
I think the best way is for me to send you the data, so you can train on it and see whether you hit the same issue.

@Shank2358
Owner

Thanks. I will try to train and give you feedback as soon as possible.

@Shank2358
Owner

Shank2358 commented Nov 21, 2021

This is the loss after training for 12 epochs:
image

It seems to converge. I will give you feedback once all the training finishes in a while, and then email you all the code, weights, and logs from my training. It seems there are still some problems in your code. In addition, I will keep your data confidential.

@trungpham2606
Author

@Shank2358 I see, thank you !

@Shank2358
Owner

Shank2358 commented Nov 21, 2021

I have trained it for 185 epochs. This is part of the visualized results; it seems the objects can be detected.
I will send the complete code, weights, and training log to your email tomorrow.

image

However, it is indeed difficult to detect such long objects, especially when regressing s1-s4; I increased that part of the loss by a factor of 3. Like most rotated-object-detection methods, GGHL does not perform as well as expected on such extremely long objects. For objects like these, maybe you can try the PIoU approach. At least the experiments at this stage prove that GGHL can work.

For this type of object I adjusted some hyperparameters; you can see the details in the code I sent you. During label assignment these objects are relatively large, so they are all assigned to the mask_l layer. Therefore a 3-layer FPN structure is unnecessary if there are only such objects; a single layer is enough. Maybe you can change the model structure accordingly.

Regarding the improvement and adjustment of the model, we can continue to discuss if you are interested.

Next, maybe you should compare your code against the code I sent you to find where the errors are, and then adjust the parameters and try again. Since the dataset is relatively small, I also suggest trying a smaller model such as ResNet-18. There are also many other excellent rotated-object-detection methods worth trying; adjust and select a model appropriate to your task.

Thanks.

@trungpham2606
Author

@Shank2358
Thank you so much!!!
PIoU is a nice approach for coping with these long objects. I will try applying it after reproducing your results ^^

@Shank2358
Owner

You're welcome~
