
Normalization for object detection #2397

Open
pmeier opened this issue Jul 6, 2020 · 6 comments

Comments

@pmeier
Collaborator

pmeier commented Jul 6, 2020

Migrated from discuss.pytorch.org. Requests were made by @mattans.

📚 Documentation

The reference implementations for classification, segmentation, and video classification all use a normalization transform. In contrast, object detection does not use any normalization.

  1. Consider explaining why the pretrained detection models are the only ones that do not require image normalization. (I understand that the training set was not normalized, but why?)
  2. It is worth mentioning explicitly that no normalization is needed. The classification, segmentation, and detection pretrained models are all trained on ImageNet, so one might assume they all require ImageNet normalization, when in fact only the classification and segmentation models do. Perhaps this information is best presented in a table, since the pretrained video models also use a normalization, but a different one.
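For reference, the ImageNet normalization that the classification and segmentation models expect is a per-channel `(x - mean) / std` on inputs scaled to `[0, 1]`. A minimal pure-Python sketch (the function name is illustrative; the values are the standard ImageNet statistics):

```python
# Standard ImageNet per-channel statistics (RGB, inputs in [0, 1]).
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Normalize one RGB pixel: (x - mean) / std per channel."""
    return [(x - m) / s for x, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)]

# A mid-gray pixel is shifted and scaled per channel:
normalize_pixel([0.5, 0.5, 0.5])
```

This is exactly the transform that `torchvision.transforms.Normalize(mean, std)` applies tensor-wide; the point of the issue is that for the detection models this step happens inside the model, not in the user's transform pipeline.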
@fmassa
Member

fmassa commented Jul 6, 2020

Hey,

So, the issue is that we embed the normalization (and other transforms) inside the model itself; see:

```python
image = self.normalize(image)
image, target_index = self.resize(image, target_index)
```

This inconsistency is unfortunate, but it was necessary to make the detection models easier for users to use. My thinking is that at some point in the future we might want all the models to contain their data transformations, since the way you normalize the inputs is tied to the pre-trained weights that we provide.
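The pattern described here, baking the input transform into the model so users cannot apply the wrong normalization to pretrained weights, can be sketched without torchvision (the class below is illustrative, not the actual `GeneralizedRCNN` implementation):

```python
# Illustrative sketch only -- mimics the pattern of embedding the input
# transform inside the model, as torchvision's detection models do.
class DetectionModelSketch:
    def __init__(self, image_mean=None, image_std=None):
        # Same fallback defaults as torchvision's FasterRCNN constructor.
        if image_mean is None:
            image_mean = [0.485, 0.456, 0.406]
        if image_std is None:
            image_std = [0.229, 0.224, 0.225]
        self.image_mean = image_mean
        self.image_std = image_std

    def normalize(self, pixel):
        # Per-channel (x - mean) / std, applied inside the model.
        return [(x - m) / s
                for x, m, s in zip(pixel, self.image_mean, self.image_std)]

    def forward(self, pixel):
        pixel = self.normalize(pixel)  # the caller never normalizes
        # ... backbone, RPN, and heads would follow here ...
        return pixel

model = DetectionModelSketch()
model.forward([0.485, 0.456, 0.406])  # the mean pixel maps to [0.0, 0.0, 0.0]
```

Because the statistics travel with the model, a raw `[0, 1]` image is always normalized consistently with the pretrained weights, at the cost of the inconsistency with the other reference scripts noted above.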

For now, I think we should improve the documentation to clarify this confusion.

@mattans

mattans commented Jul 7, 2020

OK, thanks. I also think it's worth updating the docs.

@pmeier
Collaborator Author

pmeier commented Jul 8, 2020

@mattans We are happy to accept a PR for that. Would you like to send one?

@mattans

mattans commented Jul 8, 2020

> @mattans We are happy to accept a PR for that. Would you like to send one?

Yes, I will do it in the following days. Thank you very much.

@mattans

mattans commented Jul 9, 2020

Just to make sure: @fmassa , what will happen if I use the object detection models without pretraining? Will it still auto-normalize the inputs?
Also, does this auto-normalization apply for both training and inference?

@pmeier
Collaborator Author

pmeier commented Jul 10, 2020

> [W]hat will happen if I use the object detection models without pretraining? Will it still auto-normalize the inputs?

Yes. The normalization transform is hard-coded into the models:

```python
if image_mean is None:
    image_mean = [0.485, 0.456, 0.406]
if image_std is None:
    image_std = [0.229, 0.224, 0.225]
transform = GeneralizedRCNNTransform(min_size, max_size, image_mean, image_std)
super(FasterRCNN, self).__init__(backbone, rpn, roi_heads, transform)
```

KeypointRCNN and MaskRCNN inherit from FasterRCNN (shown above) and thus also have this behavior.
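Note that the `None` fallbacks in the `FasterRCNN` constructor mean the ImageNet statistics are only defaults: a caller can pass dataset-specific `image_mean` and `image_std` instead. A sketch of that resolution logic (the helper name is hypothetical, the fallback values are from the snippet above):

```python
def resolve_image_stats(image_mean=None, image_std=None):
    """Mirror the fallback logic in FasterRCNN.__init__:
    ImageNet defaults are used only when the caller passes None."""
    if image_mean is None:
        image_mean = [0.485, 0.456, 0.406]
    if image_std is None:
        image_std = [0.229, 0.224, 0.225]
    return image_mean, image_std

# No arguments -> ImageNet defaults:
mean, std = resolve_image_stats()
# Caller-supplied statistics take precedence:
custom_mean, _ = resolve_image_stats(image_mean=[0.5, 0.5, 0.5])
```

So a model trained from scratch on a non-ImageNet dataset can still get a matching normalization by passing its own statistics at construction time.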


> Also, does this auto-normalization apply for both training and inference?

Yes. The model is created the same way for training and inference:

```python
print("Creating model")
model = torchvision.models.detection.__dict__[args.model](
    num_classes=num_classes, pretrained=args.pretrained
)
model.to(device)
```

and the transform is also applied unconditionally:

```python
images, targets = self.transform(images, targets)
```
