
Normalization for object detection #2397

Open
pmeier opened this issue Jul 6, 2020 · 6 comments

Comments

@pmeier
Collaborator

pmeier commented Jul 6, 2020

Migrated from discuss.pytorch.org. Requests were made by @mattans.

📚 Documentation

The reference implementations for classification, segmentation, and video classification all use a normalization transform. In contrast, object detection does not use any normalization.

  1. Consider explaining why the pretrained detection models are the only ones that do not require image normalization. (I understand that the training set was not normalized, but why?)
  2. It is worth mentioning explicitly that no normalization is needed. The classification, segmentation, and detection pretrained models are all trained on ImageNet, so one might assume they all require ImageNet normalization, when in fact only the classification and segmentation models do. Perhaps this information is best presented in a table, since the pretrained video models also use a normalization, but a different one.
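For reference, the ImageNet normalization that the classification and segmentation models expect is a per-channel `(x - mean) / std` on inputs scaled to `[0, 1]`. A minimal pure-Python sketch (the function name is illustrative; the values are the standard ImageNet statistics):

```python
# Standard ImageNet per-channel statistics (RGB, inputs in [0, 1]).
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Normalize one RGB pixel: (x - mean) / std per channel."""
    return [(x - m) / s for x, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)]

# A mid-gray pixel is shifted and scaled per channel:
normalize_pixel([0.5, 0.5, 0.5])
```

This is exactly the transform that `torchvision.transforms.Normalize(mean, std)` applies tensor-wide; the point of the issue is that for the detection models this step happens inside the model, not in the user's transform pipeline.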
@fmassa
Member

fmassa commented Jul 6, 2020

Hey,

So, the issue is that we embed the normalization (and other transforms) inside the model itself; see:

```python
image = self.normalize(image)
image, target_index = self.resize(image, target_index)
```

This inconsistency is unfortunate, but it was necessary to make the detection models easier for users to use. My thinking is that at some point in the future we might want all the models to contain their data transformations, since the way you normalize the inputs is tied to the pre-trained weights that we provide.
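The pattern described here, baking the input transform into the model so users cannot apply the wrong normalization to pretrained weights, can be sketched without torchvision (the class below is illustrative, not the actual `GeneralizedRCNN` implementation):

```python
# Illustrative sketch only -- mimics the pattern of embedding the input
# transform inside the model, as torchvision's detection models do.
class DetectionModelSketch:
    def __init__(self, image_mean=None, image_std=None):
        # Same fallback defaults as torchvision's FasterRCNN constructor.
        if image_mean is None:
            image_mean = [0.485, 0.456, 0.406]
        if image_std is None:
            image_std = [0.229, 0.224, 0.225]
        self.image_mean = image_mean
        self.image_std = image_std

    def normalize(self, pixel):
        # Per-channel (x - mean) / std, applied inside the model.
        return [(x - m) / s
                for x, m, s in zip(pixel, self.image_mean, self.image_std)]

    def forward(self, pixel):
        pixel = self.normalize(pixel)  # the caller never normalizes
        # ... backbone, RPN, and heads would follow here ...
        return pixel

model = DetectionModelSketch()
model.forward([0.485, 0.456, 0.406])  # the mean pixel maps to [0.0, 0.0, 0.0]
```

Because the statistics travel with the model, a raw `[0, 1]` image is always normalized consistently with the pretrained weights, at the cost of the inconsistency with the other reference scripts noted above.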

For now, I think we should improve the documentation to clarify this confusion.

@mattans

mattans commented Jul 7, 2020

OK, thanks. I also think it's worth updating the docs.

@pmeier
Collaborator Author

pmeier commented Jul 8, 2020

@mattans We are happy to accept a PR for that. Would you like to send one?

@mattans

mattans commented Jul 8, 2020

> @mattans We are happy to accept a PR for that. Would you like to send one?

Yes, I will do it in the following days. Thank you very much.

@mattans

mattans commented Jul 9, 2020

Just to make sure: @fmassa , what will happen if I use the object detection models without pretraining? Will it still auto-normalize the inputs?
Also, does this auto-normalization apply for both training and inference?

@pmeier
Collaborator Author

pmeier commented Jul 10, 2020

> [W]hat will happen if I use the object detection models without pretraining? Will it still auto-normalize the inputs?

Yes. The normalization transform is hard-coded into the models:

```python
if image_mean is None:
    image_mean = [0.485, 0.456, 0.406]
if image_std is None:
    image_std = [0.229, 0.224, 0.225]
transform = GeneralizedRCNNTransform(min_size, max_size, image_mean, image_std)
super(FasterRCNN, self).__init__(backbone, rpn, roi_heads, transform)
```

KeypointRCNN and MaskRCNN inherit from FasterRCNN (shown above) and thus also have this behavior.
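Note that the `None` fallbacks in the `FasterRCNN` constructor mean the ImageNet statistics are only defaults: a caller can pass dataset-specific `image_mean` and `image_std` instead. A sketch of that resolution logic (the helper name is hypothetical, the fallback values are from the snippet above):

```python
def resolve_image_stats(image_mean=None, image_std=None):
    """Mirror the fallback logic in FasterRCNN.__init__:
    ImageNet defaults are used only when the caller passes None."""
    if image_mean is None:
        image_mean = [0.485, 0.456, 0.406]
    if image_std is None:
        image_std = [0.229, 0.224, 0.225]
    return image_mean, image_std

# No arguments -> ImageNet defaults:
mean, std = resolve_image_stats()
# Caller-supplied statistics take precedence:
custom_mean, _ = resolve_image_stats(image_mean=[0.5, 0.5, 0.5])
```

So a model trained from scratch on a non-ImageNet dataset can still get a matching normalization by passing its own statistics at construction time.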


> Also, does this auto-normalization apply for both training and inference?

Yes. The model is created the same way for training and inference:

```python
print("Creating model")
model = torchvision.models.detection.__dict__[args.model](
    num_classes=num_classes, pretrained=args.pretrained
)
model.to(device)
```

and the transform is also applied unconditionally:

```python
images, targets = self.transform(images, targets)
```
