Normalize data #36

Merged · 7 commits merged into ejnnr:main on Mar 24, 2024
Conversation

@VRehnberg (Collaborator) commented Mar 6, 2024

This is probably the limitation that remains for good WaNet noise performance.

TODO:

For some backdoors, such as the corner trigger, it makes a big difference whether they are normalized or not. I can see two different arguments for the ordering (see the sketch after this list):

  • In the scenarios we mostly care about, the problem starts with a poisoned model, and in those cases we simply want to train in robust backdoors.
  • For some backdoor detection tasks you're given a poisoned dataset, and in those cases image augmentation and the other transforms used for training would be applied after the backdoor has been added (though this might not always be the case in the literature).
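A minimal sketch of the two orderings, purely for illustration (the corner_backdoor helper and the CIFAR-10 statistics are assumptions, not cupbearer's actual API):

```python
import torch
import torchvision.transforms.functional as TF

def corner_backdoor(img: torch.Tensor, size: int = 3) -> torch.Tensor:
    """Hypothetical corner trigger: paint a small top-left patch white."""
    img = img.clone()
    img[..., :size, :size] = 1.0
    return img

# Example normalization statistics (CIFAR-10).
MEAN, STD = [0.4914, 0.4822, 0.4465], [0.2470, 0.2435, 0.2616]

img = torch.rand(3, 32, 32)  # stand-in for a dataset image in [0, 1]

# Ordering 1: backdoor first, then normalize. Matches a threat model where
# the attacker poisons the raw dataset before it enters the training pipeline.
x1 = TF.normalize(corner_backdoor(img), MEAN, STD)

# Ordering 2: normalize first, then backdoor. The trigger value 1.0 now lands
# in normalized space rather than raw pixel space, so the two orderings
# produce genuinely different training data.
x2 = corner_backdoor(TF.normalize(img, MEAN, STD))
```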

@VRehnberg (Collaborator, Author) commented Mar 7, 2024

This wasn't enough. Training is perhaps a bit faster in the beginning, but there's no big difference at convergence.

Besides the clipping differences, there is a possible issue with the normalization of the warping field (which is on the order of the image width in pixels; see the sketch below). I'll:

  • visualize the backdoors again to see that they still look reasonable
  • see if performance improves

I'll make another PR if that turns out to be the issue.
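To illustrate the scale issue with the warping field: grid_sample expects sampling coordinates in [-1, 1], so a field on the order of the image width in pixels has to be rescaled first. A minimal sketch, not cupbearer's implementation:

```python
import torch
import torch.nn.functional as F

def apply_warp(img: torch.Tensor, flow_pixels: torch.Tensor) -> torch.Tensor:
    """Warp img (1, C, H, W) by a WaNet-style field flow_pixels (1, H, W, 2)
    given in pixels."""
    _, _, h, w = img.shape
    # Identity sampling grid in grid_sample's [-1, 1] convention.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
    )
    identity = torch.stack((xs, ys), dim=-1).unsqueeze(0)  # (1, H, W, 2)
    # One grid unit corresponds to (W - 1) / 2 pixels horizontally and
    # (H - 1) / 2 vertically, so divide the pixel-space field by that scale.
    scale = torch.tensor([(w - 1) / 2, (h - 1) / 2])
    return F.grid_sample(img, identity + flow_pixels / scale, align_corners=True)

img = torch.rand(1, 3, 32, 32)
flow = 0.5 * torch.randn(1, 32, 32, 2)  # small random field, in pixels
warped = apply_warp(img, flow)  # visualize this to check it looks reasonable
```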

@VRehnberg mentioned this pull request on Mar 8, 2024
@VRehnberg (Collaborator, Author) commented Mar 8, 2024

We might want to set normalization to False by default so as not to complicate matters. I'm running some experiments to check that performance isn't hurt too badly without normalization (but with the fixes from #37).

@ejnnr (Owner) commented Mar 8, 2024

Regarding the transform order: at a very high level, I think a Task should provide the inputs the model expects, and where those come from is up to the task. If we're doing normalization, then the task should already include that as part of the dataset it returns. Detectors can do additional augmentations etc. if they want, but that's a separate question from which augmentations were used during training (or which augmentations the Task might build into the data it gives to the detector).

The more practical question of what to do for backdoors is a bit less clear. I agree that applying backdoors first and then augmentations/normalization is probably more realistic from a backdoor threat model. But I'd guess that applying the backdoor last makes things easier in practice? If so, I think that'd be fine too. Though if, e.g., learning a WaNet backdoor turns out to be much harder for the model when we just do some augmentations after inserting the trigger, that's also kind of interesting from an image backdoor defense perspective: easiest defense of all time, already built into any realistic training process :P I think it's much less important for our purposes, but it might still be worth pointing out somewhere if true.

Also added a debug message to serve as a warning. We could consider making this an actual warning, but since this is an active choice, I figured it doesn't need to be more verbose than a debug message.
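For reference, the pattern being described might look roughly like this (a hypothetical sketch; NormalizeTransform and its fields are illustrative, not necessarily the real class):

```python
import logging

logger = logging.getLogger(__name__)

class NormalizeTransform:
    """Hypothetical transform with opt-in normalization."""

    def __init__(self, mean, std, normalize: bool = False):
        self.mean, self.std, self.normalize = mean, std, normalize
        if normalize:
            # Debug rather than warning: enabling normalization is an active
            # choice by the user, so it doesn't need to be loud.
            logger.debug("Normalization enabled (mean=%s, std=%s)", mean, std)
```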
@VRehnberg (Collaborator, Author):
With the defaults, this should now result in essentially no actual changes. It can either be:

  • merged, or
  • closed until we decide we want normalization for something.

@VRehnberg (Collaborator, Author):
> The more practical question of what to do for backdoors is a bit less clear. [...] Though if, e.g., learning a WaNet backdoor turns out to be much harder for the model when we just do some augmentations after inserting the trigger, that's also kind of interesting from an image backdoor defense perspective [...]

I agree that it could be interesting to investigate, but I don't consider it a priority. My guess is that, for the most part, adding augmentations after poisoning and then training would lead to backdoors that are both harder to learn and harder to detect (similar to adding noisy warping for WaNet); see the sketch below.
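A sketch of what that pipeline order would look like (trigger_fn is a hypothetical trigger such as the warp above, and the augmentations are just examples):

```python
import torch
import torchvision.transforms as T

# Typical training augmentations, applied *after* the trigger is inserted,
# mirroring a threat model where the attacker poisons the raw dataset.
augment = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
])

def poison_then_augment(img: torch.Tensor, trigger_fn) -> torch.Tensor:
    return augment(trigger_fn(img))
```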

@VRehnberg marked this pull request as ready for review on March 8, 2024, 10:13
@ejnnr (Owner) commented Mar 21, 2024

Looks good at a glance; planning to go over it in more detail tomorrow.

@ejnnr (Owner) left a comment

Thanks, this basically looks good (though I haven't tried training anything with normalization). The only thing I noticed is a few type annotations that look wrong. It's fine by me to remove the type annotations or use Any if they're too annoying to fix; I just want to make sure they're not misleading. Feel free to merge once you've gone over the corresponding inline comments!

src/cupbearer/data/transforms.py (two inline comment threads, since outdated and resolved):

```python
from typing import Any

import torch
import torchvision.transforms.functional as F

Sample = tuple[torch.Tensor] | tuple[torch.Tensor, Any]
```
@ejnnr (Owner):

In practice this is correct, but it seems inconsistent with the type annotations for __rest_call__; it should be tuple[torch.Tensor, Any, ...] for the second option. At that point, we might as well use tuple[torch.Tensor, ...]. Though arguably we should have a more consistent convention here anyway; I'm not sure what it should be.

@VRehnberg (Collaborator, Author):

I had tuple[torch.Tensor, *tuple[Any, ...]] before I realized that it raises a SyntaxError in Python < 3.11. It would be an option if we bump the version requirements.

Alternatively, restricting it to tuple[torch.Tensor, ...] instead would perhaps be best; see the sketch below for the options.
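A sketch of the alternatives discussed here, assuming typing_extensions is an acceptable dependency for the pre-3.11 case:

```python
from typing import Any

import torch
from typing_extensions import Unpack

# Python >= 3.11 only (PEP 646 star-unpacking in a subscript):
#     Sample = tuple[torch.Tensor, *tuple[Any, ...]]

# Pre-3.11 spelling of the same type via typing_extensions:
Sample = tuple[torch.Tensor, Unpack[tuple[Any, ...]]]

# The more restrictive alternative: a tuple of any length, but every
# element is typed as torch.Tensor.
# Sample = tuple[torch.Tensor, ...]
```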

@VRehnberg (Collaborator, Author):

I made the type hint less specific than it should be for __call__, but type checkers should still be correct as long as they pick up on sample being split and sent to the separate methods for img and rest.

@ejnnr merged commit b13075e into ejnnr:main on Mar 24, 2024