
Question: How is PTA calculated? #1

Closed
gordon-lim opened this issue Jun 24, 2024 · 1 comment

Comments

@gordon-lim

Thank you for the paper. I want to clarify how Poisoning Test Accuracy (PTA) is calculated in your paper. In particular, I want to ask whether you accounted for the model's own mistakes, independent of your attack. For example, in your main table you report a non-zero PTA even at 0% corruption. In those cases, did you check whether the model predicts the target label even without the trigger? If so, those predictions do not seem attributable to the attack. I'd imagine a fair calculation of PTA would only count cases where the model predicted a true label but was flipped to the target label because of the trigger. My apologies if I missed this discussion in the paper.

@rjha18
Collaborator

rjha18 commented Jun 25, 2024

Hey, thanks for your interest in the paper! As for calculating $\rm PTA$, recall that we define $\rm PTA$ to be the probability that applying the trigger to an image (from the clean test set) yields the target label. More formally,
$$\mathrm{PTA} := P_{(x,y)\sim S_{ct}^{'}} [f(T(x); \theta) = y_{\rm target}],$$
where $S_{ct}^{'}$ is a subset of the clean test set. Notably, in our definition, there is no conditioning on the label the model would have predicted had there been no trigger.
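In code, this definition amounts to applying the trigger to every clean-test image and measuring how often the model outputs the target label. Here is a minimal sketch, where `model`, `apply_trigger`, and the toy data are hypothetical stand-ins rather than anything from the authors' repository:

```python
def poisoning_test_accuracy(model, apply_trigger, clean_test, y_target):
    """PTA = P[f(T(x); theta) = y_target] over (x, y) in the clean test subset.

    Note: no conditioning on what the model would have predicted
    without the trigger, matching the definition above.
    """
    hits = sum(1 for x, _y in clean_test
               if model(apply_trigger(x)) == y_target)
    return hits / len(clean_test)


# Toy example: a stand-in "model" that outputs the target label whenever
# the trigger marker is present, and the image's own label otherwise.
TARGET = 7
toy_model = lambda x: TARGET if x.get("trigger") else x["label"]
toy_trigger = lambda x: {**x, "trigger": True}

test_set = [({"label": i % 10}, i % 10) for i in range(100)]
pta = poisoning_test_accuracy(toy_model, toy_trigger, test_set, TARGET)
print(pta)  # 1.0: every triggered image yields the target label
```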

This metric is widely used in the literature (sometimes referred to as Attack Success Rate) and accentuates the adversary's end goal of yielding the target label $y_{\rm target}$ from the attacked model. In other words, under this definition, it doesn't matter how the model produces the target label, just that it does.

I should note that due to the quality of the models and the choices of source class(es) and target class(es), your proposed metric and ours end up yielding almost identical results. As you point out, the PTA at $0$ poisons is sometimes nonzero; however, it never exceeds $0.2\%$ (two-tenths of a percent). If you're interested in calculating the proposed metric, it should be as simple as subtracting the $0$-seed case from the $N$-seed case.
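For completeness, the conditional variant described in the question can be sketched directly as well: count a success only when the trigger flips a prediction that was not already the target label. As before, `model` and `apply_trigger` are illustrative placeholders, not the authors' code:

```python
def conditional_pta(model, apply_trigger, clean_test, y_target):
    """Count a success only when the trigger flips a non-target
    prediction to the target label; images the model already sends
    to y_target without the trigger are excluded from the denominator.
    """
    flips = considered = 0
    for x, _y in clean_test:
        if model(x) == y_target:
            continue  # model errs toward the target even without the trigger
        considered += 1
        flips += model(apply_trigger(x)) == y_target
    return flips / considered if considered else 0.0


# Same toy setup as a usage example: the stand-in model returns the
# target label only when the trigger marker is present.
TARGET = 7
toy_model = lambda x: TARGET if x.get("trigger") else x["label"]
toy_trigger = lambda x: {**x, "trigger": True}
test_set = [({"label": i % 10}, i % 10) for i in range(100)]
print(conditional_pta(toy_model, toy_trigger, test_set, TARGET))  # 1.0
```

In this toy case the ten images the model already labels `7` are excluded, and all ninety remaining images flip, so the conditional and unconditional metrics agree, mirroring the near-identical results noted above.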

Let me know if you have any other questions, if not I'll close this issue.

@rjha18 rjha18 closed this as completed Jul 8, 2024