Memorization #83
base: main
Conversation
I think it is cleaner if we don't commit these changes in general.
@@ -3,11 +3,14 @@
from abc import ABC, abstractmethod

import numpy as np
import torch
Previously we tried to avoid importing torch like this and just import modules we want. I think we still want that.
exclude_logit_threshold: 1  # Threshold for when to skip a logit (useful for imbalance in IN vs OUT samples)
online: True  # Perform the online or offline attack
memorization: True  # Use memorization score to boost performance
memorization_threshold: 0.99  # The most vulnerable percentile (0.0 for the original threshold)
So we do not have an automatic way to do this now? Did you write an issue?
@@ -256,3 +261,117 @@ def run_attack(self:Self) -> Union[AttackResult, List[AttackResult]]:
    """
    pass

def _memorization(self:Self) -> None:
I am wondering if we should have this functionality in abstractMIA or if it makes more sense to have it in utils. Any opinion?
Basically, we could create a .py file called memorization.
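A sketch of what such a standalone module could look like; the module layout, function name, and signature are assumptions for illustration, not the PR's actual code:

```python
# memorization.py -- hypothetical standalone helper module
import numpy as np


def top_vulnerable_mask(memorization_scores: np.ndarray,
                        threshold_percentile: float) -> np.ndarray:
    """Return a boolean mask selecting the most vulnerable datapoints.

    threshold_percentile: e.g. 0.99 keeps the top 1% of scores.
    """
    cutoff = np.quantile(memorization_scores, threshold_percentile)
    return memorization_scores >= cutoff
```

An attack class could then call `top_vulnerable_mask(scores, self.memorization_threshold)` instead of carrying this logic in AbstractMIA.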
self.logger.info("Preparing memorization")
if not self.online:
    self.logger.info("Using the offline version of the attack we make some assumptions in the absence of IN-models")
Be more explicit. "Some assumptions"?
Also, we don't have anything for offline? Create an issue!
@@ -53,6 +57,7 @@ def __init__(
        # out_members will start after the last training index and go up to the number of test indices - 1
        "out_members": np.arange(len(handler.train_indices), len(handler.train_indices)+len(handler.test_indices)),
    }
    AbstractMIA.skip_indices = np.zeros(len(AbstractMIA.audit_dataset["data"]), dtype=bool)
Should this really be shared among all MIAs? Right now we have used it for RMIA and LiRA but they may not use the same number of shadow models and then these indices will not be the same.
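The sharing concern can be demonstrated in isolation: assigning to a class attribute, as `AbstractMIA.skip_indices = ...` does, makes every attack see the last writer's value. A minimal illustration with hypothetical stand-in classes:

```python
import numpy as np


class AbstractMIA:
    skip_indices = None  # class-level attribute, shared by all subclasses


class LiRA(AbstractMIA):
    pass


class RMIA(AbstractMIA):
    pass


# LiRA writes its skip mask to the shared class attribute...
AbstractMIA.skip_indices = np.array([True, False, False])
# ...then RMIA overwrites it with a mask built from a different
# shadow-model setup
AbstractMIA.skip_indices = np.array([False, True])

# Both attacks now silently see RMIA's mask, not their own
assert LiRA.skip_indices is RMIA.skip_indices
```

Storing the mask per instance (`self.skip_indices = ...` in each attack's `__init__`) would avoid the cross-talk.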
from leakpro.import_helper import List, Self, Union
from leakpro.metrics.attack_result import AttackResult
from leakpro.model import PytorchModel
from leakpro.signals.signal import ModelLogits, ModelRescaledLogits
Not sure we should compute signal values in abstractMIA... I know we talked about it but I think I am having second thoughts.
@@ -84,6 +91,34 @@ def description(self:Self) -> dict:
        "detailed": detailed_str,
    }

def check_logits(self:Self) -> None:
I was going through LiRA and RMIA with Fabian today and I think we should handle 0 IN or OUT models the same way as in RMIA, i.e., by updating all the quantities used. Here, we don't do anything about it if we have audit points lacking an IN/OUT model...
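One way to handle it, sketched with hypothetical array names (the actual per-point IN/OUT bookkeeping in the code may differ), is to drop audit points that have no IN or no OUT shadow model up front and apply the same mask to every derived quantity:

```python
import numpy as np

# in_counts[i] / out_counts[i]: number of shadow models where audit
# point i was IN / OUT of the training data (hypothetical names).
in_counts = np.array([3, 0, 2, 4])
out_counts = np.array([1, 5, 0, 2])

# Keep only points with at least one IN and one OUT model, and apply
# the same mask to every derived quantity so they stay aligned.
valid = (in_counts > 0) & (out_counts > 0)
audit_indices = np.arange(len(in_counts))[valid]
```

Filtering once, before any means or variances are computed, avoids NaNs appearing downstream in the scores.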
@@ -112,36 +147,20 @@ def prepare_attack(self:Self)->None:
    count_in_samples = np.count_nonzero(self.in_indices_mask)
    if count_in_samples > 0:
        self.logger.info(f"Some shadow model(s) contains {count_in_samples} IN samples in total for the model(s)")
-        self.logger.info("This is not an offline attack!")
+        self.logger.info("This is not an true offline attack!")
?
-self.target_logits = np.array(self.signal([self.target_model], self.audit_data)).squeeze()
+self.target_logits = np.swapaxes(self.signal([self.target_model], self.audit_data), 0, 1).squeeze()

if self.exclude_logit_threshold > 0:
I don't think exclude_logit_threshold should exist. We should not use audit points without shadow models. Or am I missing something?
@@ -159,7 +178,7 @@ def run_attack(self:Self) -> CombinedMetricResult:
    score = []  # List to hold the computed probability scores for each sample

    # If fixed_variance is to be used, calculate it from all logits of shadow models
-    if self.fixed_variance:
+    if len(self.shadow_models) < 64:
why 64?
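If 64 has to stay, naming it would at least make the intent reviewable. A hypothetical refactor (constant name and helper are assumptions):

```python
# Hypothetical constant: below this many shadow models, per-sample
# variance estimates are assumed too noisy, so a single fixed
# variance computed over all shadow-model logits is used instead.
MIN_SHADOW_MODELS_FOR_PER_SAMPLE_VARIANCE = 64


def use_fixed_variance(num_shadow_models: int) -> bool:
    """Decide whether to fall back to a shared variance estimate."""
    return num_shadow_models < MIN_SHADOW_MODELS_FOR_PER_SAMPLE_VARIANCE
```

The same constant could then replace the literal 64 in both branches of run_attack, so the two checks cannot drift apart.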
@@ -194,7 +214,7 @@ def run_attack(self:Self) -> CombinedMetricResult:
    score = np.asarray(score)  # Convert the list of scores to a numpy array

    # Generate thresholds based on the range of computed scores for decision boundaries
-    self.thresholds = np.linspace(np.min(score), np.max(score), 1000)
+    self.thresholds = np.linspace(np.nanmin(score), np.nanmax(score), 2000)
Don't use nanmin and nanmax! It just hides bugs. If we handle the audit points missing IN or OUT models as in RMIA, we don't need this.
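A small illustration of the objection: `nanmin`/`nanmax` silently skip NaNs that a plain `min`/`max` would surface, so a NaN produced by an audit point missing its IN or OUT models goes unnoticed.

```python
import numpy as np

score = np.array([0.2, np.nan, 0.9])

np.nanmin(score)        # 0.2 -- the NaN is silently ignored
float(np.min(score))    # nan -- plain min propagates it, exposing the bug

# An explicit check surfaces the problem instead of hiding it:
if np.isnan(score).any():
    # fail fast, or repair the offending points before thresholding
    bad = np.flatnonzero(np.isnan(score))
```

With the NaN-producing points filtered out beforehand, plain `np.min`/`np.max` are safe and the hidden-bug risk disappears.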
Nice code and great ROC curve :)
It would be beneficial to visualize the data points in the memorization and privacy score space. If you have such plots, I think the LaTeX file is a good place to upload them.
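A sketch of the suggested plot, with placeholder score arrays and a headless backend (all variable names here are assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
memorization = rng.random(200)     # placeholder memorization scores
privacy = rng.random(200)          # placeholder privacy scores
is_member = rng.random(200) > 0.5  # placeholder membership labels

fig, ax = plt.subplots()
ax.scatter(memorization[is_member], privacy[is_member], s=8, label="IN")
ax.scatter(memorization[~is_member], privacy[~is_member], s=8, label="OUT")
ax.set_xlabel("memorization score")
ax.set_ylabel("privacy score")
ax.legend()
fig.savefig("memorization_vs_privacy.png")
```

With real attack outputs in place of the random arrays, the resulting PNG can go straight into the LaTeX report.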
In the description, maybe we should also mention the memorization paper.
@@ -182,10 +201,11 @@ def run_attack(self:Self) -> CombinedMetricResult:

    if self.online:
        in_mean = np.mean(shadow_models_logits[mask])
-        if not self.fixed_variance:
+        if len(self.shadow_models) >= 64:
Seems like a magic number.
self.include_train_data = configs.get("include_train_data", self.online)
self.include_test_data = configs.get("include_test_data", self.online)

# Define the validation dictionary as: {parameter_name: (parameter, min_value, max_value)}
validation_dict = {
    "num_shadow_models": (self.num_shadow_models, 1, None),
    "exclude_logit_threshold": (self.exclude_logit_threshold, 0, int(self.num_shadow_models/2)),
Nice with the verification :)
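For reference, a dictionary in that `{name: (value, min_value, max_value)}` shape can be consumed by one small generic checker; a sketch (the helper name is an assumption, not the PR's code):

```python
def validate_params(validation_dict: dict) -> None:
    """Check {name: (value, min_value, max_value)}; None bounds are skipped."""
    for name, (value, min_value, max_value) in validation_dict.items():
        if min_value is not None and value < min_value:
            raise ValueError(f"{name}={value} is below the minimum {min_value}")
        if max_value is not None and value > max_value:
            raise ValueError(f"{name}={value} exceeds the maximum {max_value}")


# Passes silently for in-range values:
validate_params({"num_shadow_models": (8, 1, None),
                 "exclude_logit_threshold": (1, 0, 4)})
```

Keeping the loop in one helper means every new config parameter only needs a dictionary entry, not new validation code.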
"""Adjust thresholds to achieve the desired percentile of most vulnerable datapoints."""

audit_dataset_len = len(self.target_logits)
if audit_dataset_len*(1-self.memorization_threshold) < 30:
Maybe use a variable like num_requested_datapoint, or something like that, instead of a bare number.
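The suggestion could look like this, with hypothetical names for both the computed count and the hard-coded 30:

```python
# Hypothetical name for the bare 30: the minimum number of vulnerable
# datapoints needed for the threshold adjustment to be meaningful.
MIN_VULNERABLE_DATAPOINTS = 30


def enough_vulnerable_points(audit_dataset_len: int,
                             memorization_threshold: float) -> bool:
    """Check whether the requested percentile leaves enough datapoints."""
    num_requested_datapoints = audit_dataset_len * (1 - memorization_threshold)
    return num_requested_datapoints >= MIN_VULNERABLE_DATAPOINTS
```

The condition in the snippet above then reads as intent (`if not enough_vulnerable_points(...)`) rather than arithmetic against an unexplained literal.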
Description
Summary of changes
Resolved Issues
How Has This Been Tested?