
Memorization #83

Open
wants to merge 7 commits into main

Conversation

@henrikfo (Collaborator) commented on Jun 26, 2024

Description

Summary of changes

  • Receive model logits a bit faster
  • Implemented memorization and privacy score (to select the most vulnerable percentile)

Resolved Issues

How Has This Been Tested?

  • Run LiRA online attack


I think it is cleaner if we don't commit these changes in general.

@@ -3,11 +3,14 @@
from abc import ABC, abstractmethod

import numpy as np
import torch

Previously we tried to avoid importing torch like this and instead import only the modules we want. I think we still want that.
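For illustration, a minimal sketch of that import style, assuming only a couple of torch utilities are actually needed (the chosen submodules and helper are just examples, not code from this PR):

# Import only the torch pieces that are used, rather than the whole package.
from torch import no_grad
from torch.nn import functional as F

def softmax_logits(logits):
    """Softmax over the last dimension, without tracking gradients."""
    with no_grad():
        return F.softmax(logits, dim=-1)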

exclude_logit_threshold: 1 # Set a threshold for when to skip a logit (useful for imbalance between IN and OUT samples)
online: True # perform online or offline attack
memorization: True # Use memorization score to boost performance
memorization_threshold: 0.99 # The most vulnerable percentile (0.0 for the original threshold)

So we do not have an automatic way to do this now? Did you write an issue?

@@ -256,3 +261,117 @@ def run_attack(self:Self) -> Union[AttackResult, List[AttackResult]]:

"""
pass

def _memorization(self:Self) -> None:

I am wondering if we should have this functionality in AbstractMIA or if it makes more sense to have it in utils. Any opinion?


Basically, we could create a .py file called memorization.py.
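As a rough sketch of what such a module could contain, assuming memorization is estimated per sample from IN- and OUT-shadow-model signals in the spirit of Feldman-style memorization (all names below are hypothetical, not the PR's actual code):

# memorization.py: hypothetical standalone module (sketch only)
import numpy as np

def memorization_score(in_signals: np.ndarray, out_signals: np.ndarray) -> np.ndarray:
    """Per-sample memorization: mean IN-model signal minus mean OUT-model signal."""
    return in_signals.mean(axis=1) - out_signals.mean(axis=1)

def most_vulnerable_mask(scores: np.ndarray, percentile: float = 0.99) -> np.ndarray:
    """Boolean mask selecting samples at or above the given score percentile."""
    return scores >= np.quantile(scores, percentile)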


self.logger.info("Preparing memorization")
if not self.online:
self.logger.info("Using the offline version of the attack we make some assumptions in the absence of IN-models")

Be more explicit. "Some assumptions"?


Also, we don't have anything for offline? Create an issue!

@@ -53,6 +57,7 @@ def __init__(
# out_members will start after the last training index and go up to the number of test indices - 1
"out_members": np.arange(len(handler.train_indices),len(handler.train_indices)+len(handler.test_indices)),
}
AbstractMIA.skip_indices = np.zeros(len(AbstractMIA.audit_dataset["data"]), dtype=bool)

Should this really be shared among all MIAs? Right now we have used it for RMIA and LiRA, but they may not use the same number of shadow models, and then these indices will not be the same.
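One possible alternative, sketched under the assumption that each attack builds its own mask in __init__ (the placement is a suggestion for discussion, not the PR's implementation):

from abc import ABC

import numpy as np

class AbstractMIA(ABC):
    def __init__(self, handler) -> None:
        audit_size = len(handler.train_indices) + len(handler.test_indices)
        # Per-instance mask: LiRA and RMIA can then skip different audit
        # points even if they use different numbers of shadow models.
        self.skip_indices = np.zeros(audit_size, dtype=bool)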


from leakpro.import_helper import List, Self, Union
from leakpro.metrics.attack_result import AttackResult
from leakpro.model import PytorchModel
from leakpro.signals.signal import ModelLogits, ModelRescaledLogits

Not sure we should compute signal values in AbstractMIA... I know we talked about it, but I think I am having second thoughts.

@@ -84,6 +91,34 @@ def description(self:Self) -> dict:
"detailed": detailed_str,
}

def check_logits(self:Self) -> None:

I was going through LiRA and RMIA with Fabian today, and I think we should handle 0 IN or OUT models the same way as in RMIA, i.e., by updating all the quantities used. Here, we don't do anything about it if we have audit points lacking an IN/OUT model...
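A minimal sketch of that kind of handling, assuming a boolean in_indices_mask of shape (num_shadow_models, num_audit_points) and that every derived quantity is filtered with the same mask (the variable names are assumptions):

import numpy as np

# Keep only audit points with at least one IN and one OUT shadow model,
# and apply the same mask to everything derived from them.
in_counts = in_indices_mask.sum(axis=0)
out_counts = (~in_indices_mask).sum(axis=0)
valid = (in_counts > 0) & (out_counts > 0)

shadow_models_logits = shadow_models_logits[:, valid]
target_logits = target_logits[valid]
audit_indices = np.flatnonzero(valid)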

@@ -112,36 +147,20 @@ def prepare_attack(self:Self)->None:
count_in_samples = np.count_nonzero(self.in_indices_mask)
if count_in_samples > 0:
self.logger.info(f"Some shadow model(s) contains {count_in_samples} IN samples in total for the model(s)")
self.logger.info("This is not an offline attack!")
self.logger.info("This is not an true offline attack!")

?

self.target_logits = np.array(self.signal([self.target_model], self.audit_data)).squeeze()
self.target_logits = np.swapaxes(self.signal([self.target_model], self.audit_data), 0, 1).squeeze()

if self.exclude_logit_threshold > 0:

I don't think exclude_logit_threshold should exist. We should not use audit points without shadow models. Or am I missing something?

@@ -159,7 +178,7 @@ def run_attack(self:Self) -> CombinedMetricResult:
score = [] # List to hold the computed probability scores for each sample

# If fixed_variance is to be used, calculate it from all logits of shadow models
if self.fixed_variance:
if len(self.shadow_models) < 64:

Why 64?

@@ -194,7 +214,7 @@ def run_attack(self:Self) -> CombinedMetricResult:
score = np.asarray(score) # Convert the list of scores to a numpy array

# Generate thresholds based on the range of computed scores for decision boundaries
self.thresholds = np.linspace(np.min(score), np.max(score), 1000)
self.thresholds = np.linspace(np.nanmin(score), np.nanmax(score), 2000)

Don't use nanmin and nanmax! It just hides bugs. If we handle the audit points missing IN or OUT models as in RMIA, we don't need this.
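Along those lines, a small sketch of what the threshold line could look like once audit points lacking IN or OUT models have been filtered out beforehand, so any remaining NaN surfaces as a bug instead of being hidden:

# With invalid audit points already removed, plain min/max is safe again.
assert not np.isnan(score).any(), "NaN scores indicate unhandled audit points"
self.thresholds = np.linspace(np.min(score), np.max(score), 2000)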

@fazelehh (Collaborator) left a comment

Nice code and a great ROC curve :)
It would be beneficial to visualize the data points in the memorization and privacy score space. If you have such plots, I think the LaTeX file is a good place to upload them.
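A minimal matplotlib sketch of such a plot, assuming per-sample arrays mem_scores and priv_scores plus a boolean membership mask is_member are already computed (all three names are assumptions):

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(5, 4))
ax.scatter(mem_scores[~is_member], priv_scores[~is_member], s=4, alpha=0.4, label="non-members")
ax.scatter(mem_scores[is_member], priv_scores[is_member], s=4, alpha=0.4, label="members")
ax.set_xlabel("memorization score")
ax.set_ylabel("privacy score")
ax.legend()
fig.savefig("memorization_vs_privacy.png", dpi=200)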


In the description, maybe we should also mention the memorization paper.

@@ -182,10 +201,11 @@ def run_attack(self:Self) -> CombinedMetricResult:

if self.online:
in_mean = np.mean(shadow_models_logits[mask])
if not self.fixed_variance:
if len(self.shadow_models) >= 64:

Seems like a magic number.
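One way to avoid the magic number would be a named, config-driven minimum along these lines (the parameter name and default are made up for illustration):

# Hypothetical config parameter replacing the hard-coded 64.
min_models_for_per_sample_variance = configs.get("min_models_for_per_sample_variance", 64)

if len(self.shadow_models) >= min_models_for_per_sample_variance:
    # Enough shadow models for a stable per-sample variance estimate.
    ...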


self.include_train_data = configs.get("include_train_data", self.online)
self.include_test_data = configs.get("include_test_data", self.online)

# Define the validation dictionary as: {parameter_name: (parameter, min_value, max_value)}
validation_dict = {
"num_shadow_models": (self.num_shadow_models, 1, None),
"exclude_logit_threshold": (self.exclude_logit_threshold, 0, int(self.num_shadow_models/2)),

Nice with the verification :)

"""Adjust thesholds to achieve the desired percentile most vulnerable datapoints."""

audit_dataset_len = len(self.target_logits)
if audit_dataset_len*(1-self.memorization_threshold) < 30:

Maybe use a variable like num_requested_datapoint or something like that instead of a hard-coded number.
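For example, the hard-coded 30 could become a named minimum along these lines (the variable name is only a suggestion):

min_num_requested_datapoints = 30  # minimum audit points the percentile must leave

num_requested_datapoints = audit_dataset_len * (1 - self.memorization_threshold)
if num_requested_datapoints < min_num_requested_datapoints:
    # Too few points would be selected; warn or fall back instead of proceeding silently.
    ...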

@johanos1 added the enhancement label on Jul 11, 2024

Successfully merging this pull request may close these issues.

Implement memorization
3 participants