Potential Issue with Member/Non-member Data Handling in Inference Code #3

Open
ganyuhhutao opened this issue May 5, 2024 · 3 comments

Comments

@ganyuhhutao

Hello, I've been reviewing the code and noticed a potential issue with how member and non-member data are handled in the inference script (inference_attack.py). The indices from train_indices.csv appear to be used to select the non-member data for inference. In a membership inference attack (MIA), the samples a model was trained on are normally the members, so this may not align with the usual definition.

I hope this information is helpful, and I look forward to any clarification or updates you can provide on this matter.

Thank you for your attention to this issue.

@snoop2head
Contributor

snoop2head commented May 5, 2024

@ganyuhhutao
Thank you for your attention to the repository!

It is my fault for writing the code in a confusing manner, but I believe the implementation is correct.

The target model is trained on the test set of the CIFAR dataset. That test set is split into training and validation splits used to optimize the model, and the indices of the training split are saved to train_indices.csv.

MIA/train_target.py

Lines 46 to 47 in 5dc858b

testset = DSET_CLASS(root="./data", train=False, download=True, transform=transform)
testloader = DataLoader(testset, batch_size=CFG.val_batch_size, shuffle=False, num_workers=2)

MIA/train_target.py

Lines 64 to 69 in 5dc858b

target_train_indices = np.random.choice(len(testset), CFG.target_train_size, replace=False)
target_eval_indices = np.setdiff1d(np.arange(len(testset)), target_train_indices)
# save target_train_indices as dataframe
pd.DataFrame(target_train_indices, columns=["index"]).to_csv(
    "./attack/train_indices.csv", index=False
)
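
The split above can be sketched without the repository's dependencies. This is only an illustration, not the repository's code: `split_target_indices`, `pool_size`, and `train_size` are hypothetical names, the standard `random` module stands in for `np.random.choice` and `np.setdiff1d`, and the training size of 5,000 is an assumed value (the actual `CFG.target_train_size` is not shown in this thread).

```python
import random


def split_target_indices(pool_size, train_size, seed=0):
    """Split range(pool_size) into disjoint train and eval index lists."""
    rng = random.Random(seed)
    # Sample without replacement, analogous to np.random.choice(..., replace=False).
    train_indices = sorted(rng.sample(range(pool_size), train_size))
    # Every index not chosen for training goes to the eval split (a set difference).
    chosen = set(train_indices)
    eval_indices = [i for i in range(pool_size) if i not in chosen]
    return train_indices, eval_indices


# 10,000 is the size of the CIFAR test split; 5,000 is an assumed training size.
train_idx, eval_idx = split_target_indices(pool_size=10_000, train_size=5_000)
print(len(train_idx), len(eval_idx))
```

The two lists never overlap and together cover the whole pool, which is the property the saved train_indices.csv relies on.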

Subsequently, the target model runs inference on member and non-member data. The model has never seen the train set of the CIFAR dataset, so that data is non-member data. To select the non-member data, the train_indices.csv file saved earlier is used to index into the CIFAR train set. The only reason for indexing with train_indices.csv is to match the number of data points between the member and non-member sets.

MIA/inference_attack.py

Lines 53 to 64 in 5dc858b

testset = DSET_CLASS(root="./data", train=False, download=True, transform=transform)
trainset = DSET_CLASS(root="./data", train=True, download=True, transform=transform)
print("mapped classes to ids:", testset.class_to_idx)
columns_attack_sdet = [f"top_{index}_prob" for index in range(CFG.topk_num_accessible_probs)]
# load member data
list_nonmember_indices = pd.read_csv("./attack/train_indices.csv")["index"].to_list()
list_member_indices = np.random.choice(len(testset), len(list_nonmember_indices), replace=False)
subset_nonmember = Subset(trainset, list_nonmember_indices)
subset_member = Subset(testset, list_member_indices)
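
To make the labeling explicit: in this setup, members are drawn from the CIFAR test split (the pool the target model's training data comes from) and non-members are drawn from the CIFAR train split (which the target model never saw). Below is a minimal sketch of that labeling. The names `build_attack_examples`, `MEMBER`, and `NON_MEMBER` are hypothetical, and plain tuples stand in for the `Subset` objects used in the repository.

```python
MEMBER, NON_MEMBER = 1, 0


def build_attack_examples(member_indices, nonmember_indices):
    """Tag each (source split, index) pair with a membership label."""
    if len(member_indices) != len(nonmember_indices):
        raise ValueError("member and non-member sets should be the same size")
    # Members index into the CIFAR test split.
    members = [("test", i, MEMBER) for i in member_indices]
    # Non-members index into the CIFAR train split.
    nonmembers = [("train", i, NON_MEMBER) for i in nonmember_indices]
    return members + nonmembers


# The same integers can appear on both sides: index 3 of the test split and
# index 3 of the train split are different images.
examples = build_attack_examples([3, 7, 9], [3, 7, 9])
```

Reusing the saved indices therefore only fixes how many non-member samples are taken; it does not turn training samples into non-members.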

Sorry if this part was confusing. If I were writing the code now, I would write it as follows:

list_nonmember_indices = np.random.choice(len(trainset), len(pd.read_csv("./attack/train_indices.csv")["index"].to_list()), replace=False)
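
A dependency-free sketch of the same idea follows, assuming the only thing needed from train_indices.csv is its row count. `sample_nonmember_indices` and `saved_train_indices` are hypothetical names, and the standard `random` module replaces `np.random.choice`.

```python
import random


def sample_nonmember_indices(trainset_size, num_members, seed=0):
    """Draw num_members distinct indices from the whole train split."""
    rng = random.Random(seed)
    # Sampling over range(trainset_size) covers every train image, whereas
    # reused test-split indices can never exceed the test split's size.
    return rng.sample(range(trainset_size), num_members)


saved_train_indices = [4, 8, 15, 16, 23, 42]  # stand-in for the CSV contents
# 50,000 is the size of the CIFAR train split.
nonmember_idx = sample_nonmember_indices(50_000, len(saved_train_indices))
```

Either way the non-member set has the same size as the member set; the difference is only which part of the train split the non-members can come from.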

In the end, though, I don't think the code is misaligned with the paper.

@snoop2head
Contributor

If possible, could @dokyungs please give me back access to the repo so that I can clean up the issues and fix the code?

@ganyuhhutao
Author

Thank you for your explanation.
