
Validate on the test set #11

Closed
swagshaw opened this issue May 8, 2024 · 2 comments

Comments

swagshaw commented May 8, 2024

valid_dataset = torch.utils.data.ConcatDataset([weak_val, synth_val, strong_val, test_dataset])

I am reproducing the results of ATST-SED. While working on stage 2, I noticed that the test set is leaked into the validation set.
Is this intended? I cannot find an explanation of this in the paper. Did you also use this train/valid/test split for the baseline BEATs model? Otherwise I doubt whether the improvement comes from it.

SaoYear (Member) commented May 8, 2024

Hi, thanks for noticing that.

There is no need to worry about data leakage. As you can see in the trainer file, the definition of the validation dataset (a torch.utils.data.Dataset) does not determine which data are used in the validation step. Three masks control the data used for validation: mask_weak, mask_synth and mask_real.
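
For illustration only, here is a minimal sketch of how such masks can keep the test-set samples out of the validation metrics even though they sit in the same concatenated batch. The batch layout and subset sizes below are assumptions for the sketch, not the actual trainer code:

```python
import torch

# A hypothetical concatenated validation batch laid out as
# [weak | synthetic | strong real | test]; sizes are made up for illustration.
n_weak, n_synth, n_real, n_test = 4, 4, 4, 4
batch_size = n_weak + n_synth + n_real + n_test

# Boolean masks selecting each labelled subset inside the batch.
idx = torch.arange(batch_size)
mask_weak = idx < n_weak
mask_synth = (idx >= n_weak) & (idx < n_weak + n_synth)
mask_real = (idx >= n_weak + n_synth) & (idx < n_weak + n_synth + n_real)

# Pretend these are model outputs for the whole concatenated batch.
logits = torch.randn(batch_size, 10)

# Validation metrics only ever see the masked subsets; the trailing
# test-set portion of the batch is not selected by any of the masks.
weak_outputs = logits[mask_weak]
synth_outputs = logits[mask_synth]
real_outputs = logits[mask_real]
```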

The reason the test_dataset appears in valid_dataset is that the validation results of ATST-SED are very good and keep increasing on the weak data, the strong real data and the strong synthetic data. We therefore wanted to make sure that these continuous improvements are solid, i.e. that the model's performance indeed increases on data unseen during training. So we did evaluate the model on the test set after each epoch, but we did NOT use it for model selection (model selection is determined by the val/obj_metric defined here: https://github.com/Audio-WestlakeU/ATST-SED/blob/main/train/local/ultra_sed_trainer.py#L474).
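
To make the distinction concrete, here is a rough sketch of the idea, assuming hypothetical metric names and a simple sum for obj_metric; the actual definition is at the link above:

```python
# Hypothetical per-subset scores computed from the masked validation data.
weak_f1 = 0.55      # weak subset (mask_weak)
synth_psds = 0.42   # synthetic subset (mask_synth)
test_psds = 0.47    # test set, evaluated after each epoch

# Checkpoint selection depends only on the masked validation subsets;
# the test-set score is logged for monitoring and never enters obj_metric.
obj_metric = weak_f1 + synth_psds
print(f"val/obj_metric = {obj_metric:.3f} (test score {test_psds:.3f} is monitoring only)")
```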

There should be no doubt about the improvements because:

  1. We found that the current validation method actually did not pick the best model on the development dataset.
  2. The model improvement on the PublicEval dataset is also significant.

Anyway, this line of code is indeed suspicious; I will fix it and leave a notification on the home page.

Many thanks for mentioning that!

SaoYear added a commit that referenced this issue May 8, 2024
swagshaw (Author) commented May 8, 2024

I see. The obj_metric is independent of the test_dataset here because of the masks. Thank you for the quick explanation.

swagshaw closed this as completed May 8, 2024