script to sample/create image pairs from original data #8

Open
suchanv opened this issue Jan 13, 2020 · 5 comments

@suchanv
Collaborator

suchanv commented Jan 13, 2020

No description provided.

@visionjo
Owner

Why is this needed?

We provide a list of pairs.

Do you mean a script that starts from the list and builds the data table?
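One reading of "starts from the list and builds the data table" can be sketched as below. This is only an illustration: the CSV layout and the column names `p1`, `p2`, and `label` are assumptions, since the actual pair-list format of this repository is not shown in this thread.

```python
import csv
import io
from collections import Counter


def build_pair_table(handle):
    """Read a pair list into a list of row dicts.

    Assumes a CSV with a header row and the hypothetical columns
    p1, p2 (image paths) and label (1 = same subject, 0 = different).
    """
    table = []
    for row in csv.DictReader(handle):
        table.append(
            {"p1": row["p1"], "p2": row["p2"], "label": int(row["label"])}
        )
    return table


def label_counts(table):
    """Count how many same-subject (1) and different-subject (0) pairs exist."""
    return Counter(row["label"] for row in table)


# Tiny in-memory example; a real run would pass an open file instead.
example = io.StringIO(
    "p1,p2,label\n"
    "s1/a.jpg,s1/b.jpg,1\n"
    "s1/a.jpg,s2/c.jpg,0\n"
    "s2/c.jpg,s2/d.jpg,1\n"
)
table = build_pair_table(example)
counts = label_counts(table)
```

Because the table is derived purely from the fixed list, rebuilding it never changes which pairs are evaluated.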

@visionjo
Owner

Let me rephrase that. This would be good code to add, but consider priority and reproducibility: the pair lists do not change. They are part of the dataset (per benchmark, for the current version of the DB).

@suchanv
Collaborator Author

suchanv commented Jan 13, 2020

I think it would be good to release the full dataset rather than only the sampled dataset that corresponds to the sampled pairs. It would also be useful for us if we change the dataset. Just my opinion. If you don't have that script, the list of sampled pairs works too. I'm not sure whether I will need that part for other datasets, but if I do, it would be great if you could share the script or the detailed steps.

@suchanv
Collaborator Author

suchanv commented Jan 13, 2020

But feel free to close this issue or assign low priority.

@visionjo
Owner

visionjo commented Jan 17, 2020

I do not understand. I do have the scripts (actually in a different project, as code for a different dataset we built). They would certainly have to be modified to work here, but that is minimal work. But wouldn't this be more of a utility tool for someone else to build a different dataset? (That would be cool to share, as I hugely appreciate when such things are provided.)

I guess the misunderstanding is about the samples and pair lists not being the complete dataset? Sure, we sample faces, but that is for the sake of having a controlled experiment with a reasonable number of faces per subject (i.e., 25 faces for each of the 100 subjects in each of the 8 subgroups, split evenly). We probably had enough faces to use as many as 150 per subject (whatever the minimum number of samples across all subjects is), but that is a bit much: so much data and so many possible pairs for 800 subjects. We already have a decent-sized set, with the potential to add more later given extended work, methods, or task protocols.
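For readers who want to see what that kind of sampling could look like, here is a minimal sketch. It is not the maintainers' script: the function names, the `faces_by_subject` mapping, and the seed are all assumptions made for illustration. It draws a fixed number of faces per subject with a seeded generator, and counts how many pairs a given setup allows.

```python
import random


def sample_faces(faces_by_subject, n_per_subject=25, seed=0):
    """Draw the same number of faces for every subject, reproducibly.

    faces_by_subject is a hypothetical mapping {subject_id: [image paths]}.
    Subjects and paths are sorted first so the result depends only on the
    seed and the data, not on dictionary or file-system ordering.
    """
    rng = random.Random(seed)
    sampled = {}
    for subject in sorted(faces_by_subject):
        faces = sorted(faces_by_subject[subject])
        if len(faces) < n_per_subject:
            raise ValueError(f"{subject} has only {len(faces)} faces")
        sampled[subject] = sorted(rng.sample(faces, n_per_subject))
    return sampled


def pair_counts(n_subjects, n_per_subject):
    """Return (total, same_subject, different_subject) unordered pair counts."""
    n_faces = n_subjects * n_per_subject
    total = n_faces * (n_faces - 1) // 2
    same = n_subjects * (n_per_subject * (n_per_subject - 1) // 2)
    return total, same, total - same
```

With 25 faces for each of 800 subjects, `pair_counts(800, 25)` gives 199,990,000 possible pairs, of which 240,000 are same-subject. With 150 faces per subject, `pair_counts(800, 150)` gives 7,199,940,000 possible pairs, which illustrates why the smaller sample keeps the experiment manageable.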

Thus, we provide a benchmark for others to run against (i.e., to try to improve performance on the same experiment). If everyone generates their own list, how would we be able to compare fairly and claim SOTA? (It would be a frenzy, and certainly open to people subsetting an easier test, among other issues.) Unless a dataset consists of raw data samples, ground-truth labels (at least for the test set; this holds for unsupervised settings too), and lists, it is not a complete benchmark dataset. Make sense?
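One way to check that two people are evaluating on the same list is to compare a fingerprint of it. The sketch below is an assumption for illustration, not something this repository is stated to provide: the function name and the `(path_a, path_b, label)` tuple layout are hypothetical. It hashes a canonical form of the pair list, so row order and the order of images within a pair do not matter.

```python
import hashlib


def pair_list_fingerprint(pairs):
    """Return a SHA-256 hex digest of a pair list.

    pairs is an iterable of (path_a, path_b, label) tuples (hypothetical
    layout). Each pair is stored with its two paths in sorted order and
    the rows are sorted, so the digest depends only on the set of pairs.
    """
    canonical = sorted(
        (min(a, b), max(a, b), int(label)) for a, b, label in pairs
    )
    digest = hashlib.sha256()
    for a, b, label in canonical:
        digest.update(f"{a}\t{b}\t{label}\n".encode("utf-8"))
    return digest.hexdigest()


# Same pairs in a different order give the same fingerprint.
list_a = [("s1/a.jpg", "s1/b.jpg", 1), ("s1/a.jpg", "s2/c.jpg", 0)]
list_b = [("s2/c.jpg", "s1/a.jpg", 0), ("s1/b.jpg", "s1/a.jpg", 1)]
same = pair_list_fingerprint(list_a) == pair_list_fingerprint(list_b)
```

A result reported alongside such a digest can be tied to one specific version of the list, which is the point of keeping the list fixed.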

But perhaps there is something I am missing or misunderstanding. Furthermore, you may have something in mind that could make for a better resource (i.e., if you are thinking of something that you could use, then others probably could too).

Thank you for sharing, and for not closing the issue until I better understand, or until you close it once you better understand.
