script to sample/create image pairs from original data #8
Comments
Why is this needed? We provide a list of pairs. Do you mean a script that starts from the list and builds the data table?
Let me rephrase that: this would be good code to add, but it is a question of priority and reproducibility. The pairs (lists) do not change. They are part of the dataset (per the benchmark of the current version of the DB).
I think it would be good if you released the full dataset, as opposed to only the sampled dataset that corresponds to the sampled pairs. It would also be useful for us if we change the dataset. Just my opinion. If you don't have that script, the list of sampled pairs works too. I'm not sure whether I'm going to need that part for other datasets, but if I do, it would be great if you could share the script or the detailed steps.
But feel free to close this issue or assign it low priority.
I do not understand. I do have the scripts (actually in a different project, code for a different dataset we built). They would certainly have to be modified to work here, but that is minimal work. But wouldn't this be more of a utility tool for someone else to build a different dataset? (That would be cool to share, as I hugely appreciate it when such things are provided.)

I guess the misunderstanding is about the samples and pair lists not being the complete dataset? Sure, we sample faces, but that is for the sake of having a controlled experiment with a reasonable number of faces per subject (i.e., 25 faces for all 100 subjects from each of the 8 subgroups, split evenly). We had enough faces to probably go as high as 150 faces per subject (whatever the minimum number of samples across all subjects is), but that is a bit loaded: that is a lot of data and possible pairs for 800 subjects. We already have a decent-size set, with the potential to add more later given extended work, methods, or task protocols.

Thus, we provide a benchmark for others to run against (i.e., to try to improve the performance on the same experiment). If everyone generates their own list, how would we be able to compare fairly and claim SOTA? It would be a frenzy, and certainly open to people subsetting an easier test, amongst other issues. Unless a dataset consists of raw data samples, ground-truth labels (at least for the test set; this holds for unsupervised methods too), and lists, it is not a complete benchmark dataset. Make sense?

But perhaps there is something I am missing or misunderstanding. Furthermore, you may have something in mind that could make for a better resource (i.e., if you are thinking of something that you could use, then others probably could too). Thank you for sharing, and for not closing the issue until I better understand, or until you close it once you better understand.
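For anyone who wants to build a different dataset along these lines, here is a minimal sketch of the kind of utility being discussed: sample a fixed number of faces per subject, then build a labeled pair list from the sample. This is not the maintainers' script. The function names, the `faces_by_subject` mapping, the seed handling, and the impostor-pair strategy are all assumptions made for illustration. It is not a replacement for the released pair lists, which stay fixed so that results remain comparable.

```python
import random
from itertools import combinations


def sample_faces(faces_by_subject, n_per_subject=25, seed=0):
    """Pick n_per_subject faces for every subject, reproducibly.

    faces_by_subject is a hypothetical mapping: subject id -> list of image paths.
    """
    rng = random.Random(seed)
    sampled = {}
    # Sort subjects and faces so the result depends only on the seed,
    # not on dictionary or filesystem ordering.
    for subject in sorted(faces_by_subject):
        faces = sorted(faces_by_subject[subject])
        if len(faces) < n_per_subject:
            raise ValueError(f"{subject} has only {len(faces)} faces")
        sampled[subject] = sorted(rng.sample(faces, n_per_subject))
    return sampled


def build_pairs(sampled, n_impostor_per_subject=2, seed=0):
    """Return (face_a, face_b, label) tuples: 1 = same subject, 0 = different.

    The impostor strategy (random other subject, random faces) is an
    assumption, not the protocol used for the released lists.
    """
    rng = random.Random(seed)
    subjects = sorted(sampled)
    pairs = []
    for subject in subjects:
        # All genuine pairs within the sampled faces of one subject.
        for a, b in combinations(sampled[subject], 2):
            pairs.append((a, b, 1))
        # A few impostor pairs against randomly chosen other subjects.
        others = [s for s in subjects if s != subject]
        if not others:
            continue
        for _ in range(n_impostor_per_subject):
            other = rng.choice(others)
            pairs.append(
                (rng.choice(sampled[subject]), rng.choice(sampled[other]), 0)
            )
    return pairs
```

Once generated, such a list would need to be saved and shipped with the data, for the reason given above: everyone must evaluate on the same pairs for the comparison to be fair.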
No description provided.