Only a single filtered_resps is logged for repeat > 1 for each sample #1232

Open
baberabb opened this issue Jan 1, 2024 · 4 comments
Labels
bug Something isn't working.

Comments

@baberabb
Contributor

baberabb commented Jan 1, 2024

For repeat > 1, the model outputs (resps) from every model call are saved to file when using --log_samples, but only a single filtered_resps entry (the first one?) is logged.

@baberabb baberabb changed the title Only a single filtered_resps is logged for repeat > 1 Only a single filtered_resps is logged for repeat > 1 for each sample Jan 1, 2024
@haileyschoelkopf
Contributor

Is this independent of the take-first filter being the default setting?

@baberabb
Contributor Author

baberabb commented Jan 4, 2024

This was without setting an explicit filter in the YAML while testing predict_only. Also, doesn't just picking the first generation defeat the whole point of repeat? Why is it even a thing? I'm probably missing something obvious.

@haileyschoelkopf
Contributor

repeat exists for things like Maj@K / self-consistency: if someone wants to use this, they'll set repeats: K and explicitly set some other filter, so that filtered_resps ends up with a single response, namely the most frequently output answer.
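For the Maj@K / self-consistency case, the reduction step amounts to a majority vote over the K answers. Below is a minimal Python sketch, assuming the answers have already been extracted from the raw generations (e.g. by an earlier regex step). `majority_vote` is a hypothetical helper written for illustration, not the harness's own filter.

```python
from collections import Counter
from typing import List


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent of K extracted answers (maj@K).

    Counter.most_common orders equal counts by first occurrence,
    so ties go to the answer that appeared first.
    """
    if not answers:
        raise ValueError("need at least one answer")
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


# Hypothetical K = 5 answers for one doc (repeats: 5).
print(majority_vote(["42", "41", "42", "40", "42"]))  # 42
```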

The idea is that resps contains all K outputs from the model, and the filter pipelines distill them into a single output that can be scored (or, in multiple_choice's case, a single set of N_CHOICES outputs). This could plausibly be improved or made easier to understand in the output files, and maybe we do want a different default behavior for repeats > 1.
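A rough sketch of the resps → filtered_resps idea, assuming each filter is simply a function from one doc's list of responses to a new list. `Filter`, `extract_number`, `take_first`, and `run_filters` are hypothetical stand-ins written for illustration, not the harness's actual filter classes or config keys.

```python
import re
from typing import Callable, List

# Hypothetical filter type: maps one doc's list of responses to a new list.
Filter = Callable[[List[str]], List[str]]


def extract_number(responses: List[str]) -> List[str]:
    # Keep only the last integer in each generation ("" if none is found).
    out = []
    for r in responses:
        nums = re.findall(r"-?\d+", r)
        out.append(nums[-1] if nums else "")
    return out


def take_first(responses: List[str]) -> List[str]:
    # Keep only the first of the K generations.
    return responses[:1]


def run_filters(resps: List[str], filters: List[Filter]) -> List[str]:
    # Apply the filters in order; the result stands in for filtered_resps.
    out = list(resps)
    for f in filters:
        out = f(out)
    return out


resps = ["So the answer is 12.", "I get 15", "Answer: 12"]  # K = 3
print(run_filters(resps, [extract_number]))              # ['12', '15', '12']
print(run_filters(resps, [extract_number, take_first]))  # ['12']
```

Ending the chain with take_first keeps only one of the K generations, which mirrors the single filtered_resps entry described in this issue; ending it with a voting step instead would turn the same K outputs into a Maj@K prediction.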

@baberabb
Contributor Author

baberabb commented Jan 5, 2024

Aah, that makes more sense. Wouldn't pass@k be more appropriate as the default? IMO that's a bit more intuitive (plus a warning if only greedy sampling is used).
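For reference, pass@k is usually computed with the unbiased estimator introduced alongside HumanEval: with n samples of which c are correct, pass@k = 1 - C(n-c, k) / C(n, k). A minimal sketch, assuming per-sample correctness is already known. `pass_at_k`, `pass_at_k_from_samples`, and the identical-samples warning are hypothetical illustrations, not existing harness features.

```python
import warnings
from math import comb
from typing import List


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k from n samples, c of them correct.

    Equals 1 - C(n - c, k) / C(n, k): one minus the chance that a random
    size-k subset of the samples contains no correct sample.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset contains at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_at_k_from_samples(samples: List[str], correct: List[bool], k: int) -> float:
    """Hypothetical wrapper that warns when all samples are identical.

    Identical samples usually mean greedy decoding, in which case c is
    either 0 or n and pass@k is the same as pass@1.
    """
    if len(samples) > 1 and len(set(samples)) == 1:
        warnings.warn("all samples are identical; was greedy decoding used?")
    return pass_at_k(len(samples), sum(correct), k)


print(pass_at_k(n=4, c=2, k=1))  # 0.5
print(pass_at_k(n=5, c=1, k=5))  # 1.0
```

The warning check is one cheap way to flag the greedy case: if every sample is the same string, drawing more of them cannot raise the score above pass@1.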

@StellaAthena StellaAthena added the bug Something isn't working. label Jan 8, 2024
Projects
None yet
Development

No branches or pull requests

3 participants