Support for top-k metrics with num_return_sequences>1 #1117

yuyemin · 2023-12-13T16:44:19Z

Hi, I wonder if lm-eval supports top-k evaluation when generating with beam_search with num_return_sequences>1?

haileyschoelkopf · 2023-12-13T17:02:18Z

Hi! We don't currently support num_return_sequences > 1, and mostly focus on support for Top-P / Top-K temperature sampling, though we do support running generation K times per document to get K different completions via repeats: K in a task YAML.

Could you explain exactly what "top-K evaluation" means in this context?

If you'd be interested in opening a PR to help support this or streamline evaluation in the num_return_sequences > 1 / beam search setting we'd be happy to accept it!

yuyemin · 2023-12-13T17:40:42Z

Thanks for the quick reply! I'm aiming to evaluate the top-k exact-match metric (check if there's a ground-truth hit in top-k lowest-perplexity candidates generated using beam-search + num_return_sequences=k).

In that case, I'll see if I can work on incorporating this feature and open a PR afterwards.
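
For illustration, the top-k exact-match check described above could be sketched as follows. This is only a minimal sketch, not lm-eval code: the function names (`normalize`, `top_k_exact_match`, `mean_top_k_exact_match`) and the normalization rule are hypothetical, and it assumes the k candidates for each document are already sorted best-first (for example, by beam score).

```python
from typing import Callable, Sequence


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (a hypothetical normalization choice)."""
    return " ".join(text.lower().split())


def top_k_exact_match(
    candidates: Sequence[str],
    references: Sequence[str],
    k: int,
    normalizer: Callable[[str], str] = normalize,
) -> float:
    """Return 1.0 if any of the first k candidates exactly matches a reference.

    Assumes `candidates` is ordered best-first, e.g. by beam score.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    targets = {normalizer(ref) for ref in references}
    return float(any(normalizer(cand) in targets for cand in candidates[:k]))


def mean_top_k_exact_match(
    all_candidates: Sequence[Sequence[str]],
    all_references: Sequence[Sequence[str]],
    k: int,
) -> float:
    """Average the per-document top-k exact-match score over a dataset."""
    scores = [
        top_k_exact_match(cands, refs, k)
        for cands, refs in zip(all_candidates, all_references)
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

With this shape, the only thing the harness would need to provide per document is the ordered list of k returned sequences; the metric itself does not depend on how they were generated.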

ChengJade · 2024-02-18T15:23:25Z

Hi, sorry to bother you. I was just wondering whether you have managed to work this out?

StellaAthena assigned yuyemin Dec 18, 2023

StellaAthena added the "feature request" label Dec 18, 2023
