Is it possible to use BLEU with multiple references? #1125

juliafalcao · 2023-12-14T10:57:29Z

I'm creating a new task and would like to evaluate my generated output against N different references with BLEU. However, the code appears to pick up only the first available reference, and I'm not sure how to map doc_to_target in the task YAML to include multiple references.

lintangsutawika · 2023-12-14T13:05:40Z

I'm assuming that if you want to use the BLEU metric, you would use the generate_until task type (output_type: generate_until in the task YAML). In that case, you could also use Hugging Face's implementation of BLEU.
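
To make "multiple references" concrete, here is a minimal, stdlib-only sketch of how multi-reference corpus BLEU is defined. Each hypothesis n-gram count is clipped by its maximum count in any single reference, and the brevity penalty uses the reference length closest to the hypothesis length. This is illustrative only and is not the harness's or Hugging Face's code. It assumes plain whitespace tokenization and no smoothing, so scores from sacrebleu or HF will differ. The function name corpus_bleu_multi_ref is hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams of order n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu_multi_ref(
    hypotheses: List[str],
    references: List[List[str]],
    max_n: int = 4,
) -> float:
    """Hypothetical corpus BLEU (0-100) where hypotheses[i] may have several references.

    Assumes whitespace tokenization and applies no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-grams, per order
    hyp_len = 0
    ref_len = 0

    for hyp, refs in zip(hypotheses, references):
        hyp_toks = hyp.split()
        ref_toks = [r.split() for r in refs]
        hyp_len += len(hyp_toks)
        # Brevity penalty: pick the reference length closest to the hypothesis
        # length, breaking ties toward the shorter reference.
        ref_len += min((abs(len(r) - len(hyp_toks)), len(r)) for r in ref_toks)[1]

        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp_toks, n)
            # Clip each n-gram by its maximum count in any single reference.
            max_ref: Counter = Counter()
            for r in ref_toks:
                for gram, c in ngram_counts(r, n).items():
                    max_ref[gram] = max(max_ref[gram], c)
            matches[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += max(len(hyp_toks) - n + 1, 0)

    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision zeroes the geometric mean
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)
```

For example, corpus_bleu_multi_ref(["the cat sat on the mat"], [["a cat is on the mat"]]) returns 0.0 because the hypothesis shares no 4-gram with that single reference. Adding "the cat sat on the mat" as a second reference for the same sample raises the score to 100.0. This is why only picking up the first reference can badly understate quality.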

doc_to_target supports returning more than one answer, so you could give the dataset a gold feature that stores a list of references for each sample.
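A minimal sketch of that idea follows, assuming a hypothetical gold column that holds a list of reference strings per sample. The doc_to_target function here is a plain Python stand-in and is not the harness's own code. The to_reference_streams helper, also hypothetical, shows the layout change that sacrebleu.corpus_bleu needs: it takes reference streams (the i-th reference of every sample grouped together), whereas Hugging Face's evaluate "bleu" metric takes the per-sample list of lists directly. For simplicity, this sketch requires every sample to have the same number of references.

```python
from typing import Dict, List


def doc_to_target(doc: Dict) -> List[str]:
    """Hypothetical target function: return every reference stored in doc["gold"]."""
    return list(doc["gold"])


def to_reference_streams(per_sample_refs: List[List[str]]) -> List[List[str]]:
    """Transpose per-sample reference lists into reference streams.

    Input:  [[s0_ref0, s0_ref1], [s1_ref0, s1_ref1], ...]
    Output: [[s0_ref0, s1_ref0, ...], [s0_ref1, s1_ref1, ...]]
    Assumes every sample has the same number of references.
    """
    if not per_sample_refs:
        return []
    if len({len(refs) for refs in per_sample_refs}) != 1:
        raise ValueError("every sample must have the same number of references")
    return [list(stream) for stream in zip(*per_sample_refs)]
```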

haileyschoelkopf · 2023-12-14T15:18:55Z

TriviaQA is one example of a dataset that uses multiple references! https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/triviaqa/default.yaml Please let us know if you have trouble mapping this onto BLEU.