For CoQA, in coqa/utils.py, only the last answer of each passage is predicted (i.e. the answer for the last turn_id, with all the previous questions and answers in the context window). The CoQA authors, however, appear to consider every turn_id: their sample prediction file contains an answer for each turn_id, and the official evaluation script averages the score over all turn_ids (see also the paper: https://arxiv.org/pdf/1808.07042.pdf).

I haven't found how other popular LLM evaluation frameworks implement this, but I'm fairly sure that predicting only the answer to the last question is not what the CoQA authors intended.
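To make the difference concrete, here is a minimal sketch of what evaluating every turn looks like: one prediction example per turn_id, each with the question/answer history of the preceding turns in its context. This is not the framework's actual code; the `doc` field names (`story`, `questions`, `answers`, each question/answer carrying `input_text` and `turn_id`) are assumptions based on the CoQA JSON format.

```python
def build_turn_examples(doc):
    """Yield one (context, question, turn_id) example per turn,
    not just one for the final turn.

    Each turn's context holds the passage plus the full Q/A history
    of the preceding turns, mirroring the conversational setup.
    """
    examples = []
    history = ""
    for question, answer in zip(doc["questions"], doc["answers"]):
        context = doc["story"] + "\n\n" + history + "Q: " + question["input_text"] + "\nA:"
        examples.append((context, question["input_text"], question["turn_id"]))
        # Extend the history with the gold answer so the next turn sees it.
        history += "Q: " + question["input_text"] + "\nA: " + answer["input_text"] + "\n"
    return examples


# Toy document in (assumed) CoQA-like shape.
doc = {
    "story": "Once upon a time ...",
    "questions": [
        {"turn_id": 1, "input_text": "Who is the story about?"},
        {"turn_id": 2, "input_text": "Where did she live?"},
    ],
    "answers": [
        {"turn_id": 1, "input_text": "a little girl"},
        {"turn_id": 2, "input_text": "in a small village"},
    ],
}
examples = build_turn_examples(doc)
print(len(examples))  # one example per turn_id, here 2
```

Under the current implementation only the final element of `examples` would be predicted; under the official protocol all of them are.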
I think it probably makes sense to support both versions of this task, but we should make it clear that the one described in the OP is the official one.
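For reference, the scoring difference between the two versions can be sketched as follows. This mimics (but is not) the official evaluation script: the official metric is an average of per-turn F1 over every turn_id, whereas the current behavior scores only one turn per passage. The dictionary keys below are illustrative.

```python
def average_f1(per_turn_f1):
    """Average per-turn F1 over all (story_id, turn_id) pairs,
    as the official CoQA evaluation does."""
    return sum(per_turn_f1.values()) / len(per_turn_f1)


# Hypothetical per-turn F1 scores keyed by (story_id, turn_id).
scores = {
    ("story1", 1): 1.0,
    ("story1", 2): 0.5,
    ("story2", 1): 0.0,
}
print(round(average_f1(scores), 2))  # 0.5
```

Scoring only the last turn would instead drop `("story1", 1)` from the average, which is why the two versions can report quite different numbers.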