Validate TriviaQA #456
Referenced code: lm-evaluation-harness/lm_eval/tasks/triviaqa.py, lines 76 to 84 at commit 3c210d4.
To @StellaAthena: I think the snippet above is supposed to be changed to the following.

```python
def construct_requests(self, doc, ctx):
    ret = []
    for alias in self._remove_prefixes(doc["answer"]["aliases"]):
        is_prediction, _ = rf.loglikelihood(ctx, " " + alias)
        ret.append(is_prediction)
    return ret

def process_results(self, doc, results):
    pred = self._remove_prefixes(doc["answer"]["aliases"])[np.argmax(results)]
    gold = doc["answer"]["value"]
    return {"acc": float(pred == gold)}
```
@seopbo Great work! Can you write up a bit about how you came to this conclusion, what the paper says, etc.? Right now, validating your work requires largely redoing it, so it would be good to have the relevant information collected in one place to make verification easier.
If we want to implement exact match ("em"), the following could be used:

```python
def construct_requests(self, doc, ctx):
    """Uses RequestFactory to construct Requests and returns an iterable of
    Requests which will be sent to the LM.

    :param doc:
        The document as returned from training_docs, validation_docs, or test_docs.
    :param ctx: str
        The context string, generated by fewshot_context. This includes the natural
        language description, as well as the few shot examples, and the question
        part of the document for `doc`.
    """
    continuation = rf.greedy_until(ctx, ["\n", ".", ","])
    return continuation

def process_results(self, doc, results):
    continuation = results[0].strip().lower().translate(str.maketrans('', '', string.punctuation))
    list_of_candidates = [
        alias.lower().translate(str.maketrans('', '', string.punctuation))
        for alias in self._remove_prefixes(doc["answer"]["aliases"])
    ]
    return {"em": float(continuation in list_of_candidates)}

def aggregation(self):
    return {
        "em": mean,
    }

def higher_is_better(self):
    return {"em": True}
```
@seopbo Apologies for my delayed response, but if you open a PR correcting the implementation, I will merge it. Thank you!
Okay, I will open a PR for this soon. @StellaAthena