Validate TriviaQA #456

Closed · StellaAthena opened this issue May 1, 2023 · 7 comments · Fixed by #525
Labels: good first issue (Good for newcomers), validation (For validation of task implementations)

Comments

@StellaAthena (Member)

No description provided.

StellaAthena added the help wanted (Contributors and extra help welcome), good first issue (Good for newcomers), and validation (For validation of task implementations) labels on May 1, 2023
@seopbo (Contributor) commented May 4, 2023

    def construct_requests(self, doc, ctx):
        ret = []
        for alias in self._remove_prefixes(doc["answer"]["aliases"]):
            _, is_prediction = rf.loglikelihood(ctx, " " + alias)
            ret.append(is_prediction)
        return ret

    def process_results(self, doc, results):
        return {"acc": float(any(results))}

I think the snippets above should be changed to the following. cc @StellaAthena

    def construct_requests(self, doc, ctx):
        ret = []
        for alias in self._remove_prefixes(doc["answer"]["aliases"]):
            # Keep the loglikelihood (first element), not the is_greedy flag.
            is_prediction, _ = rf.loglikelihood(ctx, " " + alias)
            ret.append(is_prediction)
        return ret

    def process_results(self, doc, results):
        # Predict the alias with the highest loglikelihood, then compare
        # against the gold answer string.
        pred = self._remove_prefixes(doc["answer"]["aliases"])[np.argmax(results)]
        gold = doc["answer"]["value"]
        return {"acc": float(pred == gold)}

StellaAthena changed the title from "TriviaQA" to "Validate TriviaQA" on May 6, 2023
@StellaAthena (Member, Author)

@seopbo Great work! Can you write up a bit about how you came to this conclusion, what the paper says, etc? Right now validating your work requires largely redoing it, so it would be good to have the relevant info collected in one place to make verification easier.

StellaAthena removed the help wanted (Contributors and extra help welcome) label on May 6, 2023
@seopbo (Contributor) commented May 6, 2023

> @seopbo Great work! Can you write up a bit about how you came to this conclusion, what the paper says, etc? Right now validating your work requires largely redoing it, so it would be good to have the relevant info collected in one place to make verification easier.

I think the triviaqa task is treated as a multiple-choice task in lm-evaluation-harness. In the previous code, `ret` is a list of the values computed for each choice, and the `results` arg of `process_results` is exactly what `construct_requests` returns. In my case, because of `{"acc": float(any(results))}`, the returned accuracy is always 1.
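
A tiny illustration (hypothetical flags) of the degenerate metric: the old code kept the boolean `is_greedy` element of each loglikelihood result, so the score only checks whether some alias was the model's greedy continuation, and the loglikelihoods never enter into it:

    # Hypothetical is_greedy flags, one per alias.
    results = [False, True, False]
    print({"acc": float(any(results))})  # {'acc': 1.0} as soon as any flag is True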

P.S.

  1. I implemented the loglikelihood calculation in the style of textsynth, using text-generation-inference (https://github.com/huggingface/text-generation-inference).
  2. The results below are from my 13.6B bilingual model (Korean data: my own dataset; English data: the Pile; code data: GitHub):
  • 0 shots: 51.02
  • 1 shot: 55.41
  • 5 shots: 58
  • 64 shots: 59.52

cc @StellaAthena

@seopbo (Contributor) commented May 6, 2023

In the LLaMA paper, the TriviaQA implementation differs from both the previous code and my code: the author of the previous code implemented the triviaqa task in a multiple-choice style, whereas the LLaMA paper's implementation actually generates the answer. cc @StellaAthena

[image: excerpt from the LLaMA paper]

@seopbo (Contributor) commented May 6, 2023

If we want to implement the triviaqa task in the style of the LLaMA paper, https://github.com/EleutherAI/lm-evaluation-harness/blob/polyglot/lm_eval/tasks/korquad.py is a good reference. I think we should use the `greedy_until` function instead of `loglikelihood` and check whether the generated text is included in `doc["answer"]["aliases"]`, as in the snippet below. cc @StellaAthena

    def construct_requests(self, doc, ctx):
        """Uses RequestFactory to construct Requests and returns an iterable of
        Requests which will be sent to the LM.

        :param doc:
            The document as returned from training_docs, validation_docs, or test_docs.
        :param ctx: str
            The context string, generated by fewshot_context. This includes the natural
            language description, as well as the few shot examples, and the question
            part of the document for `doc`.
        """
        # Generate freely until a likely end-of-answer delimiter.
        continuation = rf.greedy_until(ctx, ["\n", ".", ","])
        return continuation

    def process_results(self, doc, results):
        # Normalize the generation and every alias (lowercase, strip
        # punctuation) before the exact-match membership check.
        continuation = (
            results[0].strip().lower().translate(str.maketrans("", "", string.punctuation))
        )
        list_of_candidates = [
            alias.lower().translate(str.maketrans("", "", string.punctuation))
            for alias in self._remove_prefixes(doc["answer"]["aliases"])
        ]
        return {"em": float(continuation in list_of_candidates)}

    def aggregation(self):
        return {
            "em": mean,
        }

    def higher_is_better(self):
        return {"em": True}

@StellaAthena (Member, Author)

@seopbo Apologies for my delayed response, but if you open a PR correcting the implementation I will merge it.

Thank you!

@seopbo (Contributor) commented May 23, 2023

> @seopbo Apologies for my delayed response, but if you open a PR correcting the implementation I will merge it.
>
> Thank you!

Okay, I will open a PR for this soon. @StellaAthena
