Validate MNLI #450
Comments
Hi, I'll be working on this one!
@PatrykNeubauer any progress?
Hey, sorry, but nothing concrete yet, as I've mostly been getting up to speed on the library and on evaluating LLMs in general. So far I've noticed two possible sources of inconsistencies:
Ran the evaluation on a few models, trying out both the current version of the prompt ("single" below) and a slightly modified one with extra ("extra" below). To summarize:
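For context, runs like these are typically launched through the harness's CLI. The following is only a sketch assuming the `main.py` interface of lm-evaluation-harness at the time; the task name `mnli` and the exact flags are assumptions, not confirmed by this thread:

```shell
# Hypothetical invocation matching the parameters reported below
# (model backend, pretrained weights, num_fewshot: 0). Flags are
# assumed from the lm-evaluation-harness CLI of that era and are
# not confirmed in this thread.
python main.py \
  --model hf-causal \
  --model_args pretrained=gpt2 \
  --tasks mnli \
  --num_fewshot 0
```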
With single:

GPT:
- hf-causal (pretrained=gpt2), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
- gpt3 (engine=davinci), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
- gpt3 (engine=text-davinci-003), limit: None, provide_description: False, num_fewshot: 0, batch_size: None

OPT:
- hf-causal (pretrained=facebook/opt-125m), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
- hf-causal (pretrained=facebook/opt-350m), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
- hf-causal (pretrained=facebook/opt-1.3b), limit: None, provide_description: False, num_fewshot: 0, batch_size: None

T5:
- hf-seq2seq (pretrained=t5-base), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
- hf-seq2seq (pretrained=google/flan-t5-base), limit: None, provide_description: False, num_fewshot: 0, batch_size: None

With extra:

GPT:
- hf-causal (pretrained=gpt2), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
- gpt3 (engine=text-davinci-003), limit: None, provide_description: False, num_fewshot: 0, batch_size: None

OPT:
- hf-causal (pretrained=facebook/opt-125m), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
- hf-causal (pretrained=facebook/opt-350m), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
- hf-causal (pretrained=facebook/opt-1.3b), limit: None, provide_description: False, num_fewshot: 0, batch_size: None

T5:
- hf-seq2seq (pretrained=t5-base), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
- hf-seq2seq (pretrained=google/flan-t5-base), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
(Perhaps worth noting that opt-125m did better than opt-350m with this format.)
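For reference, a minimal sketch of how the two prompt variants might be built. The base template follows the style commonly used for NLI-type tasks in the harness, and the nature of the "extra" modification (an extra blank line before "Answer:") is an illustrative assumption, since the thread does not spell it out:

```python
# Hypothetical sketch of the two MNLI prompt variants compared above.
# Both the exact base template and the "extra" modification are
# assumptions for illustration, not confirmed by this thread.

def mnli_prompt(premise: str, hypothesis: str, extra: bool = False) -> str:
    """Build a zero-shot MNLI prompt; `extra` inserts an extra newline."""
    sep = "\n\n" if extra else "\n"
    return (
        f"{premise}\n"
        f"Question: {hypothesis} True, False or Neither?"
        f"{sep}Answer:"
    )

# Zero-shot usage (num_fewshot: 0), as in the runs above:
print(mnli_prompt(
    "A soccer game with multiple males playing.",
    "Some men are playing a sport.",
))
```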
This is an excellent report! I feel comfortable adopting this as our Officially Recommended Format now :)