The input format for XNLI seems weird? #1822
Comments
@SefaZeng this is intentional. It's inspired by how XNLI was evaluated in the XGLM paper.
I tried to test the XNLI results using the latest commit, and I found that the inputs have a prefix_token. Looking at the content of `requests` for `_loglikelihood_tokens`, the `context` is `''` and `context_enc` is `[1]`. The prefix token is appended in a function from `lm_eval/api/model.py`.

XNLI's config (`xnli_zh.yaml`) merges both the premise and the hypothesis and has no `context`, so the code will add a `prefix_token_id` to the input for any model. This input format seems weird for a base model, as most LLMs do not use `bos` or `eos` at the start of inputs (maybe except Gemma).