The tokenizer add_special_tokens parameter for t5 model lambada task #1017
This seems pretty reasonable to me. Do you get the expected results with the flag set to False?
Hi @StellaAthena, here are the ppl and accuracy of google/flan-t5-xl on lambada that I got, first with add_special_tokens=False and then with add_special_tokens=True:
However, I cannot find the expected lambada accuracy and ppl in the model card https://huggingface.co/google/flan-t5-xl or the paper https://arxiv.org/pdf/2210.11416.pdf. Since lambada is part of the finetuning mixture according to the model card, 29.8% accuracy still seems very low. If you have the SOTA of google/flan-t5-xl on lambada, please share it with me. Thanks a lot!
I don't have any information on this. As far as I am aware, this is the correct value. If you want to investigate further, you can examine the per-example generations and see if anything looks weird.
Thanks @StellaAthena, do you mean to call generate and check the output? I will give it a try. I also sent an email to the T5 authors; hopefully we'll get feedback.
Yes, you can see how to do this in the eval harness here.
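In case it's useful to others, a minimal sketch of spot-checking generations directly with transformers (the prompt is just an illustrative lambada-style passage, not taken from the harness):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the model under discussion and generate greedily from a prompt.
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xl")

# A lambada-style prompt: the model should produce the final word.
prompt = "He heard a chuckle behind him and turned to face his"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```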
I checked with the google/flan-t5-xl authors: the recommended way to run lambada on this model is to append EOS at the end of the inputs and targets in _model_call(), and then, when computing word accuracy and word perplexity from the outputs, to compare only the last word and ignore the EOS.
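For anyone who wants to try that recipe outside the harness, here is a rough sketch; the helper name and structure are made up for illustration and are not the harness's actual _model_call:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xl")

def last_word_matches(context: str, last_word: str) -> bool:
    """Score the target with EOS appended, but compare only the
    last word's tokens and ignore the trailing EOS."""
    # add_special_tokens=True (the default) appends T5's EOS (id 1)
    # to both the inputs and the targets, as the authors recommend.
    input_ids = tokenizer(context, return_tensors="pt").input_ids
    target_ids = tokenizer(last_word, return_tensors="pt").input_ids

    with torch.no_grad():
        logits = model(input_ids=input_ids, labels=target_ids).logits

    greedy = logits.argmax(dim=-1)[0]
    # Drop the trailing EOS from both sides before comparing.
    return torch.equal(greedy[:-1], target_ids[0][:-1])
```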
I'm also hitting problems when using the 'exact match' metric on t5-XXL or t5-XL (BBH evaluation; I get low-quality generations but have no idea why so far).
@daisyden Hi, may I ask how you solved the issue? I'm trying to run the lambada evaluation on t5-base and hit the same problem: the perplexity is extremely high and accuracy is almost 0.
Me too. mt5-xl has very high word-level perplexity. |
@lintangsutawika is traveling this week but might have thoughts on this when he's back, as he's worked a lot with T5 and T5-like models! Separately from possible issues with ppl computation or special tokens, though: are the T5 models you are evaluating trained to perform left-to-right language modeling, @wangyanbao666 @djstrong? I'd expect a T5 model trained only on span denoising / MLM to perform quite poorly on language-modeling tasks like lambada or wikitext perplexities.
They are MLM models, so maybe it doesn't make sense, or a mask (sentinel) token needs to be added at the end of each subsequence; see the sketch below. I will remove these models from the perplexity calculation.
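For reference, plain T5 checkpoints are trained with sentinel tokens (<extra_id_0>, <extra_id_1>, ...) rather than a BERT-style [MASK], so a span-filling probe would look something like this sketch (prompt and max_new_tokens are arbitrary):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# t5-base is a pure span-denoising checkpoint (no LM or instruction tuning).
tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

# Put the sentinel where the missing span (here, the final word) should go.
text = "He heard a chuckle behind him and turned to face his <extra_id_0>"
input_ids = tokenizer(text, return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_new_tokens=5)

# The decoder output interleaves sentinels with the predicted span(s),
# e.g. "<extra_id_0> wife <extra_id_1>", so it needs post-processing
# before word-level accuracy or perplexity can be computed.
print(tokenizer.decode(output_ids[0]))
```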
When we run lambada_openai on google/flan-t5-xl, both the input tokens and the labels end with EOS, because add_special_tokens=True by default for Seq2Seq models; however, the output of _model_call does not end with EOS, so the accuracy is always 0 (see the debug trace below: greedy_tokens ends with token 5 rather than the EOS id 1). Since the lambada input is not a full sentence, can we set add_special_tokens=False when running lambada for T5 models? Or please suggest how to get correct results on the lambada task for T5 models.
If you have reference data for google/flan-t5-xl on lambada_openai, please kindly share it with me. Thanks!
```
{'input_ids': tensor([[  105,  6936,  2298, 29715,  9439,  1239,    37,   388,  3993,    26,
            44,   376,     5, 19783,   737,    22,    17,    43,     3,     9,
         11354,   125,    47,   352,    30,     5,   216,  2299,    12,   112,
          2743,     5,   105,  3696,    51,   270,    22,     7,   131,  2301,
           139,     8,   629,    44,     8,  2007,    13,     8,  9956,  1239,
           105, 15046,   269,  1239,   105,  8952,   670,   192,     6,  2087,
           386,     6,  2286,   550,   642,  3059,   243,    11,  3993,    26,
            44,     1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
(Pdb) p targets_tokens
{'input_ids': tensor([[19783, 1]]), 'attention_mask': tensor([[1, 1]])}
...
(Pdb) p greedy_tokens
tensor([19783, 5])
(Pdb) p target_tokens
tensor([19783, 1])
```
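To make the effect of the flag in the trace above concrete, a quick check with the stock transformers tokenizer (the input string is arbitrary):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")

# With the default, T5's EOS token (id 1) is appended ...
print(tokenizer("the last word", add_special_tokens=True).input_ids)
# ... and without it, the ids stop at the final word piece.
print(tokenizer("the last word", add_special_tokens=False).input_ids)
print(tokenizer.eos_token_id)  # 1 for T5 tokenizers
```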