
The tokenizer add_special_tokens parameter for t5 model lambada task #1017

Open

daisyden opened this issue Nov 22, 2023 · 11 comments

Comments

@daisyden commented Nov 22, 2023

When we run lambada_openai on google/flan-t5-xl, both the input tokens and the labels end with EOS, because add_special_tokens=True is the default for Seq2Seq models. However, the output of _model_call does not end with EOS, so the accuracy is always 0. Since a lambada input is not a full sentence, can we set add_special_tokens=False when running lambada for T5 models? Or please suggest how to get correct results on the lambada task for T5 models.

If you have reference data for google/flan-t5-xl on lambada_openai, please kindly share it with me. Thanks!

# imports assumed from the pre-0.4 harness API used here
from lm_eval import tasks, models, evaluator

task_dict = tasks.get_task_dict(task_names)
model = models.huggingface.AutoSeq2SeqLM(args.model, device=args.device, batch_size=1)
results = evaluator.evaluate(
    model,
    task_dict,
    limit=100,
)

{'input_ids': tensor([[ 105, 6936, 2298, 29715, 9439, 1239, 37, 388, 3993, 26,
44, 376, 5, 19783, 737, 22, 17, 43, 3, 9,
11354, 125, 47, 352, 30, 5, 216, 2299, 12, 112,
2743, 5, 105, 3696, 51, 270, 22, 7, 131, 2301,
139, 8, 629, 44, 8, 2007, 13, 8, 9956, 1239,
105, 15046, 269, 1239, 105, 8952, 670, 192, 6, 2087,
386, 6, 2286, 550, 642, 3059, 243, 11, 3993, 26,
44, 1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
(Pdb) p targets_tokens
{'input_ids': tensor([[19783, 1]]), 'attention_mask': tensor([[1, 1]])}
...
(Pdb) p greedy_tokens
tensor([19783, 5])
(Pdb) p target_tokens
tensor([19783, 1])
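
For context, a minimal standalone sketch (assuming only the transformers library) of where the trailing 1 comes from: with add_special_tokens=True, T5's tokenizer appends the EOS token </s> (id 1), which is the trailing 1 in the input_ids and target_tokens above.

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/flan-t5-xl")

# With special tokens (the default), T5 appends </s> (id 1) to the encoding.
with_eos = tok("some target word", add_special_tokens=True)["input_ids"]

# Without special tokens, no EOS is appended, so the target is only the word's ids.
without_eos = tok("some target word", add_special_tokens=False)["input_ids"]

print(with_eos == without_eos + [1])  # True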

@StellaAthena (Member) commented

This seems pretty reasonable to me. Do you get expected results with the flag set to False?

@daisyden (Author) commented Nov 25, 2023

Hi @StellaAthena, here are the ppl and accuracy I got for google/flan-t5-xl on lambada.
With add_special_tokens=False:

Task            Version  Metric  Value     Stderr
lambada_openai  0        ppl     360.4850  ± 28.7851
                         acc     0.2987    ± 0.0064

and with add_special_tokens=True:

Task            Version  Metric  Value     Stderr
lambada_openai  0        ppl     913.6121  ± 40.5159
                         acc     0.0076  ± 0.0012

However, I cannot find the expected lambada accuracy and ppl in the model card https://huggingface.co/google/flan-t5-xl or the paper https://arxiv.org/pdf/2210.11416.pdf. Since lambada is part of the finetuning data according to the model card, 29.8% accuracy still seems very low. If you have the SOTA of google/flan-t5-xl on lambada, please share it with me. Thanks a lot!

@StellaAthena (Member) commented

I don't have any information on this. As far as I am aware, this is the correct value. If you want to study it further, you can examine the per-example generations and see if anything looks off.

@daisyden (Author) commented

Thanks @StellaAthena, do you mean to call generate and check the output? I will give it a try. I also sent an email to the T5 authors; I hope we can get feedback.

@StellaAthena (Member) commented

Yes, you can see how to do this in the eval harness here

@daisyden (Author) commented

I checked with the google/flan-t5-xl authors. The recommended way to run lambada on this model is to append EOS to the end of the inputs and targets in _model_call(), and then, when computing word accuracy and word perplexity from the outputs, compare only the last word and ignore the EOS.
@StellaAthena, to implement this we would need to keep add_special_tokens=True and customize the _loglikelihood_tokens function to skip the EOS when computing ppl and accuracy. What are your suggestions?
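
A minimal sketch of that idea (a hypothetical helper, not the harness's actual _loglikelihood_tokens, and assuming the decoder logits for the target positions are already available): tokenize with EOS for the forward pass, then strip the trailing EOS before comparing greedy tokens and summing log-probabilities.

import torch

T5_EOS_ID = 1  # </s> for T5 tokenizers (assumption for this sketch)

def score_target(logits: torch.Tensor, target_tokens: torch.Tensor):
    # logits: (target_len, vocab_size) decoder logits for the target positions.
    # target_tokens: (target_len,) target ids, ending in EOS because the
    # tokenizer was called with add_special_tokens=True.
    # Drop the trailing EOS before scoring, so ppl/acc reflect only the last word.
    if target_tokens[-1].item() == T5_EOS_ID:
        target_tokens = target_tokens[:-1]
        logits = logits[: target_tokens.shape[0]]
    log_probs = torch.log_softmax(logits.float(), dim=-1)
    greedy_tokens = log_probs.argmax(dim=-1)
    is_greedy = bool(torch.equal(greedy_tokens, target_tokens))
    loglikelihood = log_probs.gather(-1, target_tokens.unsqueeze(-1)).sum().item()
    return loglikelihood, is_greedy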

@milliemaoo commented Nov 28, 2023

I'm running into problems with the 'exact match' metric on T5-XXL and T5-XL as well (BBH evaluation; the generations are low quality, but I have no idea why so far).

@wangyanbao666 commented

@daisyden Hi, may I know how you solved the issue? I'm trying to run the lambada evaluation on t5-base and hit the same problem: the perplexity is extremely high and the accuracy is almost 0.

@djstrong (Contributor) commented Mar 18, 2024

Me too. mt5-xl has very high word-level perplexity.

@haileyschoelkopf (Contributor) commented

@lintangsutawika is traveling this week but might have thoughts on this when he's back, as he's worked a lot with T5 and T5-like models!

Separately from possible issues re: ppl computation or special tokens, though: are the T5 models you are evaluating trained to perform L-to-R language modeling, @wangyanbao666 @djstrong? I'd expect a T5 model trained only on span denoising / MLM to perform quite poorly on language-modeling tasks like lambada or wikitext perplexities.
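
To illustrate the distinction with a made-up example (not taken from this thread): span-corruption pretraining teaches T5 to fill sentinel-marked gaps, which is quite different from the left-to-right continuation lambada requires.

# Span-corruption (MLM) pretraining pair, roughly in T5's sentinel format:
denoising_input  = "The man <extra_id_0> at him and walked back to <extra_id_1> house ."
denoising_target = "<extra_id_0> smiled <extra_id_1> his <extra_id_2>"

# What lambada effectively asks for instead: continue raw text left-to-right.
lm_input  = "He turned and smiled at his"
lm_target = "teacher"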

@djstrong (Contributor) commented

They are MLM models, so maybe it doesn't make sense, or a MASK token needs to be added at the end of each subsequence. I will exclude these models from the perplexity calculations.
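
One hedged way to try that (hypothetical prompt formatting, not something the harness does today) would be to cast last-word prediction as sentinel infilling rather than raw continuation:

# Append T5's first sentinel token to the lambada-style context and score the
# held-out word as an infill target instead of a left-to-right continuation.
context   = "She looked up and smiled at her"   # made-up lambada-style prompt
t5_input  = context + " <extra_id_0>"
t5_target = "<extra_id_0> teacher"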
