
The tokenizer add_special_tokens parameter for t5 model lambada task #1017

Open

daisyden opened this issue Nov 22, 2023 · 11 comments

Comments

@daisyden commented Nov 22, 2023

When we run lambada_openai on google/flan-t5-xl, both the input tokens and the labels end with EOS, because add_special_tokens=True is the default for Seq2Seq models. However, the output of _model_call does not end with EOS, so the accuracy is always 0. Since a lambada input is not a full sentence, can we set add_special_tokens=False when running lambada for T5 models? Or please suggest how to get correct results on the lambada task for T5 models.

If you have reference data for google/flan-t5-xl on lambada_openai, please kindly share it with me. Thanks!

# imports assumed from the pre-0.4 harness API used here
from lm_eval import tasks, models, evaluator

task_dict = tasks.get_task_dict(task_names)
model = models.huggingface.AutoSeq2SeqLM(args.model, device=args.device, batch_size=1)
results = evaluator.evaluate(
    model,
    task_dict,
    limit=100,
)

{'input_ids': tensor([[ 105, 6936, 2298, 29715, 9439, 1239, 37, 388, 3993, 26,
44, 376, 5, 19783, 737, 22, 17, 43, 3, 9,
11354, 125, 47, 352, 30, 5, 216, 2299, 12, 112,
2743, 5, 105, 3696, 51, 270, 22, 7, 131, 2301,
139, 8, 629, 44, 8, 2007, 13, 8, 9956, 1239,
105, 15046, 269, 1239, 105, 8952, 670, 192, 6, 2087,
386, 6, 2286, 550, 642, 3059, 243, 11, 3993, 26,
44, 1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
(Pdb) p targets_tokens
{'input_ids': tensor([[19783, 1]]), 'attention_mask': tensor([[1, 1]])}
...
(Pdb) p greedy_tokens
tensor([19783, 5])
(Pdb) p target_tokens
tensor([19783, 1])
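
For context, a minimal standalone sketch (assuming only the transformers library) of where the trailing 1 comes from: with add_special_tokens=True, T5's tokenizer appends the EOS token </s> (id 1), which is the trailing 1 in the input_ids and target_tokens above.

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/flan-t5-xl")

# With special tokens (the default), T5 appends </s> (id 1) to the encoding.
with_eos = tok("some target word", add_special_tokens=True)["input_ids"]

# Without special tokens, no EOS is appended, so the target is only the word's ids.
without_eos = tok("some target word", add_special_tokens=False)["input_ids"]

print(with_eos == without_eos + [1])  # True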

@StellaAthena (Member) commented

This seems pretty reasonable to me. Do you get expected results with the flag set to False?

@daisyden (Author) commented Nov 25, 2023

Hi @StellaAthena, here are the ppl and accuracy I got for google/flan-t5-xl on lambada.
With add_special_tokens=False:

Task            Version  Metric  Value     Stderr
lambada_openai  0        ppl     360.4850  ± 28.7851
                         acc     0.2987    ± 0.0064

and with add_special_tokens=True:

Task            Version  Metric  Value     Stderr
lambada_openai  0        ppl     913.6121  ± 40.5159
                         acc     0.0076  ± 0.0012

However, I cannot find the expected lambada accuracy and ppl in the model card https://huggingface.co/google/flan-t5-xl or the paper https://arxiv.org/pdf/2210.11416.pdf. Since lambada is part of the finetuning data according to the model card, 29.8% accuracy still seems very low. If you have the SOTA of google/flan-t5-xl on lambada, please share it with me. Thanks a lot!

@StellaAthena (Member) commented

I don't have any information on this. As far as I am aware, this is the correct value. If you want to study it further, you can examine the per-example generations and see if anything looks off.

@daisyden (Author) commented

Thanks @StellaAthena, do you mean to call generate and check the output? I will give it a try. I also sent an email to the T5 authors; I hope we can get feedback.

@StellaAthena (Member) commented

Yes, you can see how to do this in the eval harness here

@daisyden (Author) commented

I checked with the google/flan-t5-xl authors. The recommended way to run lambada on this model is to append EOS to the end of the inputs and targets in _model_call(), and then, when computing word accuracy and word perplexity from the outputs, compare only the last word and ignore the EOS.
@StellaAthena, to implement this we would need to keep add_special_tokens=True and customize the _loglikelihood_tokens function to skip the EOS when computing ppl and accuracy. What are your suggestions?
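
A minimal sketch of that idea (a hypothetical helper, not the harness's actual _loglikelihood_tokens, and assuming the decoder logits for the target positions are already available): tokenize with EOS for the forward pass, then strip the trailing EOS before comparing greedy tokens and summing log-probabilities.

import torch

T5_EOS_ID = 1  # </s> for T5 tokenizers (assumption for this sketch)

def score_target(logits: torch.Tensor, target_tokens: torch.Tensor):
    # logits: (target_len, vocab_size) decoder logits for the target positions.
    # target_tokens: (target_len,) target ids, ending in EOS because the
    # tokenizer was called with add_special_tokens=True.
    # Drop the trailing EOS before scoring, so ppl/acc reflect only the last word.
    if target_tokens[-1].item() == T5_EOS_ID:
        target_tokens = target_tokens[:-1]
        logits = logits[: target_tokens.shape[0]]
    log_probs = torch.log_softmax(logits.float(), dim=-1)
    greedy_tokens = log_probs.argmax(dim=-1)
    is_greedy = bool(torch.equal(greedy_tokens, target_tokens))
    loglikelihood = log_probs.gather(-1, target_tokens.unsqueeze(-1)).sum().item()
    return loglikelihood, is_greedy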

@milliemaoo commented Nov 28, 2023

I'm running into problems with the 'exact match' metric on T5-XXL and T5-XL as well (BBH evaluation; the generations are low quality, but I have no idea why so far).

@wangyanbao666 commented

@daisyden Hi, may I know how you solved the issue? I'm trying to run the lambada evaluation on t5-base and hit the same problem: the perplexity is extremely high and the accuracy is almost 0.

@djstrong (Contributor) commented Mar 18, 2024

Me too. mt5-xl has very high word-level perplexity.

@haileyschoelkopf (Contributor) commented

@lintangsutawika is traveling this week but might have thoughts on this when he's back, as he's worked a lot with T5 and T5-like models!

Separately from possible issues re: ppl computation or special tokens, though: are the T5 models you are evaluating trained to perform L-to-R language modeling, @wangyanbao666 @djstrong? I'd expect a T5 model trained only on span denoising / MLM to perform quite poorly on language-modeling tasks like lambada or wikitext perplexities.
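
To illustrate the distinction with a made-up example (not taken from this thread): span-corruption pretraining teaches T5 to fill sentinel-marked gaps, which is quite different from the left-to-right continuation lambada requires.

# Span-corruption (MLM) pretraining pair, roughly in T5's sentinel format:
denoising_input  = "The man <extra_id_0> at him and walked back to <extra_id_1> house ."
denoising_target = "<extra_id_0> smiled <extra_id_1> his <extra_id_2>"

# What lambada effectively asks for instead: continue raw text left-to-right.
lm_input  = "He turned and smiled at his"
lm_target = "teacher"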

@djstrong (Contributor) commented

They are MLM models, so maybe it doesn't make sense, or a MASK token needs to be added at the end of each subsequence. I will exclude these models from the perplexity calculations.
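
One hedged way to try that (hypothetical prompt formatting, not something the harness does today) would be to cast last-word prediction as sentinel infilling rather than raw continuation:

# Append T5's first sentinel token to the lambada-style context and score the
# held-out word as an infill target instead of a left-to-right continuation.
context   = "She looked up and smiled at her"   # made-up lambada-style prompt
t5_input  = context + " <extra_id_0>"
t5_target = "<extra_id_0> teacher"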
