'PreTrainedTokenizerFast' object has no attribute 'legacy' #22

tsw123678 · 2024-05-09T11:30:27Z

When I run the llama3-finetune-lora script, I encounter the following error:

Original Traceback (most recent call last):
File "/x/sherlor/envs/llava/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
File "/x/sherlor/envs/llava/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/x/sherlor/envs/llava/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/x/tsw/LLaVA/llava/train/train.py", line 828, in __getitem__
data_dict = preprocess(
File "/x/tsw/LLaVA/llava/train/train.py", line 699, in preprocess
return preprocess_v1(sources, tokenizer, has_image=has_image)
File "/x/tsw/LLaVA/llava/train/train.py", line 534, in preprocess_v1
if i != 0 and not tokenizer.legacy and IS_TOKENIZER_GREATER_THAN_0_14:
AttributeError: 'PreTrainedTokenizerFast' object has no attribute 'legacy'

My version info:
transformers 4.41.0.dev0
tokenizers 0.19.1

ashmalvayani · 2024-05-27T14:29:38Z

What was the solution for this error?

tsw123678 · 2024-05-28T05:14:04Z

> What was the solution for this error?

I'm sorry, this issue is old enough that I have forgotten the cause. You could try loading the tokenizer with use_fast set to True and then to False to see whether either setting avoids the error.

ashmalvayani · 2024-05-28T05:41:15Z

> > What was the solution for this error?
>
> I'm sorry, this issue is old enough that I have forgotten the cause. You could try loading the tokenizer with use_fast set to True and then to False to see whether either setting avoids the error.

I am currently using a Cohere model, and when I set use_fast=True it throws a "CohereTokenizer does not exist or is not currently supported" error, even with the latest version of transformers. So I think AutoTokenizer already sets it to True by default.
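
For anyone hitting the same traceback: the failing line in `preprocess_v1` reads `tokenizer.legacy` directly, and the error shows that the `PreTrainedTokenizerFast` object in use does not define that attribute. One possible workaround, offered only as a sketch and not as the fix confirmed in this thread, is to read the attribute with `getattr` and a fallback. The helper names `is_legacy` and `needs_offset_fix`, the two stand-in classes, and the fallback value `True` are all assumptions made for illustration. Whether `True` or `False` is the right fallback depends on how your tokenizer handles the separator tokens, so it is worth checking against the tokenization mismatch warnings during training.

```python
# Assumed to mirror the flag of the same name in train.py; value is illustrative.
IS_TOKENIZER_GREATER_THAN_0_14 = True


def is_legacy(tokenizer, default=True):
    """Return tokenizer.legacy when it exists, otherwise the given default.

    The default of True is an assumption, not something established in this issue.
    """
    return getattr(tokenizer, "legacy", default)


def needs_offset_fix(tokenizer, i):
    """Same condition as the failing line, but tolerant of a missing attribute."""
    return i != 0 and not is_legacy(tokenizer) and IS_TOKENIZER_GREATER_THAN_0_14


class FastTokenizerStandIn:
    """Hypothetical stand-in for a tokenizer that has no `legacy` attribute."""


class SlowTokenizerStandIn:
    """Hypothetical stand-in for a tokenizer that defines `legacy`."""
    legacy = False


if __name__ == "__main__":
    # No AttributeError is raised for the object that lacks `legacy`.
    print(needs_offset_fix(FastTokenizerStandIn(), 1))
    print(needs_offset_fix(SlowTokenizerStandIn(), 1))
```

In `train.py` the equivalent change would be replacing `not tokenizer.legacy` with `not getattr(tokenizer, "legacy", True)` in the condition shown in the traceback, again under the assumption that the fallback suits your tokenizer.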

tsw123678 closed this as completed May 10, 2024
