'PreTrainedTokenizerFast' object has no attribute 'legacy' #22

Closed
tsw123678 opened this issue May 9, 2024 · 3 comments

Comments

@tsw123678

When I run the llama3-finetune-lora script, I encounter the following error:

Original Traceback (most recent call last):
  File "/x/sherlor/envs/llava/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/x/sherlor/envs/llava/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/x/sherlor/envs/llava/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/x/tsw/LLaVA/llava/train/train.py", line 828, in __getitem__
    data_dict = preprocess(
  File "/x/tsw/LLaVA/llava/train/train.py", line 699, in preprocess
    return preprocess_v1(sources, tokenizer, has_image=has_image)
  File "/x/tsw/LLaVA/llava/train/train.py", line 534, in preprocess_v1
    if i != 0 and not tokenizer.legacy and IS_TOKENIZER_GREATER_THAN_0_14:
AttributeError: 'PreTrainedTokenizerFast' object has no attribute 'legacy'

My version info is:
transformers 4.41.0.dev0
tokenizers 0.19.1
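
The traceback shows `tokenizer.legacy` being read on an object that does not define it. One possible workaround, offered as a minimal sketch and not as the fix confirmed in this thread, is to read the attribute with a fallback. The helper name `tokenizer_is_legacy` and the two stub classes are hypothetical and exist only for illustration. The fallback value is an assumption: it decides whether the guarded branch in `preprocess_v1` runs, so it should be checked against how the tokenizer in use actually behaves.

```python
def tokenizer_is_legacy(tokenizer, default=True):
    """Return tokenizer.legacy if the attribute exists, else the fallback.

    The fallback is an assumption; choose it to match your tokenizer.
    """
    return getattr(tokenizer, "legacy", default)


class FastTokenizerStub:
    """Hypothetical stand-in for a tokenizer that has no `legacy` attribute."""


class SlowTokenizerStub:
    """Hypothetical stand-in for a tokenizer that defines `legacy`."""
    legacy = False


# Placeholder for the flag used in train.py; its real value depends on the
# installed tokenizers version.
IS_TOKENIZER_GREATER_THAN_0_14 = True

for i, tok in enumerate([FastTokenizerStub(), SlowTokenizerStub()]):
    # Same shape as the condition on line 534 of train.py, but without the
    # AttributeError when `legacy` is missing.
    if i != 0 and not tokenizer_is_legacy(tok) and IS_TOKENIZER_GREATER_THAN_0_14:
        print("non-legacy branch taken for", type(tok).__name__)
```

With `default=True`, a tokenizer without the attribute skips the guarded branch; passing `default=False` would make it take the branch instead.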

@ashmalvayani

What was the solution for this error?

@tsw123678
Author

> What was the solution for this error?

I am so sorry, the issue is so old that I have forgotten the cause. Maybe you should try loading the tokenizer with or without use_fast...
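
The suggestion above can be sketched as follows, assuming the tokenizer is loaded through `AutoTokenizer` from transformers. `use_fast` is an existing argument of `AutoTokenizer.from_pretrained`; the wrapper name `load_tokenizer` and the `model_path` parameter are hypothetical. Whether switching this flag resolves the error is not confirmed in this thread, and a slow tokenizer class may not exist for every model.

```python
def load_tokenizer(model_path: str, use_fast: bool = False):
    """Load a tokenizer, explicitly choosing the fast or slow implementation.

    `model_path` is a placeholder for a local directory or a Hub model id.
    """
    # Imported lazily so this sketch can be read and imported without
    # transformers installed; calling the function does require it.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(model_path, use_fast=use_fast)
```

Calling `load_tokenizer(path, use_fast=False)` and then `load_tokenizer(path, use_fast=True)` and comparing `type(tokenizer).__name__` shows which class each setting actually returns.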

@ashmalvayani

> > What was the solution for this error?
>
> I am so sorry, the issue is so old that I have forgotten the cause. Maybe you should try loading the tokenizer with or without use_fast...

I am currently using a Cohere model, and when you set use_fast=True, it throws a "CohereTokenizer does not exist or is not currently supported" error even with the latest version of transformers, so I think AutoTokenizer already sets it to True by default.
