Tokenizer object has no attribute 'tokenizer' #7

Closed
Gusreis7 opened this issue Mar 23, 2023 · 1 comment

Comments

@Gusreis7

Hi, thanks for your project!
I've been trying to use your work to punctuate some audio in Portuguese, but I got stuck on some problems with the Tokenizer.

First, I got this error in punctuate.py:
line 84, in __init__
    self.tokenizer = self.whisper_tokenizer.tokenizer
AttributeError: 'Tokenizer' object has no attribute 'tokenizer'

After removing the .tokenizer, I got another error in punctuate.py:

line 221: the tokenizer has no convert_ids_to_tokens attribute (the failing call is tokenizer.convert_ids_to_tokens)

Do you have any idea why this is happening?

@jumon
Owner

jumon commented Apr 1, 2023

Thank you for trying out this project!

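The issue you are experiencing is due to a recent change in Whisper (openai/whisper#1044), which replaced the Hugging Face tokenizer with tiktoken. As a result, Whisper's Tokenizer object no longer exposes the .tokenizer attribute or the convert_ids_to_tokens method that this repository relies on. I will modify this repository to ensure compatibility with the latest version of Whisper.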

In the meantime, as a workaround, you can use the older version of Whisper by running the following command:

pip install openai-whisper==20230308
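
For code that has to run against both Whisper versions, the attribute difference behind the two errors could be handled with a small shim. The following is only a sketch, not the fix applied in this repository: the helper name ids_to_token_strings is hypothetical, and it assumes that older releases expose the Hugging Face tokenizer as .tokenizer while newer releases expose a tiktoken encoding as .encoding with a decode method. Note that the two branches are not byte-for-byte equivalent: convert_ids_to_tokens returns raw BPE token strings, whereas decoding ids one at a time returns plain text pieces.

```python
def ids_to_token_strings(whisper_tokenizer, token_ids):
    """Map token ids to per-token strings across Whisper tokenizer variants.

    Hypothetical helper: the attribute names checked below are assumptions
    about the old (Hugging Face) and new (tiktoken) Whisper Tokenizer objects.
    """
    token_ids = list(token_ids)

    # Assumed older layout: a Hugging Face tokenizer stored in `.tokenizer`.
    inner = getattr(whisper_tokenizer, "tokenizer", None)
    if inner is not None and hasattr(inner, "convert_ids_to_tokens"):
        return list(inner.convert_ids_to_tokens(token_ids))

    # Assumed newer layout: a tiktoken encoding stored in `.encoding`.
    encoding = getattr(whisper_tokenizer, "encoding", None)
    if encoding is not None and hasattr(encoding, "decode"):
        # Decode one id at a time so the result stays aligned with token_ids.
        return [encoding.decode([token_id]) for token_id in token_ids]

    raise AttributeError(
        "Unsupported tokenizer object: expected a '.tokenizer' or '.encoding' attribute"
    )
```

Because the helper only inspects attributes, it does not import Whisper itself and fails with a clear AttributeError when neither layout is present.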

Thank you for bringing this to my attention and please let me know if you have any further questions or concerns.
