Load pre-trained tokenizer from memory #1013
This is a totally reasonable thing to ask. Actually, if you are in pure Rust, it already supports serde. That being said, adding a new function is definitely something we could do.
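A minimal sketch of what the serde remark implies (assuming the `serde_json` crate and that `Tokenizer` implements serde's `Deserialize`; the helper name `load_from_memory` is illustrative, not a confirmed API):

```rust
use tokenizers::Tokenizer;

/// Deserialize a tokenizer straight from JSON bytes held in memory,
/// without ever writing them to disk.
fn load_from_memory(bytes: &[u8]) -> serde_json::Result<Tokenizer> {
    // Tokenizer implements Deserialize, so serde_json can parse the
    // downloaded tokenizer.json contents directly.
    serde_json::from_slice(bytes)
}
```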
Hello guys,

`tokenizers::Tokenizer` has two methods to create a pre-trained tokenizer: `Tokenizer::from_pretrained` and `Tokenizer::from_file`. It's quite common that we need to download `tokenizer.json` from a remote host; right now we have to save the remote data to a local file, load the tokenizer from that file, and then delete the file afterwards.

If we had something like `Tokenizer::from_memory(data_bytes: &[u8])`, we could download the remote tokenizer into memory and load it directly. It's quick and safe: there would be no local file to clean up after use.