How to deal with model without tokenizer.json? #169

Open
sanmeow opened this issue May 19, 2023 · 0 comments
sanmeow commented May 19, 2023

Hello! Thanks to the community for this great work!

I ran into a problem: some models can't be converted to GGML.

When the creator of a GPT-NeoX model built the tokenizer with sentencepiece,
examples/gpt-neox/convert-h5-to-ggml.py
complains that tokenizer.json is missing.

For example:
https://huggingface.co/rinna/japanese-gpt-neox-3.6b-instruction-sft

I got it half solved by using
https://github.com/togethercomputer/redpajama.cpp

and it works! So there is some way to solve the sentencepiece problem. (After studying it a bit, the tokenizer seems to be based on T5, not GPT-NeoX.)

So is there any way to bring that fix into this GGML repo? Other models face a similar problem.
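
To make the request concrete, here is a rough sketch of the kind of check a converter could run before it tries to read the vocabulary. It is only an illustration: the function name `detect_tokenizer_format` and the sentencepiece file names `spiece.model` and `tokenizer.model` are assumptions on my part, not something taken from convert-h5-to-ggml.py, and real model repositories may name these files differently.

```python
from pathlib import Path

# Assumed file names; actual names can differ between model repositories.
HF_TOKENIZER_FILE = "tokenizer.json"
SENTENCEPIECE_CANDIDATES = ("spiece.model", "tokenizer.model")


def detect_tokenizer_format(model_dir):
    """Return 'tokenizer.json', 'sentencepiece', or 'unknown' for a model directory."""
    model_dir = Path(model_dir)
    # Prefer tokenizer.json when it exists, since that is what the script expects.
    if (model_dir / HF_TOKENIZER_FILE).is_file():
        return "tokenizer.json"
    # Otherwise look for a sentencepiece model file (hypothetical names above).
    if any((model_dir / name).is_file() for name in SENTENCEPIECE_CANDIDATES):
        return "sentencepiece"
    return "unknown"
```

With a check like this, the script could branch to a sentencepiece-based vocabulary loader instead of stopping when tokenizer.json is absent.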

Thanks!
