"+++" continuous writing exception #85

Closed
walkrunning opened this issue Jun 2, 2023 · 5 comments

Comments

@walkrunning

With the novel model, "+++" (continue writing) stops producing output after it has been used several times, and other instructions produce no output either. The only thing I can do is reset the model.
decoded: str = tokenizer.decode(accumulated_tokens)
if '\uFFFD' not in decoded:

@BlinkDL

BlinkDL commented Jun 2, 2023

usually it means the model has decided you have reached [endoftext]

@LoganDark
Contributor

LoganDark commented Jun 3, 2023

Most implementations of encoding, decoding, and reverse prompts for this model (mine included) are incorrect. Encoding will become almost correct with World's greedy tokenizer, but decoding and reverse-prompt handling will still be buggy. I hope to document how to do this properly in the future.
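
One way a reverse-prompt check can go wrong (an illustration only, not necessarily the bug meant in this comment) is testing each newly decoded chunk in isolation: the stop string may arrive split across several chunks. A minimal sketch, assuming a hypothetical `generate_until` helper that searches the accumulated text instead:

```python
def generate_until(text_chunks, stop: str) -> str:
    """Concatenate decoded chunks and cut the output at the first `stop`."""
    output = ""
    for chunk in text_chunks:
        output += chunk
        # Search the whole accumulated text, not just the newest chunk.
        index = output.find(stop)
        if index != -1:
            return output[:index]
    return output


stop = "\n\nUser:"
chunks = ["Hi", "\n", "\nUs", "er: next question"]

# A per-chunk check never sees the stop string in any single chunk.
seen_per_chunk = any(stop in chunk for chunk in chunks)
result = generate_until(chunks, stop)
print(seen_per_chunk, repr(result))
```

Here the per-chunk check misses the reverse prompt, while the check on the accumulated text stops at the right place.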

@saharNooby
Collaborator

> Most implementations with this model of encoding, decoding, and reverse prompt (myself included) are incorrect

Hm, why, given that the official tokenizers Python library is used? Or do you mean only the if statements like some_byte_sequence in string?

@LoganDark
Contributor

> > Most implementations with this model of encoding, decoding, and reverse prompt (myself included) are incorrect
>
> Hm, why, given that official tokenizers Python library is used? Or do you mean only these if statements like some_byte_sequence in string?

Implementers forget that sometimes encode(str1) + encode(str2) != encode(str1 + str2).
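
The inequality above can be reproduced with a toy example. This is a minimal sketch, assuming a hypothetical three-entry vocabulary and a simple greedy longest-match `encode`; it is not the real tokenizer:

```python
VOCAB = {"a": 0, "b": 1, "ab": 2}  # hypothetical toy vocabulary


def encode(text: str) -> list[int]:
    """Greedy longest-match encoding over VOCAB."""
    tokens: list[int] = []
    max_len = max(len(piece) for piece in VOCAB)
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in VOCAB:
                tokens.append(VOCAB[piece])
                i += length
                break
        else:
            raise ValueError(f"cannot encode {text[i]!r}")
    return tokens


separate = encode("a") + encode("b")  # [0, 1]
joined = encode("a" + "b")            # [2]
print(separate, joined, separate != joined)
```

Both token lists decode to the same text, but they are different sequences, so code that compares or concatenates token IDs piece by piece can disagree with code that encodes the full string.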

@walkrunning
Author

> usually it means the model has decided you have reached [endoftext]

Haha, da Lao. I think you are right.
