Expose special_tokens in the API #1

Closed
darknoon opened this issue Feb 23, 2023 · 0 comments · Fixed by #2
Comments

@darknoon

If you look at the upstream tiktoken code, this is their example for extending an encoding with extra special tokens:

import tiktoken

cl100k_base = tiktoken.get_encoding("cl100k_base")

# In production, load the arguments directly instead of accessing private attributes
# See openai_public.py for examples of arguments for specific encodings
enc = tiktoken.Encoding(
    # If you're changing the set of special tokens, make sure to use a different name
    # It should be clear from the name what behaviour to expect.
    name="cl100k_im",
    pat_str=cl100k_base._pat_str,
    mergeable_ranks=cl100k_base._mergeable_ranks,
    special_tokens={
        **cl100k_base._special_tokens,
        "<|im_start|>": 100264,
        "<|im_end|>": 100265,
    }
)

We need more or less the same thing for our app: the ability to pass in custom special tokens and their corresponding IDs. We currently don't need to override pat_str or mergeable_ranks.
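
To make the request concrete, here is a rough sketch of the merge step we have in mind, written in plain Python so it runs without downloading an encoding. The helper name `extend_special_tokens` is hypothetical and not part of tiktoken or this project. `base_tokens` is a one-entry stand-in for `cl100k_base._special_tokens`. The collision checks are our own assumption about what a safe API should do.

```python
from typing import Dict, Mapping


def extend_special_tokens(
    base: Mapping[str, int], extra: Mapping[str, int]
) -> Dict[str, int]:
    """Return base plus extra, rejecting name or ID collisions.

    Hypothetical helper for illustration; not a tiktoken function.
    """
    merged = dict(base)
    # Reverse lookup so we can tell which token already owns an ID.
    used_ids = {token_id: name for name, token_id in base.items()}
    for name, token_id in extra.items():
        if name in merged and merged[name] != token_id:
            raise ValueError(f"{name!r} is already mapped to {merged[name]}")
        owner = used_ids.get(token_id)
        if owner is not None and owner != name:
            raise ValueError(f"ID {token_id} is already used by {owner!r}")
        merged[name] = token_id
        used_ids[token_id] = name
    return merged


# Stand-in for cl100k_base._special_tokens (only one entry shown here).
base_tokens = {"<|endoftext|>": 100257}
custom_tokens = {"<|im_start|>": 100264, "<|im_end|>": 100265}

special_tokens = extend_special_tokens(base_tokens, custom_tokens)
print(special_tokens)
```

The resulting dictionary is what would be passed as `special_tokens=` in the example above. This sketch only checks the special tokens against each other; it does not check the new IDs against `mergeable_ranks`.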

@dqbd dqbd closed this as completed in #2 Feb 23, 2023
dqbd pushed a commit that referenced this issue Mar 1, 2023
Improve error handling in JNI functions