Add Tiktokenizer link in "How to count tokens" #604

Merged
merged 1 commit into from
Aug 28, 2023

Conversation

EliahKagan
Contributor

This adds a link to the popular Tiktokenizer webapp, in the section of the "How to count tokens with tiktoken" notebook that talks about the OpenAI Tokenizer. It retains the reference to the OpenAI Tokenizer as well.

Rationale:

  • The tiktoken FAQ recommends Tiktokenizer as a resource, in the first line under "Usage help". Besides indicating that it is a helpful resource, this also suggests that OpenAI is already willing to direct users to it.
  • Tiktokenizer supports some important encodings the OpenAI Tokenizer currently does not, such as cl100k_base. This allows users to see how text is tokenized for chat models like gpt-3.5-turbo and gpt-4, and for the embedding model text-embedding-ada-002. In contrast, the OpenAI Tokenizer currently only supports GPT-3 and Codex.
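To make the second point concrete, here is a minimal sketch of counting tokens locally with the same encoding. The model-to-encoding pairs come from the rationale above; `ENCODING_BY_MODEL`, `encoding_name`, and `count_tokens` are hypothetical names chosen for illustration and are not part of tiktoken or the notebook. It assumes the third-party `tiktoken` package is installed.

```python
# Model-to-encoding pairs as stated in the rationale above.
# ENCODING_BY_MODEL, encoding_name and count_tokens are hypothetical names.
ENCODING_BY_MODEL = {
    "gpt-3.5-turbo": "cl100k_base",
    "gpt-4": "cl100k_base",
    "text-embedding-ada-002": "cl100k_base",
}


def encoding_name(model: str) -> str:
    """Return the encoding name for a model listed in the rationale."""
    try:
        return ENCODING_BY_MODEL[model]
    except KeyError:
        raise ValueError(f"no encoding recorded for model {model!r}") from None


def count_tokens(text: str, model: str) -> int:
    """Count tokens locally; requires the third-party tiktoken package."""
    import tiktoken  # imported lazily so the lookup above works without it

    encoding = tiktoken.get_encoding(encoding_name(model))
    return len(encoding.encode(text))
```

Calling `count_tokens("some text", "gpt-4")` should agree with the token count Tiktokenizer shows for the same text when cl100k_base is selected. tiktoken also offers `tiktoken.encoding_for_model`, which resolves the encoding from a model name without a hand-written table.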

(I am not affiliated in any way with Tiktokenizer. I learned about it a while ago from the tiktoken FAQ, and I've often found it useful.)

I'm unsure what the best wording is here, but I phrased it "or the third-party Tiktokenizer webapp" so that no readers are misled into thinking Tiktokenizer is itself developed by OpenAI.

This adds a link to Tiktokenizer webapp as another tool, in
addition to the OpenAI Tokenizer.
@ted-at-openai
Collaborator

Good suggestion. Asking the owner of tiktokenizer here: dqbd/tiktokenizer#13

@ted-at-openai ted-at-openai self-assigned this Jul 24, 2023
@ted-at-openai ted-at-openai self-requested a review July 24, 2023 22:58
@EliahKagan
Contributor Author

EliahKagan commented Jul 24, 2023

Good idea--thanks! (I look forward to hearing how the author feels about it.)

@dqbd

dqbd commented Aug 7, 2023

Hey! Glad to see that people still find Tiktokenizer useful. I'm fine with keeping the website alive for the foreseeable future, and with keeping the reference to the tool in the guide.

@ted-at-openai ted-at-openai merged commit 63f9515 into openai:main Aug 28, 2023
@EliahKagan EliahKagan deleted the tiktokenizer branch August 28, 2023 19:43
katia-openai pushed a commit that referenced this pull request Feb 29, 2024