I think the playground token counting is misleading: many models don't publish their tokenizer, so we can't know the token IDs and sometimes not even the token count.
For example, Anthropic's API lets you get the number of tokens in a string but not their IDs, and `\t\t` counts as 1 token for Claude but 2 for OpenAI's GPT-3.5 and GPT-4 models. For some Python code that can make a huge difference!
I think it would be best to warn the user when the model name used does not correspond to the tokenizer used, just before printing the token info, instead of reporting a token count without any disclaimer.
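A minimal sketch of what that disclaimer could look like. Everything here is illustrative, not the playground's real API: the `KNOWN_TOKENIZERS` mapping and the `describe_token_count` helper are hypothetical names, and the mapping would need to cover whatever models the playground actually supports.

```python
# Hypothetical sketch: prefix the token info with a warning whenever the
# tokenizer used for counting is not known to be the selected model's own.

# Illustrative mapping: model-name prefix -> tokenizer used for counting.
KNOWN_TOKENIZERS = {
    "gpt-3.5": "cl100k_base",
    "gpt-4": "cl100k_base",
}

def describe_token_count(model_name: str, count: int, counting_tokenizer: str) -> str:
    """Return the token info line, with a disclaimer when the tokenizer
    used for counting does not match the model."""
    expected = next(
        (tok for prefix, tok in KNOWN_TOKENIZERS.items()
         if model_name.startswith(prefix)),
        None,
    )
    if expected != counting_tokenizer:
        return (
            f"Warning: token count estimated with '{counting_tokenizer}', "
            f"which is not known to be the tokenizer for '{model_name}'. "
            f"~{count} tokens"
        )
    return f"{count} tokens"

print(describe_token_count("gpt-4", 12, "cl100k_base"))    # tokenizer matches
print(describe_token_count("claude-2", 12, "cl100k_base")) # mismatch -> warning
```

This keeps the exact count for models whose tokenizer is known and falls back to an explicitly approximate figure otherwise, which is the behavior the suggestion above asks for.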