NPM package #22

Open
atgctg opened this issue Jan 22, 2023 · 7 comments

Comments

@atgctg

atgctg commented Jan 22, 2023

Would it be possible to add a wasm target and make tiktoken available for Node.js projects?

I'm currently relying on gpt-3-encoder but would prefer to use tiktoken for performance reasons.

@hauntsaninja
Collaborator

hauntsaninja commented Feb 1, 2023

Thanks for the suggestion! I'm not currently planning on implementing this, but it is likely that at some point we will.

If other people encountering this also have this feature request, please thumbs up the original post.

Note that the third-party gpt-3-encoder library will not work exactly right for any of the Codex or GPT-3.5 series models, including code-cushman-001, code-davinci-002, text-davinci-002, text-davinci-003, etc. It will also not work at all for the recent embeddings models, such as text-embedding-ada-002.

@hauntsaninja
Collaborator

hauntsaninja commented Feb 3, 2023

Just published a new version of tiktoken that includes a mapping from model to tokeniser. Anything not using r50k is liable to be incorrect (sometimes subtly, sometimes majorly, sometimes majorly but you won't notice) with the third party gpt-3-encoder library: https://github.com/openai/tiktoken/blob/main/tiktoken/model.py#L7
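
As a rough illustration of the rule above, the check can be sketched in JavaScript. This is a minimal sketch, not tiktoken's API: the table name `MODEL_TO_ENCODING` and the helper `matchesGpt3Encoder` are hypothetical, and the table only covers the models named in this thread plus `davinci`, using the encodings listed in tiktoken's model.py at the time of writing.

```javascript
// Hypothetical lookup table (not part of any library). Entries follow
// tiktoken's model-to-encoding mapping and are deliberately not exhaustive.
const MODEL_TO_ENCODING = new Map([
  ['davinci', 'r50k_base'],
  ['code-cushman-001', 'p50k_base'],
  ['code-davinci-002', 'p50k_base'],
  ['text-davinci-002', 'p50k_base'],
  ['text-davinci-003', 'p50k_base'],
  ['text-embedding-ada-002', 'cl100k_base'],
]);

// Returns true only when the model uses r50k_base, the one encoding for
// which gpt-3-encoder is expected to give correct token counts.
function matchesGpt3Encoder(model) {
  const encoding = MODEL_TO_ENCODING.get(model);
  if (encoding === undefined) {
    // Fail loudly rather than guess an encoding for an unlisted model.
    throw new Error(`Unknown model: ${model}`);
  }
  return encoding === 'r50k_base';
}

console.log(matchesGpt3Encoder('davinci'));          // true
console.log(matchesGpt3Encoder('text-davinci-003')); // false
```

Throwing on unknown models is a design choice for the sketch: a silent default would reproduce the "majorly but you won't notice" failure mode described above.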

@dqbd

dqbd commented Feb 19, 2023

Hello, I've been working on JS bindings for tiktoken, found here: https://github.com/dqbd/tiktoken. Core methods are implemented, some methods are missing for now.

You can install it as such:

npm install tiktoken

If it is desired, I could try to wrangle the changes and create an upstream PR.

Note: Couldn't secure the tiktoken NPM package name, as it is currently owned by @gmpetrov

EDIT: Pure JS port is also available as

npm install js-tiktoken

@ceifa

ceifa commented Mar 16, 2023

Hey, I also made a simple alternative for tiktoken on Node.js! A lot of features are missing too, but I plan to make a 1:1 library with the python version.

Link: https://github.com/ceifa/tiktoken-node

Because it relies on a Node addon instead of WebAssembly, it is 5-6x faster than @dqbd's approach.

@Cainier

Cainier commented Apr 17, 2023

If you just want the token count and the USD cost of a set of messages, you can try this :)

npm install gpt-tokens

import { GPTTokens } from 'gpt-tokens'

const gptTokens = new GPTTokens({
    model   : 'gpt-3.5-turbo',
    messages: [
        { 'role': 'system', 'content': 'You are a helpful assistant' },
        { 'role': 'user', 'content': '' },
    ],
})

// 18
console.log('Tokens: ', gptTokens.usedTokens)
// 0.000036
console.log('USD: ', gptTokens.usedUSD)
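
The two printed values above are consistent with a flat rate of 0.002 USD per 1,000 tokens (18 / 1000 × 0.002 = 0.000036). The sketch below reproduces that arithmetic. The rate is inferred from the output shown, not read from gpt-tokens, and `USD_PER_1K_TOKENS` and `estimateUsd` are hypothetical names used only for illustration.

```javascript
// Assumed price, inferred from the example output above (18 tokens -> 0.000036 USD).
const USD_PER_1K_TOKENS = 0.002;

// Converts a token count into an estimated cost in USD at a flat per-1K rate.
function estimateUsd(tokenCount, usdPer1kTokens = USD_PER_1K_TOKENS) {
  return (tokenCount / 1000) * usdPer1kTokens;
}

// Round for display, since the raw result is a binary floating-point value.
console.log(estimateUsd(18).toFixed(6)); // "0.000036"
```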

@metaskills

Could someone help me understand the pros and cons of using Xenova/text-embedding-ada-002 with Transformers.js versus one of the other projects listed above?

@seyfer

seyfer commented Sep 25, 2023

There is another one: https://github.com/niieani/gpt-tokenizer
It seems to be a full Node port rather than a wrapper around the Python tiktoken.

7 participants