Update script to count tokens #10

Merged
merged 1 commit into Stability-AI:jp-stable
May 24, 2023

polm-stability
Collaborator

This is pretty messy, but it handles a variety of configurations that were not previously supported and fixes a few issues.

In particular, the T5Tokenizer appends an end-of-sentence (EOS) token to every input, so if you just count the IDs you get one extra ID per line, which inflates the token count.
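
The EOS issue described above can be illustrated with a minimal sketch. The script changed in this PR is not shown here, so everything below is an assumption for illustration: `ToyEosTokenizer` is a hypothetical stand-in that mimics a tokenizer appending an EOS ID to every input, and `count_tokens` is a hypothetical helper, not the function from the actual script.

```python
from typing import Iterable, List


class ToyEosTokenizer:
    """Hypothetical stand-in that appends an EOS ID to every input."""

    eos_token_id = 1

    def encode(self, text: str) -> List[int]:
        # Fake whitespace tokenization: one ID per word, then the EOS ID.
        ids = [2 + i for i, _ in enumerate(text.split())]
        return ids + [self.eos_token_id]


def count_tokens(lines: Iterable[str], tokenizer) -> int:
    """Count token IDs across lines, excluding a trailing EOS on each line."""
    total = 0
    for line in lines:
        ids = tokenizer.encode(line)
        # Drop the EOS the tokenizer appended so it is not counted.
        if ids and ids[-1] == tokenizer.eos_token_id:
            ids = ids[:-1]
        total += len(ids)
    return total


lines = ["hello world", "one two three"]
tok = ToyEosTokenizer()

# Naive count: includes one EOS per line.
naive = sum(len(tok.encode(line)) for line in lines)
# Corrected count: EOS stripped before counting.
corrected = count_tokens(lines, tok)

print(naive, corrected)
```

The naive count exceeds the corrected one by exactly the number of lines. With a Hugging Face tokenizer, passing `add_special_tokens=False` when encoding is another way to keep special tokens such as EOS out of the IDs, though whether the script in this PR does that is not stated here.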

@polm-stability assigned ghost May 24, 2023
@polm-stability requested a review from a user May 24, 2023 06:43
@polm-stability unassigned ghost May 24, 2023

@mkshing left a comment

@polm-stability LGTM! Thanks.

P.S. please re-assign @mkshing, not @mktshing :)

@leemengtw
Collaborator

Let's merge it!

@leemengtw merged commit da2a71b into Stability-AI:jp-stable May 24, 2023

3 participants