Add WIP support for returning top tokens #617
Conversation
Could this be used to "shorten" a generation (i.e. only use the first N tokens)? If so, how would one use the logprobs to calculate a threshold for throwing out all or a portion of a generation?
This PR only adds the ability to return not just the sampled token, but also the most likely alternative tokens. This is useful mostly when building applications that require the logprobs of multiple tokens, like ranking and classification. What you are referring to sounds like a custom stopping criterion.
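As a rough sketch of that ranking/classification use case (the endpoint URL, payload shape, and per-token field names below are assumptions based on this PR's description of `top_n_tokens` and `top_tokens`, not a confirmed interface):

```python
import requests

# Hypothetical sketch: score candidate continuations by the logprobs of
# the alternative first tokens returned via the new `top_n_tokens`
# parameter. URL, payload shape, and field names are assumptions.
resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "The sentiment of 'I loved this movie' is",
        "parameters": {"max_new_tokens": 1, "details": True, "top_n_tokens": 5},
    },
).json()

# `top_tokens` holds one list of candidates per generated position;
# rank the alternatives for the first position by their logprob.
candidates = resp["details"]["top_tokens"][0]
for token in sorted(candidates, key=lambda t: t["logprob"], reverse=True):
    print(token["text"], token["logprob"])
```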
This is very cool. Thanks for the work!
It currently only works (properly) for seq-to-seq models.
Initial support for returning the most probable tokens. Note that it is currently only implemented for seq-to-seq models. It is also always enabled, regardless of whether it is used or not.
force-pushed from 6b29a32 to 95d0fba
@OlivierDehaene I think it's mergeable now! I haven't edited the Python client yet, but all server functionality should work now. For some reason, my latest commit (95d0fba) doesn't show up in this PR yet, however.
Thank you for notifying me about the licensing change. Luckily, as far as I understand it, this has no effect on our operations for now.
Mimics the behaviour of `best_of`. Also allows client compatibility with older versions
Hi @Narsil. Could you review this PR? It touches a lot of different files, though most changes are only about passing parameters, since results have to be passed through every communication phase. The token logic is contained in
Thanks for the PR. I'm pretty stretched atm and this PR looks like it requires a lot of attention.
Understandable.
Just fixed a small oversight in the batch concatenation for flash attention models, which came up during testing.
@Narsil ping 🙂 Since it's my PR, I obviously can't review it myself. But let me know if I can be of any help.
Ok, I looked at the code. On the surface the code looks really nice! I need to invest a bit of time investigating the performance changes on various models/platforms.
Are you OK if I do the rebase to be up-to-date too?
Ok, here are the results from the bench: https://github.com/Narsil/tmp_bench/blob/main/README.md
At least there doesn't seem to be any degradation without using `top_n_tokens`. Overall, I find everything proposed in the PR acceptable.
Here is my rebased version: #868
What does this PR do?
This PR adds support for returning the log probabilities of the top N tokens (besides the chosen token).
The token selection code is borrowed from IBM's fork (many thanks @njhill) and merged into the current `main` branch. I chose to name the input parameter `top_n_tokens`, and it can be any (positive) integer. If it's set, a `top_tokens` field in the result `details` will contain an L×N list of tokens, where L is the sequence length and N is the number of requested tokens.

Note that it is currently very much WIP:
- Top tokens are returned with their `logprobs`, because IBM's implementation had the functionality. However, it may cause unnecessary overhead.
- The Python client does not yet support the `top_n_tokens` field.

Fixes #604
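To make the L×N shape concrete, here is a hypothetical `top_tokens` value for `max_new_tokens=2` and `top_n_tokens=3` (token ids, texts, and logprobs are invented for illustration, and the exact per-token schema is an assumption):

```python
# Hypothetical contents of `details.top_tokens` with L=2 generated
# positions and N=3 requested top tokens per position. All values are
# made up; the per-token field names are assumptions.
top_tokens = [
    [  # position 0: the 3 most likely candidates
        {"id": 287, "text": " the", "logprob": -0.41},
        {"id": 257, "text": " a", "logprob": -1.73},
        {"id": 428, "text": " this", "logprob": -2.95},
    ],
    [  # position 1
        {"id": 1266, "text": " best", "logprob": -0.88},
        {"id": 749, "text": " most", "logprob": -1.12},
        {"id": 717, "text": " first", "logprob": -2.40},
    ],
]

# L rows (sequence length) by N columns (requested tokens):
assert len(top_tokens) == 2
assert all(len(row) == 3 for row in top_tokens)
```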
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed.
Specifically @OlivierDehaene and @njhill