Fail gracefully upon tokenizer logging failure (#2035) #2038

Merged: 1 commit merged from 2035-fail-tokenizer-info into main on Jun 29, 2024

Conversation

haileyschoelkopf (Contributor)

Closes #2035.

Previously, if adding detailed tokenizer info to the logs failed for any reason, the run would error out before saving results or printing the results table.

With this PR, tokenizer info is simply not logged when such a failure occurs.
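For context, a minimal sketch of the fail-gracefully pattern described above; the helper name add_tokenizer_info, the results dict, and the eval_logger are illustrative assumptions, not the actual diff:

```python
# Illustrative sketch only; helper and attribute names are assumptions,
# not the exact change merged in this PR.
import logging

eval_logger = logging.getLogger(__name__)

def add_tokenizer_info(results: dict, lm) -> None:
    # Collect detailed tokenizer info for the logs, but never let a failure
    # here abort the run before results are saved or the table is printed.
    try:
        results["tokenizer_info"] = {
            "pad_token": lm.tokenizer.pad_token,
            "pad_token_id": lm.tokenizer.pad_token_id,
            "eos_token": lm.tokenizer.eos_token,
            "eos_token_id": lm.tokenizer.eos_token_id,
        }
    except Exception as err:
        # Fail gracefully: skip the tokenizer info instead of erroring out.
        eval_logger.warning(f"Failed to log tokenizer info, skipping: {err}")
```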

LSinev (Contributor) commented on Jun 29, 2024

May I suggest using some sort of loop to check for and fill the dict with tokenization info in such a case? The problem in #2035 is that not all tokens are defined in the tokenizer, while the other info may be fine, so partial logging is still possible. Perhaps a loop that calls storage.update for each successful step, or, where supported, something like [getattr(lm.tokenizer, "pad_token", None), getattr(lm.tokenizer, "pad_token_id", None)]. (A sketch of this idea follows below.)

cc @artemorloff as the original author of the #1731 changes
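A minimal sketch of the partial-logging idea suggested above, assuming lm.tokenizer and an illustrative attribute list; the function name collect_tokenizer_info is hypothetical and this alternative was not what the PR merged:

```python
# Sketch of partial logging: read each attribute independently so one
# missing token (the failure mode in #2035) does not discard the rest.
def collect_tokenizer_info(lm) -> dict:
    storage: dict = {}
    for attr in (
        "pad_token", "pad_token_id",
        "eos_token", "eos_token_id",
        "bos_token", "bos_token_id",
    ):
        try:
            # getattr with a default tolerates tokenizers lacking the attribute.
            storage.update({attr: getattr(lm.tokenizer, attr, None)})
        except Exception:
            # Even attribute access can raise on some tokenizers; skip only
            # this step and keep whatever info was gathered so far.
            continue
    return storage
```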

lintangsutawika merged commit 2a6acc8 into main on Jun 29, 2024
9 checks passed
lintangsutawika deleted the 2035-fail-tokenizer-info branch on Jun 29, 2024 at 12:54