Fail gracefully upon tokenizer logging failure (#2035) #2038

Merged: 1 commit merged from 2035-fail-tokenizer-info into main on Jun 29, 2024

Conversation

haileyschoelkopf (Contributor)

Closes #2035.

Previously, if adding detailed tokenizer info to the logs failed for any reason, the run would error out before saving results or printing the results table.

With this PR, tokenizer info is simply not logged when such a failure occurs.
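For context, a minimal sketch of the fail-gracefully pattern described above; the helper name add_tokenizer_info, the results dict, and the eval_logger are illustrative assumptions, not the actual diff:

```python
# Illustrative sketch only; helper and attribute names are assumptions,
# not the exact change merged in this PR.
import logging

eval_logger = logging.getLogger(__name__)

def add_tokenizer_info(results: dict, lm) -> None:
    # Collect detailed tokenizer info for the logs, but never let a failure
    # here abort the run before results are saved or the table is printed.
    try:
        results["tokenizer_info"] = {
            "pad_token": lm.tokenizer.pad_token,
            "pad_token_id": lm.tokenizer.pad_token_id,
            "eos_token": lm.tokenizer.eos_token,
            "eos_token_id": lm.tokenizer.eos_token_id,
        }
    except Exception as err:
        # Fail gracefully: skip the tokenizer info instead of erroring out.
        eval_logger.warning(f"Failed to log tokenizer info, skipping: {err}")
```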

LSinev (Contributor) commented on Jun 29, 2024

May I suggest using some sort of loop to check for and fill the dict with tokenization info in such a case? The problem in #2035 is that not all tokens are defined in the tokenizer, while the other info may be fine, so partial logging is still possible. Perhaps a loop that calls storage.update for each successful step, or, where supported, something like [getattr(lm.tokenizer, "pad_token", None), getattr(lm.tokenizer, "pad_token_id", None)]. (A sketch of this idea follows below.)

cc @artemorloff as the original author of the #1731 changes
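A minimal sketch of the partial-logging idea suggested above, assuming lm.tokenizer and an illustrative attribute list; the function name collect_tokenizer_info is hypothetical and this alternative was not what the PR merged:

```python
# Sketch of partial logging: read each attribute independently so one
# missing token (the failure mode in #2035) does not discard the rest.
def collect_tokenizer_info(lm) -> dict:
    storage: dict = {}
    for attr in (
        "pad_token", "pad_token_id",
        "eos_token", "eos_token_id",
        "bos_token", "bos_token_id",
    ):
        try:
            # getattr with a default tolerates tokenizers lacking the attribute.
            storage.update({attr: getattr(lm.tokenizer, attr, None)})
        except Exception:
            # Even attribute access can raise on some tokenizers; skip only
            # this step and keep whatever info was gathered so far.
            continue
    return storage
```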

lintangsutawika merged commit 2a6acc8 into main on Jun 29, 2024
9 checks passed
lintangsutawika deleted the 2035-fail-tokenizer-info branch on Jun 29, 2024 at 12:54