Don't left truncate stuff anymore #239

norabelrose · 2023-05-03T05:49:29Z

Left truncation was a terrible idea; I don't know why I ever thought it made sense.

Some models, in particular unifiedqa-t5-11b, have unusually short context lengths, so a significant fraction (e.g. 20%) of prompts get truncated from the left, potentially removing important information about the task. This seems to be degrading performance.

This PR fixes the problem by simply skipping examples in extract_hiddens that exceed the max length indicated by the tokenizer.
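
For illustration only, here is a minimal sketch of the skip-instead-of-truncate idea. It is not the actual extract_hiddens code: ToyTokenizer and skip_overlong are hypothetical names, and the toy tokenizer just splits on whitespace. The only thing it borrows from real tokenizers is the assumption that the limit is exposed as a model_max_length attribute, as Hugging Face tokenizers do.

```python
from typing import Iterable, Iterator


class ToyTokenizer:
    """Hypothetical stand-in for a real tokenizer: splits on whitespace."""

    def __init__(self, model_max_length: int):
        # Assumed to mirror the attribute a real tokenizer exposes.
        self.model_max_length = model_max_length

    def __call__(self, text: str) -> dict:
        return {"input_ids": text.split()}


def skip_overlong(prompts: Iterable[str], tokenizer) -> Iterator[tuple[str, list]]:
    """Yield (prompt, input_ids) pairs, dropping prompts that do not fit."""
    for prompt in prompts:
        ids = tokenizer(prompt)["input_ids"]
        if len(ids) > tokenizer.model_max_length:
            # Skip the example entirely rather than cutting tokens off the left.
            continue
        yield prompt, ids


prompts = ["short question", "a much longer prompt that will not fit"]
kept = [p for p, _ in skip_overlong(prompts, ToyTokenizer(model_max_length=4))]
```

With a limit of 4 tokens, only the first prompt is kept. The trade-off is fewer examples for short-context models, in exchange for every remaining prompt being seen in full.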

Don't left truncate stuff anymore

c6821ba

norabelrose requested a review from AlexTMallen May 3, 2023 05:49

AlexTMallen approved these changes May 3, 2023

norabelrose merged commit 8ba18c3 into main May 3, 2023

norabelrose deleted the no-truncation branch May 3, 2023 08:31
