Truncation #1426

Open
mdocekal opened this issue Feb 13, 2024 · 3 comments
Labels
asking questions: For asking for clarification / support on library usage.

Comments

@mdocekal

Is it possible to change the truncation strategy?
For example, say I want to remove a whole few-shot sample, or truncate each few-shot sample from the left or right by a fair share of tokens.

@MFajcik

MFajcik commented Feb 14, 2024

To add more context: together with @mdocekal, we found that the harness truncates the task description when the prompt is n-shot, at least for GPT-2-XL and the accelerate model. We would like to truncate the content of the "shots" instead. If a 10-shot prompt won't fit, we want to reduce that particular example to, e.g., 9-shot, or truncate starting from the last example, rather than truncating the preceding task description.
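To make the requested behaviour concrete, here is a minimal standalone sketch of the "drop shots until it fits" idea. It is not harness code: `fit_fewshot_prompt`, `whitespace_count`, and the sample strings are hypothetical names chosen for illustration, and whitespace splitting stands in for a real model tokenizer.

```python
from typing import Callable, List


def fit_fewshot_prompt(
    description: str,
    shots: List[str],
    query: str,
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> str:
    """Drop trailing few-shot examples until the prompt fits the budget.

    The task description and the final query are never truncated. If the
    prompt is still too long with zero shots, it is returned as-is and the
    caller has to decide what to do with it.
    """
    kept = list(shots)
    while True:
        prompt = description + "".join(kept) + query
        if count_tokens(prompt) <= max_tokens or not kept:
            return prompt
        kept.pop()  # e.g. 10-shot becomes 9-shot, then 8-shot, ...


def whitespace_count(text: str) -> int:
    # Assumption: a crude stand-in for a real tokenizer's token count.
    return len(text.split())


description = "Answer the question.\n\n"
shots = [f"Q: example {i}\nA: answer {i}\n\n" for i in range(10)]
query = "Q: final question\nA:"

prompt = fit_fewshot_prompt(description, shots, query, 40, whitespace_count)
```

With a real model, `count_tokens` would wrap that model's tokenizer, so the number of surviving shots depends on the model being evaluated.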

@baberabb
Contributor

baberabb commented Feb 14, 2024

Might be a bit tricky. The task description is prepended to the fully constructed fewshot string here:

labeled_examples = self.config.description + self.sampler.get_context(

If you only care about a specific model, one option is to use a custom sampler and override its get_context method, adding a tokenizer so you can condition the few-shots however you want.
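A rough sketch of what a tokenizer-aware context builder could look like, under loud assumptions: `BudgetedContextBuilder` and `WhitespaceTokenizer` are hypothetical stand-ins, not harness classes, and the `get_context` signature here is simplified for illustration rather than copied from the harness sampler. It gives every shot an equal token budget and trims each one from the left or right.

```python
from typing import List


class WhitespaceTokenizer:
    """Toy tokenizer (assumption): swap in the real model tokenizer."""

    def encode(self, text: str) -> List[str]:
        return text.split(" ")

    def decode(self, tokens: List[str]) -> str:
        return " ".join(tokens)


class BudgetedContextBuilder:
    """Hypothetical custom sampler that trims each shot to an equal budget."""

    def __init__(self, tokenizer, max_shot_tokens: int, side: str = "right"):
        if side not in ("left", "right"):
            raise ValueError("side must be 'left' or 'right'")
        self.tokenizer = tokenizer
        self.max_shot_tokens = max_shot_tokens  # budget for shot tokens only
        self.side = side
        self.separator = "\n\n"  # not counted against the budget

    def get_context(self, shots: List[str]) -> str:
        if not shots:
            return ""
        per_shot = self.max_shot_tokens // len(shots)
        if per_shot == 0:
            return ""  # no room for even one token per shot
        trimmed = []
        for shot in shots:
            ids = self.tokenizer.encode(shot)
            # "right" cuts the end of the shot, "left" cuts its start.
            ids = ids[:per_shot] if self.side == "right" else ids[-per_shot:]
            trimmed.append(self.tokenizer.decode(ids))
        return self.separator.join(trimmed)


shots = ["Q: a b c d e f A: one", "Q: g A: two", "Q: h i j k l m n o A: three"]
right = BudgetedContextBuilder(WhitespaceTokenizer(), 15, "right").get_context(shots)
left = BudgetedContextBuilder(WhitespaceTokenizer(), 15, "left").get_context(shots)
```

Because the description is prepended outside the sampler, it stays intact as long as the budget passed in already leaves room for it. An equal split wastes budget on short shots; redistributing the unused tokens would be a natural refinement.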

@haileyschoelkopf added the "asking questions" label Feb 19, 2024
@MFajcik

MFajcik commented Feb 26, 2024

We found our "hacky" way to do what we wanted here. The question remains whether we should try to implement this properly and open a pull request against lm-harness. We are developing a benchmark and were hoping people could use the harness to evaluate on it.

Do you think the truncation strategy could be specified with a user function in the YAML?

4 participants