
Suboptimal Performance on Generation Tasks #1353

Closed
erenup opened this issue Jan 25, 2024 · 5 comments

Comments


erenup commented Jan 25, 2024

Hi @haileyschoelkopf

I've found a general problem with generation tasks while developing the selfcheckgpt task.

The problem is that:

  1. For each task, we can set one prompt format in task.yaml, but different models have their own system prompts or starting tokens, so a single task cannot be compared fairly across multiple models.
  2. For each task, we can set stop tokens via {"until": ["\n\n"]}, but different models have their own ending tokens, so the model will generate something like this:
    [screenshot: model output continuing past the intended stop point]
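
For reference, here is a minimal sketch of how a task's stop sequences are pinned in its YAML (key names follow the harness's generate_until task format as I understand it; the task name is a placeholder):

```yaml
# hypothetical task definition illustrating the issue: the stop
# sequences are fixed per task, not per model
task: selfcheckgpt_demo        # placeholder task name
output_type: generate_until
generation_kwargs:
  until:
    - "\n\n"                   # every model stops on this string,
                               # even if its own end token differs
  max_gen_toks: 256
```

Because the until list lives in the task definition, a model whose natural end-of-turn marker differs (e.g. a chat model's own end token) will keep generating past its intended stopping point, which is how the run-on output above arises.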

I think this is a general problem, and more people using this framework should be aware of it when they try to compare different models on a generation task.

Thank you very much.

StellaAthena (Member) commented:

We are currently working on supporting HF Chat Templating, see #1287

haileyschoelkopf (Contributor) commented:

Thanks @erenup for the feedback! With respect to the latter point: we currently support passing generation hyperparameters via the --gen_kwargs CLI flag. I can see about adding the ability to provide extra stop sequences via this flag.
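
For illustration, this is roughly how --gen_kwargs is used today (the model and task names are placeholders; whether extra until strings can be merged in this way was exactly the open question):

```bash
# illustrative invocation; pretrained model and task are placeholders
lm_eval --model hf \
  --model_args pretrained=meta-llama/Llama-2-7b-hf \
  --tasks selfcheckgpt \
  --gen_kwargs temperature=0,max_gen_toks=256
```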

erenup (Author) commented Jan 26, 2024

Thank you very much for the quick reply!

erenup (Author) commented Feb 17, 2024

Hi @haileyschoelkopf

I think this could be a potential way to fix generation performance of the huggingface.py model with apply_chat_template: add self.tokenizer.apply_chat_template(messages, tokenize=False) to the tok_batch_encode function in huggingface.py:
[screenshot: proposed modification to tok_batch_encode in huggingface.py]
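
A rough sketch of that idea as a standalone function (the function name, the add_generation_prompt=True flag, and the single-user-turn message format are my assumptions, not the harness's actual code):

```python
from transformers import AutoTokenizer  # assumes transformers is installed

def tok_batch_encode_with_template(tokenizer, strings):
    """Sketch of the proposed change (simplified, not the actual
    huggingface.py code): wrap each raw prompt in the model's chat
    template before the usual batch tokenization."""
    templated = []
    for s in strings:
        messages = [{"role": "user", "content": s}]
        # tokenize=False returns the templated *string*, so the normal
        # batch-tokenization path below stays unchanged.
        templated.append(
            tokenizer.apply_chat_template(
                messages, tokenize=False, add_generation_prompt=True
            )
        )
    enc = tokenizer(
        templated,
        padding="longest",
        return_tensors="pt",
        add_special_tokens=False,  # the template already inserts BOS etc.
    )
    return enc["input_ids"], enc["attention_mask"]

# Usage (model name is just an example; a pad token may need to be set):
# tok = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
# ids, mask = tok_batch_encode_with_template(tok, ["What is the capital of France?"])
```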

What do you think about it?

LSinev (Contributor) commented Jun 13, 2024

Is this still an issue after #1873?

erenup closed this as completed Jun 13, 2024