Long time testing Qwen2-72B #1984

Open
djstrong opened this issue Jun 18, 2024 · 2 comments
Labels
bug Something isn't working.

Comments

@djstrong
Contributor

djstrong commented Jun 18, 2024

When testing Qwen/Qwen2-72B with `parallelize=True,max_length=4096` on a generate_until task, I am getting warnings:

```
Both `max_new_tokens` (=2048) and `max_length`(=882) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
```

The number reported after `max_length` decreases over time.

The problem is that the test takes several times longer than, e.g., Llama-3-70B. The task sets `max_gen_toks: 50`.

Is the long runtime related to the warning above?

I see that this model has `"max_new_tokens": 2048` in its generation_config.json, but Llama-3 does not.
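
For reference, the shipped defaults can be inspected without downloading the weights; a minimal sketch (assumes Hub access; the Llama repo is gated and needs accepted access):

```python
from transformers import GenerationConfig

# Qwen2-72B ships a config-level generation default (per the report above).
qwen_cfg = GenerationConfig.from_pretrained("Qwen/Qwen2-72B")
print(qwen_cfg.max_new_tokens)  # 2048

# Llama-3-70B's generation_config.json sets no max_new_tokens, so a
# caller-supplied max_length applies there.
llama_cfg = GenerationConfig.from_pretrained("meta-llama/Meta-Llama-3-70B")
print(llama_cfg.max_new_tokens)  # None
```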

@haileyschoelkopf
Contributor

There have been a couple of reports of odd scores or behavior when evaluating Qwen2 models. I hope to check this out soonish, if others aren't able to.

haileyschoelkopf added the bug label Jun 19, 2024
@baberabb
Contributor

Looks like `max_new_tokens=2048` is taking precedence over `max_length` (`len(inputs) + max_gen_toks`).
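
A minimal sketch of the precedence and one possible workaround, using a small Qwen2 checkpoint as a stand-in (assumption: the simulated default mirrors the 72B model's shipped generation_config.json):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small stand-in checkpoint (assumption: generation-length resolution
# behaves the same as for the 72B model).
model_id = "Qwen/Qwen2-0.5B"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Simulate Qwen2-72B's shipped default of "max_new_tokens": 2048.
model.generation_config.max_new_tokens = 2048

inputs = tokenizer("The capital of France is", return_tensors="pt")
budget = inputs["input_ids"].shape[1] + 50  # len(inputs) + max_gen_toks

# Both limits are now set, so transformers emits the warning from the
# report and max_new_tokens (2048) overrides the requested budget.
out = model.generate(**inputs, max_length=budget)
print(out.shape[1])  # can run far past `budget` (unless EOS fires early)

# Possible workaround: clear the config-level default so max_length applies.
model.generation_config.max_new_tokens = None
out = model.generate(**inputs, max_length=budget)
print(out.shape[1])  # now capped at `budget`
```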
