Error ‘assert len(chunk_encode) == 2’ when eval MMMU_DEV_VAL #173

Closed
wutaiqiang opened this issue Apr 25, 2024 · 3 comments
Comments

wutaiqiang commented Apr 25, 2024

When I try to evaluate LLaVA on MMMU_DEV_VAL, the assertion assert len(chunk_encode) == 2 is triggered. After printing, I find that len(chunk_encode) is 4.

The inputs are:

<|start_header_id|>user<|end_header_id|>
[image]
Determine the area in hectares between the line AB and a meandering stream for offsets taken at a regular interval of 20 m along the line AB (Fig. 12.5). Use both the trapezoidal rule and Simpson's rule.
[Image]
[Image]

A. Use trapezoidal rule,area = 0.5010 hectares;Use trapezoidal rule,area = 0.5560 hectares
B. Use trapezoidal rule,area = 0.5010 hectares;Use trapezoidal rule,area = 0.5460 hectares
C. Use trapezoidal rule,area = 0.5010 hectares;Use trapezoidal rule,area = 0.5260 hectares
D. Use trapezoidal rule,area = 0.5010 hectares;Use trapezoidal rule,area = 0.5360 hectares
Answer with the option's letter from the given choices directly.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

After splitting the input on the image placeholder, we get 4 parts rather than the expected 2, so the assert is triggered.
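The count can be reproduced with a small sketch. It assumes the placeholder is the string '<image>' (taken from the join line quoted later in this thread) and uses made-up prompts; `count_chunks` is a hypothetical helper, not a function from the evaluated code.

```python
# Assumed placeholder string; the real tokenizer code may use a different one.
IMAGE_TOKEN = "<image>"


def count_chunks(prompt: str, token: str = IMAGE_TOKEN) -> int:
    """Return how many pieces the prompt splits into around the placeholder."""
    return len(prompt.split(token))


# One placeholder gives two pieces, which is what the assert expects.
single = "<image>\nDescribe the figure."
# Three placeholders give four pieces, which matches the failing case.
multi = "<image>\nQuestion text\n<image>\n<image>\nA. ...\nB. ..."

print(count_chunks(single))  # 2
print(count_chunks(multi))   # 4
```

Splitting on a separator always yields one more piece than the number of separators, so any prompt with more than one placeholder will fail an assertion that expects exactly 2.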

@wutaiqiang
Author

It seems that the error comes from:

prompt = '\n'.join([x['value'] if x['type'] == 'text' else '<image>' for x in message])

@wutaiqiang
Author

There are multiple images in the prompt, and the prompt is:
Determine the area in hectares between the line AB and a meandering stream for offsets taken at a regular interval of 20 m along the line AB (Fig. 12.5). Use both the trapezoidal rule and Simpson's rule.
[Image]
[Image]

A. Use trapezoidal rule,area = 0.5010 hectares;Use trapezoidal rule,area = 0.5560 hectares
B. Use trapezoidal rule,area = 0.5010 hectares;Use trapezoidal rule,area = 0.5460 hectares
C. Use trapezoidal rule,area = 0.5010 hectares;Use trapezoidal rule,area = 0.5260 hectares
D. Use trapezoidal rule,area = 0.5010 hectares;Use trapezoidal rule,area = 0.5360 hectares
Answer with the option's letter from the given choices directly.

It seems that the extra images should be discarded.

@wutaiqiang
Author

It seems that the error comes from:

prompt = '\n'.join([x['value'] if x['type'] == 'text' else '<image>' for x in message])

Since Model LLaVA_XTuner does not support interleaved input. Will use the first image and aggregated texts as prompt.

Changing 'image' to another name solves this issue, and such images are then discarded.
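One way to read this workaround is sketched below. It is only an illustration under assumptions: the message format (a list of dicts with 'type' and 'value') is inferred from the join line above, while `build_single_image_prompt`, the '<image>' placeholder, and the '<img>' replacement name are hypothetical and not taken from the project's code.

```python
def build_single_image_prompt(message, token="<image>", replacement="<img>"):
    """Keep only the first image and aggregate all texts into one prompt.

    Any literal placeholder inside the text is renamed so that the final
    prompt contains the placeholder exactly once (or not at all).
    """
    # Rename stray placeholders that appear inside the question text itself.
    texts = [m["value"].replace(token, replacement)
             for m in message if m["type"] == "text"]
    images = [m["value"] for m in message if m["type"] == "image"]
    first_image = images[0] if images else None  # later images are discarded

    body = "\n".join(texts)
    prompt = f"{token}\n{body}" if first_image is not None else body
    return prompt, first_image


# Hypothetical multi-image message, loosely modeled on the failing sample.
message = [
    {"type": "image", "value": "fig_a.png"},
    {"type": "text", "value": "Determine the area ... <image>"},
    {"type": "image", "value": "fig_b.png"},
    {"type": "text", "value": "Answer with the option's letter."},
]

prompt, image = build_single_image_prompt(message)
print(len(prompt.split("<image>")))  # 2
```

With the prompt built this way, splitting on the placeholder gives exactly two pieces, so an assertion of the form len(chunk_encode) == 2 would hold for this input.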
