[InstructBLIP] qformer_tokenizer is required input #33222

amyeroberts · 2024-08-30T17:52:06Z

What does this PR do?

At the moment, InstructBLIP doesn't have qformer_tokenizer listed as one of its attributes. This means that the processor is not currently compatible with tests using the ProcessorTesterMixin, as the component won't be automatically loaded when constructing the processor.

I believe it is reasonable to assume that processor.attributes lists all the processing classes for the processor, so I've modified the processor here to include qformer_tokenizer.

This does require some hacky logic for when the processor is saved, to avoid overwriting the tokenizer configs while still saving the qformer_tokenizer configs to a separate subfolder.

For future processors, I think it would be better for us to handle this more cleanly, such that many processing classes of the same type can be bundled and saved together easily and automatically. For example, all the qformer_tokenizer files could be saved with a qformer_ prefix at the top level. This would also enable e.g. having image processors and video processors saved together.

This isn't possible for this processor because of backward compatibility, but we can consider what it would look like in the future.
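To make the prefix idea above concrete, here is a minimal sketch. It is not code from this PR or from transformers: `save_with_prefixes`, the config dicts, and the `tokenizer.json` file name are all hypothetical stand-ins chosen for illustration. It only shows that prefixed file names let two components of the same type share one top-level directory without colliding.

```python
import json
import tempfile
from pathlib import Path


def save_with_prefixes(configs_by_prefix, save_directory):
    """Write one `<prefix>tokenizer.json` per component at the top level.

    Hypothetical helper: the prefix scheme is an assumption based on the
    proposal in the PR description, not an existing library feature.
    """
    path = Path(save_directory)
    path.mkdir(parents=True, exist_ok=True)
    for prefix, config in configs_by_prefix.items():
        # The empty prefix keeps the bare file name for the main tokenizer.
        (path / f"{prefix}tokenizer.json").write_text(json.dumps(config))
    return sorted(p.name for p in path.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    saved_files = save_with_prefixes(
        {"": {"name": "llm"}, "qformer_": {"name": "qformer"}},
        tmp,
    )
    main_name = json.loads((Path(tmp) / "tokenizer.json").read_text())["name"]
    qformer_name = json.loads((Path(tmp) / "qformer_tokenizer.json").read_text())["name"]
```

Both files survive side by side, so loading could in principle pick each component up by its prefix. How loading would map prefixes back to attributes is left open here, as it is in the description.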

HuggingFaceDocBuilderDev · 2024-08-30T18:12:15Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

LysandreJik · 2024-09-02T11:31:58Z

LMK when you want a review Amy!

amyeroberts · 2024-09-03T13:30:19Z

@LysandreJik Yes please! Would also like a review from @zucchini-nlp, who I discussed this with and who has been doing lots of work with the processors recently

zucchini-nlp

Thanks, looks good to me! We also have InstructBlipVideo which has the same processor, can you propagate changes there pls?

zucchini-nlp · 2024-09-03T14:28:59Z

tests/models/instructblip/test_processor_instructblip.py

+        all_kwargs = {
+            "common_kwargs": {"return_tensors": "pt"},
+            "images_kwargs": {"size": {"height": 214, "width": 214}},
+            "text_kwargs": {"padding": "max_length", "max_length": 76},
+        }
+


Might be missing smth here, I am not sure how these tests are passing if InstructBlip hasn't standardized the kwargs. If these pass without standardization, then prob the tests are not written properly and we need to write better ones in another PR 🤔

EDIT: My bad, just noticed the skip_processor_without_typed_kwargs thing which skips tests hehe
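For readers unfamiliar with this kind of guard, the sketch below shows one way a test can skip itself when a processor's `__call__` has no annotated `**kwargs`. This is an assumption about the general idea only: `has_typed_kwargs`, `UntypedProcessor`, `TypedProcessor`, and `ToyProcessorTest` are hypothetical names, and the check is not the implementation of skip_processor_without_typed_kwargs in the test suite.

```python
import inspect
import unittest


class UntypedProcessor:
    """Hypothetical processor whose **kwargs carry no annotation."""

    def __call__(self, text=None, **kwargs):
        return {"text": text}


class TypedProcessor:
    """Hypothetical processor whose **kwargs are annotated."""

    def __call__(self, text=None, **kwargs: dict):
        return {"text": text}


def has_typed_kwargs(processor):
    # Look for a **kwargs parameter on __call__ and check for an annotation.
    signature = inspect.signature(processor.__call__)
    for param in signature.parameters.values():
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            return param.annotation is not inspect.Parameter.empty
    return False


class ToyProcessorTest(unittest.TestCase):
    processor = UntypedProcessor()

    def test_structured_kwargs(self):
        if not has_typed_kwargs(self.processor):
            # The test is reported as skipped, not as passed or failed.
            self.skipTest("processor has no typed kwargs")
        self.assertEqual(self.processor(text="hi")["text"], "hi")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ToyProcessorTest)
result = unittest.TestResult()
suite.run(result)
```

A skipped test counts as neither a failure nor an error, which is why a suite can look green even though the kwargs tests never exercised the processor.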

amyeroberts · 2024-09-04T10:27:01Z

@zucchini-nlp Done! Added processor tests for InstructBlipVideoProcessor too

zucchini-nlp

LGTM, thanks for adding instructBlipVideo

zucchini-nlp · 2024-09-04T10:53:28Z

tests/models/instructblipvideo/test_processor_instructblipvideo.py

@@ -0,0 +1,428 @@
+# Copyright 2023 The HuggingFace Team. All rights reserved.


nit: 2024 I guess :)

zucchini-nlp · 2024-09-04T10:58:18Z

tests/models/instructblipvideo/test_processor_instructblipvideo.py

+    # Ignore copy
+    def prepare_image_inputs(self):
+        """This function prepares a list of PIL images, or a list of numpy arrays if one specifies numpify=True,


Would be nice to test video inputs as well, but we can work on it later. Maybe I'll have to do some pre-standardization on videos first and then add proper tests everywhere

I updated it so that this method passes in a list of lists of frames


LysandreJik

Thanks @amyeroberts! Nice tests

LysandreJik · 2024-09-04T14:45:22Z

src/transformers/models/instructblip/processing_instructblip.py

+        qformer_present = "qformer_tokenizer" in self.attributes
+        if qformer_present:
+            self.attributes.remove("qformer_tokenizer")
+
+        outputs = super().save_pretrained(save_directory, **kwargs)
+
+        if qformer_present:
+            self.attributes += ["qformer_tokenizer"]
+        return outputs


Out of curiosity, why don't we want the qformer_tokenizer to be saved here?

What happens in processor.save_pretrained is that it iterates over the classes listed in processor.attributes. If qformer_tokenizer were saved out in that loop, its tokenizer files, e.g. tokenizer.json, would just overwrite the files of the tokenizer class earlier in the list. To avoid this, a workaround was added so that the qformer_tokenizer is saved to a subfolder, e.g. like here, and only the tokenizer is saved to the top level of the checkpoint.
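The overwrite described above can be reproduced with a toy model. The sketch below does not use transformers: `ToyTokenizer` and `ToyProcessor` are hypothetical stand-ins, and the `qformer_tokenizer` subfolder name is an assumption for illustration. It contrasts saving every attribute to the top level with saving the second tokenizer to a subfolder.

```python
import json
import tempfile
from pathlib import Path


class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer; not the transformers class."""

    def __init__(self, name):
        self.name = name

    def save_pretrained(self, save_directory):
        path = Path(save_directory)
        path.mkdir(parents=True, exist_ok=True)
        # Every tokenizer writes to the same file name.
        (path / "tokenizer.json").write_text(json.dumps({"name": self.name}))


class ToyProcessor:
    attributes = ["tokenizer", "qformer_tokenizer"]

    def __init__(self, tokenizer, qformer_tokenizer):
        self.tokenizer = tokenizer
        self.qformer_tokenizer = qformer_tokenizer

    def save_naive(self, save_directory):
        # Everything goes to the top level, so later attributes win.
        for attribute in self.attributes:
            getattr(self, attribute).save_pretrained(save_directory)

    def save_with_subfolder(self, save_directory):
        # The second tokenizer gets its own (assumed) subfolder.
        for attribute in self.attributes:
            target = Path(save_directory)
            if attribute == "qformer_tokenizer":
                target = target / "qformer_tokenizer"
            getattr(self, attribute).save_pretrained(target)


def read_name(path):
    return json.loads(Path(path).read_text())["name"]


processor = ToyProcessor(ToyTokenizer("llm"), ToyTokenizer("qformer"))

with tempfile.TemporaryDirectory() as tmp:
    naive_dir = Path(tmp) / "naive"
    processor.save_naive(naive_dir)
    naive_top = read_name(naive_dir / "tokenizer.json")

    safe_dir = Path(tmp) / "safe"
    processor.save_with_subfolder(safe_dir)
    safe_top = read_name(safe_dir / "tokenizer.json")
    safe_sub = read_name(safe_dir / "qformer_tokenizer" / "tokenizer.json")
```

In the naive layout the top-level tokenizer.json ends up holding the Q-Former tokenizer, which is the failure mode the reply describes. The subfolder layout keeps both.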

* [InstructBLIP] qformer_tokenizer is required input
* Bit safer
* Add to instructblipvideo processor
* Fix up
* Use video inputs
* Update tests/models/instructblipvideo/test_processor_instructblipvideo.py

[InstructBLIP] qformer_tokenizer is required input

160db5a

Bit safer

a990722

amyeroberts requested a review from zucchini-nlp September 3, 2024 13:29

zucchini-nlp reviewed Sep 3, 2024

amyeroberts added 2 commits September 4, 2024 10:41

Add to instructblipvideo processor

246c6f0

Fix up

70f46a6

amyeroberts requested a review from zucchini-nlp September 4, 2024 10:27

zucchini-nlp approved these changes Sep 4, 2024

Use video inputs

4a005b5

amyeroberts commented Sep 4, 2024

Update tests/models/instructblipvideo/test_processor_instructblipvide…

183b6b5

…o.py

amyeroberts requested a review from LysandreJik September 4, 2024 12:09

LysandreJik approved these changes Sep 4, 2024

amyeroberts merged commit d2dcff9 into huggingface:main Sep 4, 2024
21 checks passed

amyeroberts deleted the fix-instructblipprocessor branch September 4, 2024 15:18
