
Usage with BigBird-Roberta-Base #36

Open

jordanparker6 opened this issue Feb 3, 2023 · 1 comment

Comments

jordanparker6 commented Feb 3, 2023

Would it be possible to use LiLT with BigBird-Roberta-Base models?

If so, is there any feedback on the best approach for doing so? What might need to change in the LiLT repository to support it?

https://huggingface.co/google/bigbird-roberta-base

jordanparker6 (Author) commented

I was able to use the provided script to create a lilt-roberta-base-en-style model from https://huggingface.co/google/bigbird-roberta-base. If I can get this working, I will post it to the Hugging Face Hub.

BigBird uses the same tokenizer as RoBERTa, so there is no issue with tokenization for google/bigbird-roberta-base.
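For anyone reproducing this, a minimal tokenizer sanity check:

```python
from transformers import AutoTokenizer

# Sanity check: the BigBird checkpoint's tokenizer loads and encodes text.
tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base")
print(tokenizer("LiLT meets BigBird")["input_ids"])
```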

However, the following error occurs when loading the model:

```
RuntimeError: Error(s) in loading state_dict for LiltForTokenClassification:
    size mismatch for lilt.layout_embeddings.box_position_embeddings.weight: copying a param with shape torch.Size([514, 192]) from checkpoint, the shape in current model is torch.Size([4096, 192]).
    You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
```
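As a side note, a minimal sketch of what the error's own suggestion would look like, assuming the fused checkpoint is saved at a hypothetical local path `./lilt-bigbird-base` (the mismatched embedding would then be re-initialized rather than loaded, and would need fine-tuning):

```python
from transformers import LiltForTokenClassification

# ignore_mismatched_sizes=True tells from_pretrained to skip and re-initialize
# any parameter whose checkpoint shape differs from the model config -- here
# the [514, 192] box_position_embeddings vs. the expected [4096, 192].
model = LiltForTokenClassification.from_pretrained(
    "./lilt-bigbird-base",  # hypothetical path to the fused checkpoint
    ignore_mismatched_sizes=True,
)
```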

I think this error arises when the PyTorch state dicts are fused with the following line:

```python
total_model = {**text_model, **lilt_model}
```

The lilt_model weights appear to override the incoming BigBird dimensions.
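For context, a minimal sketch of the merge semantics involved, with toy tensors standing in for the real state dicts: in `{**a, **b}`, entries from `b` overwrite entries from `a` on key collision.

```python
import torch

# Toy stand-ins for the two state dicts; the shapes mirror the
# box_position_embeddings conflict (514 vs. 4096 positions).
text_model = {"pos_emb": torch.zeros(4096, 192)}  # BigBird-sized
lilt_model = {"pos_emb": torch.zeros(514, 192)}   # LiLT/RoBERTa-sized

# In {**a, **b}, entries from b win on key collision, so here the
# lilt_model tensor survives the merge.
total_model = {**text_model, **lilt_model}
print(total_model["pos_emb"].shape)  # torch.Size([514, 192])
```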

Would it be problematic to switch this to:

```python
total_model = {**lilt_model, **text_model}
```

Or would this break the architecture?
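One way to see whether the swap would matter, assuming both state dicts are in memory: a hypothetical helper that lists the colliding keys whose shapes differ, i.e. exactly the entries where merge order decides which weight survives.

```python
# Hypothetical helper: report keys present in both state dicts whose shapes
# differ -- the entries where merge order determines the surviving weight.
def report_conflicts(text_model, lilt_model):
    for key in sorted(text_model.keys() & lilt_model.keys()):
        a, b = text_model[key], lilt_model[key]
        if a.shape != b.shape:
            print(f"{key}: text {tuple(a.shape)} vs lilt {tuple(b.shape)}")

report_conflicts(text_model, lilt_model)
```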
