
Pre-training code? #30

Open
logan-markewich opened this issue Dec 13, 2022 · 5 comments

Comments

@logan-markewich

Are you able to provide the pre-training code?

I would like to try and pre-train using roberta-large, or a similar language model :)

@jordanparker6

I would like to do the same but with bigbird-roberta-en-base if possible...

@logan-markewich
Author

If the hidden size is the same as roberta-base's, you can probably use the weight-generation script in the repo.
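A quick way to check is to compare the configs first. A minimal sketch, assuming the Hugging Face transformers AutoConfig API (the printed values come from the two models' published configs):

from transformers import AutoConfig

# Compare the text-model configs before reusing the weight-generation script.
roberta_cfg = AutoConfig.from_pretrained("roberta-base")
bigbird_cfg = AutoConfig.from_pretrained("google/bigbird-roberta-base")

print(roberta_cfg.hidden_size, bigbird_cfg.hidden_size)                          # 768, 768
print(roberta_cfg.max_position_embeddings, bigbird_cfg.max_position_embeddings)  # 514, 4096

The hidden sizes match, so the script should at least run; the position-embedding counts differ, which is where a shape mismatch could show up.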

@MaveriQ

MaveriQ commented Mar 15, 2023

@logan-markewich, @jordanparker6 I am coding up the collator for the masking in the three pretraining strategies. Maybe we can work together and share it here afterwards for everyone else to use?
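For the plain text-masking part, a minimal sketch of the idea, assuming standard RoBERTa-style MLM masking (the function below is only an illustration, not code from the LiLT repo; the layout stream and the other strategies would still need their own handling):

import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

def mask_tokens(input_ids, mlm_probability=0.15):
    # Standard MLM recipe: pick ~15% of non-special tokens; of those,
    # 80% become <mask>, 10% become a random token, 10% stay unchanged.
    labels = input_ids.clone()
    prob = torch.full(labels.shape, mlm_probability)
    special = torch.tensor(
        [tokenizer.get_special_tokens_mask(ids, already_has_special_tokens=True)
         for ids in labels.tolist()],
        dtype=torch.bool,
    )
    prob.masked_fill_(special, 0.0)
    masked = torch.bernoulli(prob).bool()
    labels[~masked] = -100  # loss is computed only on masked positions

    replaced = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & masked
    input_ids[replaced] = tokenizer.mask_token_id

    randomized = torch.bernoulli(torch.full(labels.shape, 0.5)).bool() & masked & ~replaced
    random_ids = torch.randint(len(tokenizer), labels.shape, dtype=torch.long)
    input_ids[randomized] = random_ids[randomized]
    return input_ids, labels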

@jordanparker6

> @logan-markewich, @jordanparker6 I am coding up the collator for the masking in the three pretraining strategies. Maybe we can work together and share it here afterwards for everyone else to use?

Happy to help out as needed.

@jordanparker6

> If the hidden size is the same as roberta-base's, you can probably use the weight-generation script in the repo.

I don't think it is... I posted my error message in a separate issue.


I was able to use the provided script to create a lilt-roberta-base-en using the following: https://huggingface.co/google/bigbird-roberta-base. If I can get this working, I will post it to the Hugging Face Hub.

BigBird uses the same tokenizer as RoBERTa, so there is no issue with tokenization for google/bigbird-roberta-base.

However, the following error occurs when loading the model.

RuntimeError: Error(s) in loading state_dict for LiltForTokenClassification:
	size mismatch for lilt.layout_embeddings.box_position_embeddings.weight: copying a param with shape torch.Size([514, 192]) from checkpoint, the shape in current model is torch.Size([4096, 192]).
	You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
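For what it's worth, loading with the flag the error message suggests would look roughly like this (the checkpoint path is a placeholder); the mismatched box_position_embeddings would then be freshly initialised rather than copied from the checkpoint:

from transformers import LiltForTokenClassification

# Placeholder path to the fused checkpoint produced by the generation script.
model = LiltForTokenClassification.from_pretrained(
    "path/to/fused-lilt-bigbird-checkpoint",
    ignore_mismatched_sizes=True,  # mismatched tensors are re-initialised instead of loaded
)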

I think this error is introduced when the PyTorch state dicts are merged with the following line.

total_model = {**text_model, **lilt_model}

The lilt_model dimensions override the incoming BigBird dimensions.

Would it be problematic to switch this to:

total_model = {**lilt_model, **text_model}
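For reference, when two dicts are merged this way the right-hand one wins on duplicate keys, so swapping the order changes which model's tensor survives a key collision. A toy illustration (the keys are made up):

# Toy example of dict-merge precedence; keys are invented for illustration.
text_model = {"shared.weight": "from BigBird", "text.only": "from BigBird"}
lilt_model = {"shared.weight": "from LiLT", "lilt.only": "from LiLT"}

print({**text_model, **lilt_model}["shared.weight"])  # 'from LiLT'    (lilt_model wins)
print({**lilt_model, **text_model}["shared.weight"])  # 'from BigBird' (text_model wins)

That said, if the layout_embeddings keys only exist in the LiLT state dict, swapping the order would not by itself change which box_position_embeddings tensor ends up in total_model.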

