
Pre-training code? #30

Open
logan-markewich opened this issue Dec 13, 2022 · 5 comments

Comments

@logan-markewich

Are you able to provide the pre-training code?

I would like to try and pre-train using roberta-large, or a similar language model :)

@jordanparker6

I would like to do the same but with bigbird-roberta-en-base if possible...

@logan-markewich
Author

If the hidden size is the same as roberta-base's, you can probably use the weight-generation script in the repo.
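A quick way to check is to compare the configs first. A minimal sketch, assuming the Hugging Face transformers AutoConfig API (the printed values come from the two models' published configs):

from transformers import AutoConfig

# Compare the text-model configs before reusing the weight-generation script.
roberta_cfg = AutoConfig.from_pretrained("roberta-base")
bigbird_cfg = AutoConfig.from_pretrained("google/bigbird-roberta-base")

print(roberta_cfg.hidden_size, bigbird_cfg.hidden_size)                          # 768, 768
print(roberta_cfg.max_position_embeddings, bigbird_cfg.max_position_embeddings)  # 514, 4096

The hidden sizes match, so the script should at least run; the position-embedding counts differ, which is where a shape mismatch could show up.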

@MaveriQ

MaveriQ commented Mar 15, 2023

@logan-markewich, @jordanparker6 I am coding up the collator for the masking in the three pretraining strategies. Maybe we can work together and share it here afterwards for everyone else to use?
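For the plain text-masking part, a minimal sketch of the idea, assuming standard RoBERTa-style MLM masking (the function below is only an illustration, not code from the LiLT repo; the layout stream and the other strategies would still need their own handling):

import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

def mask_tokens(input_ids, mlm_probability=0.15):
    # Standard MLM recipe: pick ~15% of non-special tokens; of those,
    # 80% become <mask>, 10% become a random token, 10% stay unchanged.
    labels = input_ids.clone()
    prob = torch.full(labels.shape, mlm_probability)
    special = torch.tensor(
        [tokenizer.get_special_tokens_mask(ids, already_has_special_tokens=True)
         for ids in labels.tolist()],
        dtype=torch.bool,
    )
    prob.masked_fill_(special, 0.0)
    masked = torch.bernoulli(prob).bool()
    labels[~masked] = -100  # loss is computed only on masked positions

    replaced = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & masked
    input_ids[replaced] = tokenizer.mask_token_id

    randomized = torch.bernoulli(torch.full(labels.shape, 0.5)).bool() & masked & ~replaced
    random_ids = torch.randint(len(tokenizer), labels.shape, dtype=torch.long)
    input_ids[randomized] = random_ids[randomized]
    return input_ids, labels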

@jordanparker6

> @logan-markewich, @jordanparker6 I am coding up the collator for the masking in the three pretraining strategies. Maybe we can work together and share it here afterwards for everyone else to use?

Happy to help out as needed.

@jordanparker6

> If the hidden size is the same as roberta-base's, you can probably use the weight-generation script in the repo.

I don't think it is... I posted my error message in a separate issue.


I was able to use the provided script to create a lilt-roberta-base-en using the following: https://huggingface.co/google/bigbird-roberta-base. If I can get this working, I will post it to the Hugging Face Hub.

BigBird uses the same tokenizer as RoBERTa, so there is no issue with tokenization for google/bigbird-roberta-base.

However, the following error occurs when loading the model.

RuntimeError: Error(s) in loading state_dict for LiltForTokenClassification:
	size mismatch for lilt.layout_embeddings.box_position_embeddings.weight: copying a param with shape torch.Size([514, 192]) from checkpoint, the shape in current model is torch.Size([4096, 192]).
	You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
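For what it's worth, loading with the flag the error message suggests would look roughly like this (the checkpoint path is a placeholder); the mismatched box_position_embeddings would then be freshly initialised rather than copied from the checkpoint:

from transformers import LiltForTokenClassification

# Placeholder path to the fused checkpoint produced by the generation script.
model = LiltForTokenClassification.from_pretrained(
    "path/to/fused-lilt-bigbird-checkpoint",
    ignore_mismatched_sizes=True,  # mismatched tensors are re-initialised instead of loaded
)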

I think this error is introduced when the PyTorch state dicts are merged with the following line.

total_model = {**text_model, **lilt_model}

The lilt_model dimensions override the incoming BigBird dimensions.

Would it be problematic to switch this to:

total_model = {**lilt_model, **text_model}
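For reference, when two dicts are merged this way the right-hand one wins on duplicate keys, so swapping the order changes which model's tensor survives a key collision. A toy illustration (the keys are made up):

# Toy example of dict-merge precedence; keys are invented for illustration.
text_model = {"shared.weight": "from BigBird", "text.only": "from BigBird"}
lilt_model = {"shared.weight": "from LiLT", "lilt.only": "from LiLT"}

print({**text_model, **lilt_model}["shared.weight"])  # 'from LiLT'    (lilt_model wins)
print({**lilt_model, **text_model}["shared.weight"])  # 'from BigBird' (text_model wins)

That said, if the layout_embeddings keys only exist in the LiLT state dict, swapping the order would not by itself change which box_position_embeddings tensor ends up in total_model.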

