BERT #62

Open
Hasanmog opened this issue Mar 9, 2024 · 1 comment

Hasanmog commented Mar 9, 2024

Hello,
I introduced some parameters to the text encoder (BERT) and trained for a few epochs. Everything ran smoothly.
But when evaluating with the resulting checkpoint, I get this warning about the new BERT parameters:

"Some weights of Bert Model were not initialized from model checkpoint at bert-uncased and are newly initialized"

I think evaluation is running with the vanilla BERT (without the newly added parameters). My guess is that the trained parameters are loaded after the BERT model itself, which is what triggers this warning.

So, what should I change in the config file in order to evaluate with the BERT that includes these newly introduced, trained parameters?

Thanks in advance!

cc: @aghand0ur

@selinakhan

I ran into the same issue and made a few modifications to the way I load my fine-tuned BERT model. As far as I can tell, the newly initialized weights are only those of the pooler layer. In my case, I fine-tuned BERT with MLM, which doesn't train the pooling layer since it isn't needed for that task. As a result, the saved model doesn't include those parameters, and reloading it produces the warning you mentioned.
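
If you want to double-check which weights are actually being re-initialized, here is a minimal sketch using the Hugging Face output_loading_info flag (the checkpoint path is a placeholder for your own fine-tuned model):

from transformers import BertModel

# Placeholder path; replace with your own fine-tuned checkpoint directory.
model, loading_info = BertModel.from_pretrained(
    "path/to/finetuned-bert", output_loading_info=True
)

# Weights the checkpoint didn't provide (these are the ones that get newly initialized).
print("missing keys:", loading_info["missing_keys"])       # expect only pooler.* here
# Weights in the checkpoint that BertModel doesn't use (e.g. the MLM head).
print("unexpected keys:", loading_info["unexpected_keys"])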

From what I understand, the GD model also does not use the pooled output; it uses BertModelWarper() simply to access the outputs of each hidden state more easily. In the GroundingDINO module (models/groundingdino.py), lines 267-269 take the last hidden state and train another linear layer on top of it, ignoring the pooling layer.
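
As a rough illustration of that pattern (this is not the actual GroundingDINO code; the 256-dim projection and the caption are made up for the example):

import torch
from transformers import BertModel, BertTokenizer

bert = BertModel.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Project token-level features to the detector's hidden size (256 is assumed here).
feat_map = torch.nn.Linear(bert.config.hidden_size, 256)

tokens = tokenizer("a cat . a dog .", return_tensors="pt")
with torch.no_grad():
    out = bert(**tokens)

# Only the per-token last hidden state is used; out.pooler_output is never touched.
text_features = feat_map(out.last_hidden_state)  # shape: (1, seq_len, 256)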

I think that when loading a fine-tuned BERT model, only the pooler layer weights are newly initialized, not the fine-tuned parameters, so in theory the warning shouldn't matter for GD. Still, for my own peace of mind, to make sure nothing is left randomly initialized, I wrote the function below to load my fine-tuned BERT model and re-initialize the pooler weights with the ones from the pre-trained BERT model.

import torch
from transformers import BertModel


def mlm2bm(text_encoder_type, model_path):
    '''Load a fine-tuned MLM model as a BertModel and replace its pooling-layer
    weights with the original pre-trained ones. MLM training doesn't touch the
    pooling layer, so those weights are missing from the fine-tuned checkpoint.'''

    original_model = BertModel.from_pretrained(text_encoder_type)

    print('Loaded original model')

    # Save the original pooling layer weights
    original_pooler_weight = original_model.pooler.dense.weight.clone()
    original_pooler_bias = original_model.pooler.dense.bias.clone()

    # Load fine-tuned MLM model as BertModel
    fine_tuned_model = BertModel.from_pretrained(model_path)

    # Replace the pooling layer weights with the original ones
    fine_tuned_model.pooler.dense.weight = torch.nn.Parameter(original_pooler_weight)
    fine_tuned_model.pooler.dense.bias = torch.nn.Parameter(original_pooler_bias)

    print('Replaced pooler weights!')

    return fine_tuned_model
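
Usage is then a drop-in replacement for the usual from_pretrained() call (the checkpoint path below is a placeholder):

# Placeholder path; point model_path at your own fine-tuned MLM checkpoint.
bert = mlm2bm("bert-base-uncased", "path/to/finetuned-mlm-bert")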

Hope this helps!
