BERT #62

Open
Hasanmog opened this issue Mar 9, 2024 · 1 comment

Hasanmog commented Mar 9, 2024

Hello,
I introduced some parameters to the text encoder (BERT) and trained for a few epochs. Everything ran smoothly.
But when evaluating with the resulting checkpoint, I get this warning about the new BERT parameters:

"Some weights of Bert Model were not initialized from model checkpoint at bert-uncased and are newly initialized"

I think evaluation is running with the vanilla BERT (without the newly added parameters). My guess is that the trained parameters are loaded after the BERT model itself, which is what triggers this warning.

So, what should I change in the config file in order to evaluate with the BERT that includes these newly introduced, trained parameters?

Thanks in advance!

cc: @aghand0ur

@selinakhan

I ran into the same issue and made a few modifications to the way I load my fine-tuned BERT model. As far as I can tell, the newly initialized weights are only those of the pooler layer. In my case, I fine-tuned BERT with MLM, which doesn't train the pooling layer since it isn't needed for that task. As a result, the saved model doesn't include those parameters, and reloading it produces the warning you mentioned.
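
If you want to double-check which weights are actually being re-initialized, here is a minimal sketch using the Hugging Face output_loading_info flag (the checkpoint path is a placeholder for your own fine-tuned model):

from transformers import BertModel

# Placeholder path; replace with your own fine-tuned checkpoint directory.
model, loading_info = BertModel.from_pretrained(
    "path/to/finetuned-bert", output_loading_info=True
)

# Weights the checkpoint didn't provide (these are the ones that get newly initialized).
print("missing keys:", loading_info["missing_keys"])       # expect only pooler.* here
# Weights in the checkpoint that BertModel doesn't use (e.g. the MLM head).
print("unexpected keys:", loading_info["unexpected_keys"])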

From what I understand, the GD model also does not use the pooled output; it uses BertModelWarper() simply to access the outputs of each hidden state more easily. In the GroundingDINO module (models/groundingdino.py), lines 267-269 take the last hidden state and train another linear layer on top of it, ignoring the pooling layer.
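
As a rough illustration of that pattern (this is not the actual GroundingDINO code; the 256-dim projection and the caption are made up for the example):

import torch
from transformers import BertModel, BertTokenizer

bert = BertModel.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Project token-level features to the detector's hidden size (256 is assumed here).
feat_map = torch.nn.Linear(bert.config.hidden_size, 256)

tokens = tokenizer("a cat . a dog .", return_tensors="pt")
with torch.no_grad():
    out = bert(**tokens)

# Only the per-token last hidden state is used; out.pooler_output is never touched.
text_features = feat_map(out.last_hidden_state)  # shape: (1, seq_len, 256)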

I think that when loading a fine-tuned BERT model, only the pooler layer weights are newly initialized, not the fine-tuned parameters, so in theory the warning shouldn't matter for GD. Still, for my own peace of mind, to make sure nothing is left randomly initialized, I wrote the function below to load my fine-tuned BERT model and re-initialize the pooler weights with the ones from the pre-trained BERT model.

import torch
from transformers import BertModel


def mlm2bm(text_encoder_type, model_path):
    '''Load a fine-tuned MLM model as a BertModel and replace its pooling-layer
    weights with the original pre-trained ones. MLM training doesn't touch the
    pooling layer, so those weights are missing from the fine-tuned checkpoint.'''

    original_model = BertModel.from_pretrained(text_encoder_type)

    print('Loaded original model')

    # Save the original pooling layer weights
    original_pooler_weight = original_model.pooler.dense.weight.clone()
    original_pooler_bias = original_model.pooler.dense.bias.clone()

    # Load fine-tuned MLM model as BertModel
    fine_tuned_model = BertModel.from_pretrained(model_path)

    # Replace the pooling layer weights with the original ones
    fine_tuned_model.pooler.dense.weight = torch.nn.Parameter(original_pooler_weight)
    fine_tuned_model.pooler.dense.bias = torch.nn.Parameter(original_pooler_bias)

    print('Replaced pooler weights!')

    return fine_tuned_model
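
Usage is then a drop-in replacement for the usual from_pretrained() call (the checkpoint path below is a placeholder):

# Placeholder path; point model_path at your own fine-tuned MLM checkpoint.
bert = mlm2bm("bert-base-uncased", "path/to/finetuned-mlm-bert")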

Hope this helps!
