Gram matrix for non-masked-out input activations only #9

dptam · 2024-02-15T02:02:02Z

Thank you for releasing the code and the easy-to-understand Jupyter Notebook. I really liked RegMean!

If I understand the notebook (https://github.com/bloomberg/dataless-model-merging/blob/main/regmean_demo.ipynb) correctly, the Gram matrix is computed over all the input activations in the code below. Shouldn't it be computed only over the input activations that are not masked out by the attention mask?

def get_gram(name):
    def hook(module, input, output):
        x = input[0].detach()  # [b,t,h]
        x = x.view(-1, x.size(-1))  # [b*t,h]
        xtx = torch.matmul(x.transpose(0, 1), x)  # [h,h]
        if name not in grams:
            grams[name] = xtx / x.size(0)
            xn[name] = x.size(0)
        else:
            grams[name] = (grams[name] * xn[name] + xtx) / (x.size(0) + xn[name])
            xn[name] += x.size(0)
    return hook
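
To make the question concrete, here is a minimal sketch of what a mask-aware variant of the update could look like. It is not code from the notebook: `update_gram` is a hypothetical helper, and it assumes the attention mask of shape `[b,t]` (nonzero for real tokens) is passed in explicitly, since a forward hook only receives the module's own inputs. The running-average bookkeeping with `grams` and `xn` mirrors the snippet above.

```python
import torch

grams, xn = {}, {}  # running mean Gram matrix and token count, keyed by layer name


def update_gram(name, x, attention_mask):
    """Hypothetical mask-aware update: only non-masked tokens contribute.

    x: [b, t, h] input activations; attention_mask: [b, t], nonzero = real token.
    """
    keep = attention_mask.reshape(-1).bool()           # [b*t]
    x = x.detach().reshape(-1, x.size(-1))[keep]       # [n, h], padded rows dropped
    n = x.size(0)
    if n == 0:
        return  # nothing to accumulate for a fully masked batch
    xtx = x.transpose(0, 1) @ x                        # [h, h]
    if name not in grams:
        grams[name] = xtx / n
        xn[name] = n
    else:
        grams[name] = (grams[name] * xn[name] + xtx) / (n + xn[name])
        xn[name] += n


# Toy batch: the last token of the second sequence is padding and is ignored.
x = torch.tensor([[[1.0, 2.0], [3.0, 4.0]],
                  [[5.0, 6.0], [100.0, 100.0]]])
mask = torch.tensor([[1, 1], [1, 0]])
update_gram("layer0", x, mask)
```

With this variant the Gram matrix is averaged over the three real tokens only, so the padded row `[100, 100]` has no effect on the statistics.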


riyajatar37003 · 2024-06-20T11:59:15Z

I am getting this error when running the demo notebook:
ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable RANK expected, but not set
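
The message says that `torch.distributed` was initialized through the `env://` rendezvous, which reads `RANK`, `WORLD_SIZE`, `MASTER_ADDR` and `MASTER_PORT` from the environment. A possible workaround for a single-process run is sketched below. The helper name `set_single_process_dist_env` and the port value are assumptions made for illustration, and the report does not say whether the notebook needs distributed initialization at all.

```python
import os


def set_single_process_dist_env(port="29500"):
    """Hypothetical helper: fill in the env:// rendezvous variables for one process.

    Existing values are left untouched, so a real launcher's settings win.
    """
    defaults = {
        "RANK": "0",               # this process is the only worker
        "WORLD_SIZE": "1",         # total number of processes
        "MASTER_ADDR": "localhost",
        "MASTER_PORT": port,       # assumed free port; change if it is in use
    }
    for key, value in defaults.items():
        os.environ.setdefault(key, value)
    return {key: os.environ[key] for key in defaults}


# Call this before any code that initializes torch.distributed.
env = set_single_process_dist_env()
```

Running this at the top of the notebook, before the cell that raises the error, should at least get past the missing-variable check.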
