Gram matrix for non-masked-out input activations only #9

Open
dptam opened this issue Feb 15, 2024 · 1 comment

dptam commented Feb 15, 2024

Thank you for releasing the code and the easy-to-understand Jupyter notebook. I really liked RegMean!

If I understand the notebook (https://github.com/bloomberg/dataless-model-merging/blob/main/regmean_demo.ipynb) correctly, the Gram matrix in the code below is computed over all input activations, including those at padded positions. Shouldn't it be computed only over the input activations that are not masked out by the attention mask?

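# grams and xn are dicts from the enclosing scope: running average and token count per layer
def get_gram(name):
    def hook(module, input, output):
        x = input[0].detach()  # [b,t,h]
        x = x.view(-1, x.size(-1))  # [b*t,h], padded positions included
        xtx = torch.matmul(x.transpose(0, 1), x)  # [h,h]
        if name not in grams:
            grams[name] = xtx / x.size(0)
            xn[name] = x.size(0)
        else:
            # running average over all rows seen so far
            grams[name] = (grams[name] * xn[name] + xtx) / (x.size(0) + xn[name])
            xn[name] += x.size(0)
    return hook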
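
To make the question concrete, here is a minimal sketch of the masked variant, not the notebook's code. It assumes the attention mask has shape [b,t] with 1 for real tokens and 0 for padding, and that it lines up with the layer input of shape [b,t,h]. The names `masked_gram` and `update_gram` are hypothetical, and how the mask reaches the forward hook (for example, stashing it before each forward pass) is left open.

```python
import torch

grams, xn = {}, {}  # running average Gram matrix and kept-token count per layer


def masked_gram(x, attention_mask):
    """Return (X^T X over kept tokens, number of kept tokens)."""
    keep = attention_mask.reshape(-1).bool()      # [b*t]
    x = x.detach().reshape(-1, x.size(-1))[keep]  # [n_kept, h]
    return x.transpose(0, 1) @ x, x.size(0)       # [h, h], int


def update_gram(name, x, attention_mask):
    """Same running average as the hook above, counting only kept tokens."""
    xtx, n = masked_gram(x, attention_mask)
    if n == 0:  # batch with no real tokens: nothing to accumulate
        return
    if name not in grams:
        grams[name] = xtx / n
        xn[name] = n
    else:
        grams[name] = (grams[name] * xn[name] + xtx) / (n + xn[name])
        xn[name] += n
```

The only change from the hook is that padded rows are dropped before the matrix product, so both the sum and the normalizing count cover real tokens only.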
@riyajatar37003

I am getting this error when running the demo notebook:
ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable RANK expected, but not set
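
For context, the `env://` rendezvous in `torch.distributed` reads `RANK`, `WORLD_SIZE`, `MASTER_ADDR`, and `MASTER_PORT` from the environment. The sketch below sets them for a single local process before any distributed initialization happens. It is an assumption that the notebook is meant to run as a single process and that this is enough to get past the error; the address and port values are placeholders.

```python
import os

# Assumption: one non-distributed process on the local machine.
single_process_env = {
    "RANK": "0",
    "WORLD_SIZE": "1",
    "MASTER_ADDR": "127.0.0.1",  # placeholder: local host
    "MASTER_PORT": "29500",      # placeholder: any free port
}

# setdefault keeps values already provided by a launcher such as torchrun.
for key, value in single_process_env.items():
    os.environ.setdefault(key, value)
```

This would need to run in a notebook cell before the code that initializes `torch.distributed`.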
