
The computation between Q and K is not a Hadamard product #4

Closed
gaosheng0527 opened this issue Jan 2, 2024 · 3 comments

Comments

@gaosheng0527

In SegmentCorrelation in selfattention_family.py, isn't correlation_scores = torch.einsum("bmlhd,bnlhd->bhmn", seg_queries, seg_keys) essentially still a dot-product operation?

@ddz16
Owner

ddz16 commented Jan 2, 2024

Note the definitions of m and n. They are the numbers of query segments and key segments respectively, and each segment is represented by a matrix. For example, with 5 query segments and 3 key segments, the correlation between each pair of segments (that is, each pair of matrices) is computed with the Hadamard product.

If that is still unclear, recall that in standard self-attention every query and every key is a vector, so the correlation between each query-key pair is a vector dot product; with m queries and n keys, computing all the pairwise correlations at once yields an m×n correlation matrix A. In Preformer, every query and every key is a matrix, and the correlation between each pair of matrices is computed with the Hadamard product; with m queries and n keys, this likewise yields an m×n correlation matrix A.
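To make the contrast concrete, here is a minimal PyTorch sketch (not the repository code; all shapes are illustrative) showing that standard attention reduces each query-key pair of vectors with a dot product, while the segment correlation reduces each pair of l×d segment matrices through their element-wise (Hadamard) product. Both produce an m×n score matrix per batch and head:

```python
import torch

# Illustrative shapes: b=batch, h=heads, m/n=number of query/key segments,
# l=segment length, d=head dimension.
b, h, m, n, l, d = 2, 4, 5, 3, 6, 8

# Standard self-attention: each query/key is a *vector* of size d,
# so each pairwise score is a vector dot product -> an m x n score matrix.
q_vec = torch.randn(b, m, h, d)
k_vec = torch.randn(b, n, h, d)
scores_vec = torch.einsum("bmhd,bnhd->bhmn", q_vec, k_vec)

# Segment correlation: each query/key segment is a *matrix* of size l x d,
# so each pairwise score comes from the Hadamard (element-wise) product of
# two l x d matrices, reduced to a single number -> again an m x n matrix.
seg_queries = torch.randn(b, m, l, h, d)
seg_keys = torch.randn(b, n, l, h, d)
correlation_scores = torch.einsum("bmlhd,bnlhd->bhmn", seg_queries, seg_keys)

print(scores_vec.shape)          # torch.Size([2, 4, 5, 3])
print(correlation_scores.shape)  # torch.Size([2, 4, 5, 3])
```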

@wwwait0527

wwwait0527 commented Jan 2, 2024

Thank you for the reply! I looked at the code again: so the sequence is split into submatrices of different segment lengths, the Hadamard product is taken for each pair, and the result is then averaged over the last dimension? If that is the case, I understand.

@ddz16
Owner

ddz16 commented Jan 3, 2024

Each query segment and each key segment has size l×d. The Hadamard product of a pair is itself a matrix, and averaging over that entire matrix gives a scalar. That is why the output of torch.einsum("bmlhd,bnlhd->bhmn") has shape bhmn, with no l or d dimensions.
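As a quick, illustrative check (again, not the repository code): the einsum output for one (query segment, key segment) pair equals the Hadamard product of the two l×d matrices reduced over l and d. Note that the einsum itself sums over those dimensions; dividing by l*d would turn that sum into the mean described above, if such a scaling is applied elsewhere:

```python
import torch

b, h, m, n, l, d = 1, 1, 5, 3, 6, 8
seg_queries = torch.randn(b, m, l, h, d)
seg_keys = torch.randn(b, n, l, h, d)

correlation_scores = torch.einsum("bmlhd,bnlhd->bhmn", seg_queries, seg_keys)
print(correlation_scores.shape)  # torch.Size([1, 1, 5, 3]) -- no l or d dimension

# Manually reproduce the score for query segment i and key segment j.
i, j = 2, 1
q_mat = seg_queries[0, i, :, 0, :]   # l x d matrix
k_mat = seg_keys[0, j, :, 0, :]      # l x d matrix
hadamard = q_mat * k_mat             # element-wise (Hadamard) product
assert torch.allclose(correlation_scores[0, 0, i, j], hadamard.sum())
```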

@ddz16 ddz16 closed this as completed Jan 5, 2024