# Chapter 4: QK Matrix Multiplication in Multi-Head Attention

<img src="../assets/image-20240502141958851.png" alt="image-20240502141958851" style="zoom:50%;" />

### Introduction

In the previous chapter we studied matrix multiplication and the QK product in detail. Next, let's walk through the full flow of the multi-head attention mechanism from start to finish.
### QK Matrix Multiplication

Now that we have computed the QK product matrix above, let's look at the Attention formula from the original paper:

$$
\operatorname{Attention}(Q, K, V)=\operatorname{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V
$$

<img src="../assets/image-20240502140356134.png" alt="image-20240502140356134" style="zoom:50%;" />
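
The scaled dot-product $QK^{T}/\sqrt{d_k}$ in the formula can be sketched in a few lines of NumPy. The shapes and variable names below are illustrative assumptions for this walkthrough, not the repository's actual code:

```python
import numpy as np

# Assumed illustrative shapes: batch=1, heads=2, seq_len=4, d_k=8
rng = np.random.default_rng(0)
Q = rng.normal(size=(1, 2, 4, 8))
K = rng.normal(size=(1, 2, 4, 8))

d_k = Q.shape[-1]
# Multiply Q by K transposed over its last two axes, then scale by sqrt(d_k)
scores = Q @ K.transpose(0, 1, 3, 2) / np.sqrt(d_k)
print(scores.shape)  # (1, 2, 4, 4): one seq_len x seq_len score matrix per head
```

Each head produces its own `seq_len × seq_len` score matrix, which is why the result has a separate 4×4 block per head.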

Let's pull out the first head of a single batch on its own:

![image-20240502140715615](../assets/image-20240502140715615.png)

Every value in the first row measures how related `LL` is to each token of `LLM with me.郭同学热爱AI喜欢游戏`; the second row measures how related `M` is to each of those tokens. The higher the value, the stronger the relationship between the two tokens; the lower the value, the weaker it is.

<img src="../assets/image-20240502141342857.png" alt="image-20240502141342857" style="zoom:50%;" />
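
Per the formula, each row of relevance scores is then normalized with a row-wise softmax, turning it into a probability distribution over the tokens before it is used to weight V. A minimal NumPy sketch with made-up score values (the numbers are illustrative assumptions, not the actual values shown in the figures):

```python
import numpy as np

# Hypothetical scores for one head: rows = query tokens, cols = key tokens
scores = np.array([[2.0, 1.0, 0.1],
                   [0.5, 3.0, 0.2],
                   [1.0, 0.4, 2.5]])

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

weights = softmax(scores)
print(weights.sum(axis=1))  # -> [1. 1. 1.], each row is a probability distribution
```

After this step, a high raw score (strong relevance between two tokens) becomes a large attention weight in that row's distribution.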