Create 第四章——多头注意力机制——QK矩阵相乘
ben1234560 committed May 2, 2024
1 parent a29294c commit 6118b19
Showing 3 changed files with 28 additions and 0 deletions.
Binary file added assets/image-20240502141958851.png
Binary file modified 人人都能看懂的Transformer/.DS_Store
@@ -0,0 +1,28 @@
# Chapter 4: Multi-Head Attention - QK Matrix Multiplication

<img src="../assets/image-20240502141958851.png" alt="image-20240502141958851" style="zoom:50%;" />

### Preface

In the previous chapter we looked at how matrix multiplication works and how Q is multiplied with K. Next, let's walk through the complete flow inside the multi-head attention mechanism.





### QK Matrix Multiplication

Now that we have computed the matrix that results from multiplying Q and K, let's look at the Attention formula from the original paper:
$$
\operatorname{Attention}(Q, K, V)=\operatorname{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V
$$
<img src="../assets/image-20240502140356134.png" alt="image-20240502140356134" style="zoom:50%;" />

Let's pull out the first head of a single batch and look at it on its own:

![image-20240502140715615](../assets/image-20240502140715615.png)
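
Continuing the hedged sketch above, pulling out the first head of the first batch is simply an index into the first two dimensions of the assumed `scores` tensor:

```python
# Batch 0, head 0: a (seq_len, seq_len) matrix whose entry [i, j]
# is the (unnormalized) relevance between token i and token j
head0 = scores[0, 0]
print(head0.shape)  # torch.Size([16, 16])
```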

Each value in the first row is the relevance of `LL` to each token in `LLM with me.郭同学热爱AI喜欢游戏`. The second row is the relevance of `M` to each token in `LLM with me.郭同学热爱AI喜欢游戏`. The higher the value, the more related the two tokens are; the lower the value, the less related they are.

<img src="../assets/image-20240502141342857.png" alt="image-20240502141342857" style="zoom:50%;" />
