
I believe there is an implementation error in SAN #4

Closed
waystogetthere opened this issue Dec 28, 2022 · 4 comments

Comments

@waystogetthere (Contributor)

Hello, I believe that this line is an error:

attended_matrix = self.multi_head[k](input_space) * input_space

I think it should be:

attended_matrix = self.softmax(self.multi_head[k](input_space)) * input_space

as shown in the paper:

[screenshot: the attention equation from the paper, with the softmax applied to the k-th head's linear transformation]
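For context, the corrected line corresponds to something like the following minimal sketch (the class name, dimensions, and head count here are illustrative, not the repository's actual API):

```python
import torch
import torch.nn as nn


class AttentionHeadSketch(nn.Module):
    """Minimal sketch of SAN-style attention heads.

    Names are hypothetical; only the corrected line inside forward()
    mirrors the fix discussed in this issue.
    """

    def __init__(self, input_dim: int, num_heads: int = 2):
        super().__init__()
        # one linear map per head, corresponding to the paper's W^k and b^k
        self.multi_head = nn.ModuleList(
            nn.Linear(input_dim, input_dim) for _ in range(num_heads)
        )
        self.softmax = nn.Softmax(dim=1)

    def forward(self, input_space: torch.Tensor) -> torch.Tensor:
        out = torch.zeros_like(input_space)
        for k in range(len(self.multi_head)):
            # corrected line: softmax over features, then elementwise product
            attended_matrix = self.softmax(self.multi_head[k](input_space)) * input_space
            out = out + attended_matrix
        return out
```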

@SkBlaz (Owner)

SkBlaz commented Dec 29, 2022

Hello! Thanks for the report, nice catch. I'd be grateful if you opened a PR with the change. The left-out softmax was a remnant of refactoring that made the source a bit more user-friendly (and it appears that's starting to pay off!).

@waystogetthere (Contributor, Author)

waystogetthere commented Dec 29, 2022

Hello,
Thanks for your fast reply! Merry Xmas & Happy New Year!
I'll open a pull request soon.

I think `self.multi_head[k]` is the k-th attention head, a linear layer with weight $W_{l_{att}}^k$ and bias $b^k_{l_{att}}$, so the softmax is needed.

@SkBlaz (Owner)

SkBlaz commented Dec 30, 2022

Great, thanks. Indeed, to be aligned with the paper, the activation is required. Surprisingly, the current version (essentially multilinear blocks) also seems to work; that might be worth further exploration at some point.
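The difference between the two variants can be illustrated in a few lines (dimensions and names here are made up for the demo): without the softmax the per-feature weights are unbounded, while with it each sample's weights form a distribution over features.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lin = nn.Linear(5, 5)  # stands in for one self.multi_head[k]
x = torch.randn(2, 5)

# pre-fix ("multilinear") variant: unbounded elementwise weights
multilinear = lin(x) * x

# paper-aligned variant: softmax bounds the weights to (0, 1)
weights = torch.softmax(lin(x), dim=1)
attended = weights * x

# each row of the softmax weights sums to 1, i.e. a distribution over features
print(weights.sum(dim=1))
```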

@SkBlaz (Owner)

SkBlaz commented Jan 1, 2023

The suggested change was merged to master.

@SkBlaz SkBlaz closed this as completed Jan 1, 2023