I believe there is an implementation error in SAN #4
Comments
Hello! Thanks for the issue report, nice catch. I'd be grateful if you opened a PR with the change; the left-out softmax was a remnant of refactoring that made the source a bit more user-friendly (and it appears it's starting to pay off!)
Hello, I think `self.multi_head[k]` is the k-th attention head, which is a Linear layer with weight and bias:
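The code block that followed this comment did not survive the capture. A minimal sketch of the presumed definition (layer names follow the comment; the sizes are assumptions for illustration, not values taken from the repository) would be:

```python
import torch.nn as nn

# Assumed sizes, for illustration only.
input_size = 128  # hypothetical number of input features
num_heads = 2     # hypothetical number of attention heads

# Each head is a plain Linear layer (weight + bias) that maps the input
# features back onto the same feature space.
multi_head = nn.ModuleList(
    [nn.Linear(input_size, input_size) for _ in range(num_heads)]
)

# Softmax over the feature dimension, applied to each head's output.
softmax = nn.Softmax(dim=1)
```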
Great, thanks. Indeed, to be aligned with the paper, the activation is required. Surprisingly, the current version (basically multilinear blocks) also seems to work; that might be worth further exploration at some point.
The suggested change was merged into master.
Original issue description:
Hello, I believe that this line is an error:
san/san/__init__.py, line 73 (commit bb9aaea)
I think it should be:
attended_matrix = self.softmax(self.multi_head[k](input_space)) * input_space
as shown in the paper:
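The screenshot of the paper's equation from the original post did not survive the capture. Roughly, the per-head attention applies a softmax to an affine projection of the input and then takes an element-wise product with the input; the LaTeX below is an approximate reconstruction under that reading, not a verbatim copy of the paper:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Approximate form of the per-head attention discussed in the thread:
% a softmax over an affine projection of the input X, followed by an
% element-wise (Hadamard) product with X.
\[
  \Omega_k(X) = \operatorname{softmax}\!\left( X W_k + b_k \right) \odot X
\]
\end{document}
```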
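For a concrete picture of the fix, here is a minimal, self-contained sketch of the attended-matrix computation before and after the suggested change. The layer names mirror the snippet discussed above, but the sizes, the batch, and the pre-fix line are assumptions inferred from the thread, not the repository's exact code:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed sizes, for illustration only.
input_size, num_heads, batch_size = 8, 2, 4

# Per-head attention: a plain Linear layer per head, plus a softmax
# over the feature dimension (dim=1 for a (batch, features) input).
multi_head = nn.ModuleList(
    [nn.Linear(input_size, input_size) for _ in range(num_heads)]
)
softmax = nn.Softmax(dim=1)

input_space = torch.randn(batch_size, input_size)
k = 0  # pick one head

# Pre-fix line (softmax left out), effectively a multilinear block:
# attended_matrix = multi_head[k](input_space) * input_space

# Suggested fix: apply the softmax to the head's output before the
# element-wise product with the input, as in the paper.
attended_matrix = softmax(multi_head[k](input_space)) * input_space

print(attended_matrix.shape)  # -> torch.Size([4, 8])
```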