Make Transformer layers more flexible #333

Closed
WenjieDu opened this issue Apr 1, 2024 · 0 comments
Labels: enhancement, new feature
WenjieDu (Owner) commented Apr 1, 2024

1. Feature description

Make ScaledDotProductAttention in MultiHeadAttention replaceable by other attention operators, instead of being hard-coded as:

self.attention = ScaledDotProductAttention(attn_temperature, attn_dropout)
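
A minimal sketch of what this could look like: MultiHeadAttention takes the attention operator as a constructor argument rather than instantiating ScaledDotProductAttention internally. The signatures below (the `attn_opt` parameter, the `(q, k, v, attn_mask)` operator interface, and the dimension arguments) are illustrative assumptions, not PyPOTS's actual API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScaledDotProductAttention(nn.Module):
    """Vanilla attention operator: softmax(QK^T / temperature) V."""

    def __init__(self, temperature, attn_dropout=0.1):
        super().__init__()
        self.temperature = temperature
        self.dropout = nn.Dropout(attn_dropout)

    def forward(self, q, k, v, attn_mask=None):
        # q, k, v: [batch, n_heads, n_steps, d_k or d_v]
        attn = torch.matmul(q / self.temperature, k.transpose(-2, -1))
        if attn_mask is not None:
            attn = attn.masked_fill(attn_mask == 0, -1e9)
        attn = self.dropout(F.softmax(attn, dim=-1))
        return torch.matmul(attn, v), attn


class MultiHeadAttention(nn.Module):
    """Multi-head attention that receives the attention operator as an argument
    instead of hard-coding ScaledDotProductAttention."""

    def __init__(self, attn_opt: nn.Module, d_model, n_heads, d_k, d_v):
        super().__init__()
        self.n_heads, self.d_k, self.d_v = n_heads, d_k, d_v
        self.w_qs = nn.Linear(d_model, n_heads * d_k, bias=False)
        self.w_ks = nn.Linear(d_model, n_heads * d_k, bias=False)
        self.w_vs = nn.Linear(d_model, n_heads * d_v, bias=False)
        self.attention = attn_opt  # injected operator, e.g. ProbAttention from Informer
        self.fc = nn.Linear(n_heads * d_v, d_model, bias=False)

    def forward(self, q, k, v, attn_mask=None):
        batch_size, n_steps = q.size(0), q.size(1)
        # project and split into heads: [batch, n_heads, n_steps, d_k or d_v]
        q = self.w_qs(q).view(batch_size, -1, self.n_heads, self.d_k).transpose(1, 2)
        k = self.w_ks(k).view(batch_size, -1, self.n_heads, self.d_k).transpose(1, 2)
        v = self.w_vs(v).view(batch_size, -1, self.n_heads, self.d_v).transpose(1, 2)
        # delegate the actual attention computation to the injected operator
        output, attn_weights = self.attention(q, k, v, attn_mask)
        # merge heads and project back to d_model
        output = output.transpose(1, 2).contiguous().view(batch_size, n_steps, -1)
        return self.fc(output), attn_weights


# usage: any operator exposing the same (q, k, v, attn_mask) interface can be swapped in
mha = MultiHeadAttention(ScaledDotProductAttention(temperature=64**0.5, attn_dropout=0.1),
                         d_model=256, n_heads=4, d_k=64, d_v=64)
```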

2. Motivation

Nowadays the Transformer is applied everywhere, and many variants of the originally proposed self-attention exist. The overall structure can stay the same (the multi-layer stacked encoder, the multi-head attention layer, the feed-forward layer, etc.), while the attention operator often changes, e.g. ProbAttention from Informer. By making Transformer layers more flexible, we can construct more complex models from the current modules in PyPOTS.

3. Your contribution

Will make a PR to achieve this goal.
