Make Transformer layers more flexible #333

Closed
WenjieDu opened this issue Apr 1, 2024 · 0 comments
Labels: enhancement, new feature
WenjieDu (Owner) commented Apr 1, 2024

1. Feature description

Make ScaledDotProductAttention in MultiHeadAttention replaceable by other attention operators, instead of being hard-coded as:

self.attention = ScaledDotProductAttention(attn_temperature, attn_dropout)
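
A minimal sketch of what this could look like: MultiHeadAttention takes the attention operator as a constructor argument rather than instantiating ScaledDotProductAttention internally. The signatures below (the `attn_opt` parameter, the `(q, k, v, attn_mask)` operator interface, and the dimension arguments) are illustrative assumptions, not PyPOTS's actual API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScaledDotProductAttention(nn.Module):
    """Vanilla attention operator: softmax(QK^T / temperature) V."""

    def __init__(self, temperature, attn_dropout=0.1):
        super().__init__()
        self.temperature = temperature
        self.dropout = nn.Dropout(attn_dropout)

    def forward(self, q, k, v, attn_mask=None):
        # q, k, v: [batch, n_heads, n_steps, d_k or d_v]
        attn = torch.matmul(q / self.temperature, k.transpose(-2, -1))
        if attn_mask is not None:
            attn = attn.masked_fill(attn_mask == 0, -1e9)
        attn = self.dropout(F.softmax(attn, dim=-1))
        return torch.matmul(attn, v), attn


class MultiHeadAttention(nn.Module):
    """Multi-head attention that receives the attention operator as an argument
    instead of hard-coding ScaledDotProductAttention."""

    def __init__(self, attn_opt: nn.Module, d_model, n_heads, d_k, d_v):
        super().__init__()
        self.n_heads, self.d_k, self.d_v = n_heads, d_k, d_v
        self.w_qs = nn.Linear(d_model, n_heads * d_k, bias=False)
        self.w_ks = nn.Linear(d_model, n_heads * d_k, bias=False)
        self.w_vs = nn.Linear(d_model, n_heads * d_v, bias=False)
        self.attention = attn_opt  # injected operator, e.g. ProbAttention from Informer
        self.fc = nn.Linear(n_heads * d_v, d_model, bias=False)

    def forward(self, q, k, v, attn_mask=None):
        batch_size, n_steps = q.size(0), q.size(1)
        # project and split into heads: [batch, n_heads, n_steps, d_k or d_v]
        q = self.w_qs(q).view(batch_size, -1, self.n_heads, self.d_k).transpose(1, 2)
        k = self.w_ks(k).view(batch_size, -1, self.n_heads, self.d_k).transpose(1, 2)
        v = self.w_vs(v).view(batch_size, -1, self.n_heads, self.d_v).transpose(1, 2)
        # delegate the actual attention computation to the injected operator
        output, attn_weights = self.attention(q, k, v, attn_mask)
        # merge heads and project back to d_model
        output = output.transpose(1, 2).contiguous().view(batch_size, n_steps, -1)
        return self.fc(output), attn_weights


# usage: any operator exposing the same (q, k, v, attn_mask) interface can be swapped in
mha = MultiHeadAttention(ScaledDotProductAttention(temperature=64**0.5, attn_dropout=0.1),
                         d_model=256, n_heads=4, d_k=64, d_v=64)
```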

2. Motivation

Nowadays the Transformer is applied everywhere, and many variants of the originally proposed self-attention exist. The overall structure can stay the same (the multi-layer stacked encoder, the multi-head attention layer, the feed-forward layer, etc.), while the attention operator often changes, e.g. ProbAttention from Informer. By making Transformer layers more flexible, we can construct more complex models from the current modules in PyPOTS.

3. Your contribution

Will make a PR to achieve this goal.
