allow for sigmoid attention
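The sigmoid-attention commit above can be illustrated with a minimal unbatched, single-head numpy sketch (function name and shapes are illustrative, not the library's API): each attention logit is passed through an elementwise sigmoid instead of a row softmax, with a `-log(n)` bias that keeps the total attention mass near 1 for sequence length `n`.

```python
import numpy as np

def sigmoid_attention(q, k, v):
    """Attention with an elementwise sigmoid in place of softmax.

    Minimal sketch: q, k, v are (n, d) arrays for a single head.
    The -log(n) logit bias keeps total attention mass near 1.
    """
    n, d = k.shape
    logits = q @ k.T / np.sqrt(d) - np.log(n)
    attn = 1.0 / (1.0 + np.exp(-logits))  # sigmoid, applied per entry
    return attn @ v
```

Unlike softmax, the sigmoid weights need no row normalization, so each query-key score is independent of the other keys.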
small tweak
make l2 distance attention work with flash attention
offer l2 distance attention for starters and cite
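The two L2-distance commits above can be sketched as follows. Expanding the squared distance, `-||q - k||^2 = 2 q·k - ||q||^2 - ||k||^2`, shows why this composes with flash attention: the `-||q||^2` term is constant per query row and so cancels under softmax, leaving an ordinary dot-product score plus a per-key logit bias. A hedged numpy sketch (names illustrative, not the library's API):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def l2_distance_attention(q, k, v):
    # logits are negative squared euclidean distances, expanded so the
    # dominant term is the usual q @ k.T dot product
    sq = (q ** 2).sum(-1, keepdims=True)  # ||q||^2, constant per query row
    sk = (k ** 2).sum(-1)                 # ||k||^2, a per-key logit bias
    logits = 2 * q @ k.T - sq - sk        # == -||q - k||^2
    return softmax(logits) @ v
```

Because softmax is invariant to the per-row `-||q||^2` term, a dot-product kernel can reproduce the same weights from `2 q @ k.T` with a key-wise `-||k||^2` bias on the logits.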
make sure xl and nar can handle mixture of softmax
add sigsoftmax for good measure
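Sigsoftmax (Kanai et al. 2018) reweights each softmax term by a sigmoid of the logit before normalizing. A minimal sketch (not the library's exact implementation):

```python
import numpy as np

def sigsoftmax(x):
    # sigsoftmax: exp(x) * sigmoid(x), renormalized; shifting exp by
    # max(x) is safe because the constant factor cancels in the ratio
    e = np.exp(x - x.max()) * (1.0 / (1.0 + np.exp(-x)))
    return e / e.sum()
```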
add option to use mixture of softmax in TransformerWrapper + some cleanup
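Mixture of softmaxes (MoS, as in Yang et al.'s "Breaking the Softmax Bottleneck") projects the final hidden state into several facets, softmaxes each over the vocabulary, and mixes them with gate weights, so the output distribution is no longer rank-limited by a single logit matrix. A minimal unbatched numpy sketch; the function and parameter names are illustrative assumptions, not the TransformerWrapper API:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mixture_of_softmax(hidden, proj_ws, emb, gate_w):
    # hidden: (d,) final hidden state; proj_ws: list of (d, d) facet
    # projections; emb: (vocab, d) output embedding; gate_w: (d, n_mix)
    pis = softmax(hidden @ gate_w)                             # (n_mix,) mixture weights
    facets = np.stack([np.tanh(hidden @ w) for w in proj_ws])  # (n_mix, d)
    probs = softmax(facets @ emb.T, axis=-1)                   # (n_mix, vocab)
    return pis @ probs                                         # (vocab,) valid distribution
```

Because each facet's softmax sums to 1 and the gate weights sum to 1, the mixture is itself a proper probability distribution.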