
Add Mamba Architecture #1157

Merged 40 commits from mamba-neox into main on Mar 10, 2024
Conversation

@haileyschoelkopf (Contributor) commented Feb 26, 2024

closes #1148

This PR adds the Mamba architecture to NeoX, along with flags for toggling the selective-scan, conv1d, and full mamba_inner_fn fused kernels.

For now it does not support model parallelism, but I want to investigate adding tensor parallelism to this.
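For illustration, enabling this in a NeoX .yml config could look roughly like the sketch below. The Mamba block is selected through attention_config (see the discussion further down); the kernel-toggle names here are placeholders, not necessarily the exact flags added by this PR.

  # select the Mamba block for all 12 layers via attention_config
  "attention_config": [[["mamba"], 12]],

  # placeholder names for the per-kernel toggles described above
  "mamba_selective_scan_fusion": true,
  "mamba_causal_conv_fusion": true,
  "mamba_inner_func_fusion": true,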

@haileyschoelkopf marked this pull request as draft on February 26, 2024
@haileyschoelkopf (Contributor, Author)

This seems to train well without parallelism, but I'm hitting bugs in a conversion script I wrote (gibberish output). To diagnose further, I'll check for differences in output between a single instantiated layer from this implementation and the equivalent mamba_ssm module.
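For concreteness, that comparison could look roughly like the sketch below. The mamba_ssm usage follows its public API; the NeoX layer construction and weight copying (commented out) are placeholders for whatever this PR exposes.

import torch
from mamba_ssm import Mamba  # reference implementation

torch.manual_seed(0)
d_model = 768
x = torch.randn(2, 128, d_model, device="cuda")

# reference block from mamba_ssm
ref = Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2).to("cuda")
out_ref = ref(x)

# neox_layer = ...                   # placeholder: instantiate the NeoX Mamba layer here
# neox_layer.load_state_dict(...)    # placeholder: copy ref's weights across
# out_neox = neox_layer(x)
# print(torch.allclose(out_neox, out_ref, atol=1e-5),
#       (out_neox - out_ref).abs().max())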

@haileyschoelkopf (Contributor, Author)

Worked after state-spaces/mamba#211!

Got performance from a 160M model (trained with the Pythia config, untied embed + unembed) on par with the Mamba-130m results in the paper! [image: evaluation results]

I'll clean up the code slightly and add sample configs, then mark this ready for review. This also pairs with a DeeperSpeed PR I'll make that should allow holding specified parameters in fp32 even when DeepSpeed tries to cast everything to 16-bit.

I also want to look into adding tensor parallelism for Mamba, but will do that later.
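As a conceptual sketch of the fp32-override idea (plain PyTorch, not the actual DeeperSpeed mechanism): the A_log and D names below come from the reference mamba_ssm module and are only an example of state-space parameters one might pin to fp32.

import torch

def cast_to_bf16_except(model, keep_fp32=("A_log", "D")):
    """Conceptual sketch: cast a model to bf16 but leave selected parameters in fp32.
    DeepSpeed does its casting inside the engine, so the real fix lives in DeeperSpeed;
    this only illustrates the end state we want for the SSM parameters."""
    model = model.to(torch.bfloat16)
    for name, param in model.named_parameters():
        if name.split(".")[-1] in keep_fp32:
            param.data = param.data.float()
    return model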

@Quentin-Anthony (Member)

Awesome! Great work. I have some TP ideas that we can discuss on Discord.

@haileyschoelkopf marked this pull request as ready for review on March 7, 2024
@haileyschoelkopf (Contributor, Author)

Ready for initial review!

This pairs with EleutherAI/DeeperSpeed#61; I'd appreciate feedback on whether the approach there is acceptable.

@Quentin-Anthony (Member)

Note for the future: in addition to attention_config, we should add a block_config where the user can choose, on a per-block basis, where to place individual blocks for MLP (and its variants like MoE), Mamba, and Attention (and its variants). The attention_config would then just let the user choose between attention variants for any attention blocks in the broader block_config. A similar mlp_config could be useful.

E.g.

"block_config": ["mamba", "attention", "mamba", "attention", "mlp"],
"attention_config": [[["flash"], 2]],
"mlp_config": [[["moe"], 1],

One drawback of this strategy is that it adds a lot of annoying bookkeeping for the user.

It's already a bit confusing that we're putting the attention-free Mamba block under attention_config, but I think that's fine for this PR.
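A hypothetical sketch of how a builder could consume such a block_config (none of these names exist in NeoX today; nn.Identity stands in for the real Mamba/attention/MLP blocks so the sketch runs as-is):

import torch.nn as nn

# Map each block_config entry to a constructor. The real classes would be NeoX's
# Mamba, attention, and MLP layers, with variants resolved by attention_config / mlp_config.
BLOCK_BUILDERS = {
    "mamba": lambda i: nn.Identity(),
    "attention": lambda i: nn.Identity(),
    "mlp": lambda i: nn.Identity(),
}

def build_layers(block_config):
    # e.g. block_config = ["mamba", "attention", "mamba", "attention", "mlp"]
    return nn.ModuleList(BLOCK_BUILDERS[kind](i) for i, kind in enumerate(block_config))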

@Quentin-Anthony (Member)

In the future, we should add support for the Triton RMSNorm kernel introduced by Mamba. Noting it here and adding a TODO for later.

https://github.com/state-spaces/mamba/blob/v1.2.0/mamba_ssm/ops/triton/layernorm.py
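For reference, RMSNorm itself is simple; the point of the Triton kernel is fusion and speed. A plain PyTorch version of what it computes:

import torch

class RMSNorm(torch.nn.Module):
    """Unfused reference: y = x / sqrt(mean(x^2) + eps) * weight (no mean-centering, no bias)."""
    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.weight = torch.nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight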

@Quentin-Anthony merged commit 6809bbc into main on Mar 10, 2024
2 checks passed
@Quentin-Anthony deleted the mamba-neox branch on March 10, 2024