gguf : add Mamba keys and tensors #763

Merged 1 commit on Mar 13, 2024
27 changes: 27 additions & 0 deletions docs/gguf.md
@@ -234,6 +234,7 @@ By convention, most counts/lengths/etc are `uint64` unless otherwise specified.
- `gpt2`
- `bloom`
- `falcon`
- `mamba`
- `rwkv`
- **`general.quantization_version: uint32`**: The version of the quantization format. Not required if the model is not quantized (i.e. no tensors are quantized). If any tensors are quantized, this _must_ be present. This is separate to the quantization scheme of the tensors itself; the quantization version may change without changing the scheme's name (e.g. the quantization scheme is Q5_K, and the quantization version is 4).
- **`general.alignment: uint32`**: the global alignment to use, as described above. This can vary to allow for different alignment schemes, but it must be a multiple of 8. Some writers may not write the alignment. If the alignment is **not** specified, assume it is `32`.
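
For illustration, a minimal sketch of aligning an offset to this value, assuming the usual round-up-to-a-multiple behavior and the default of `32` when the key is absent:

```python
def align_offset(offset: int, alignment: int = 32) -> int:
    # Round `offset` up to the next multiple of `alignment`;
    # 32 is assumed when `general.alignment` is not present.
    return offset + (alignment - (offset % alignment)) % alignment

assert align_offset(100) == 128
assert align_offset(128) == 128
```
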
@@ -319,6 +320,13 @@ Note that older models may not have these keys, and may instead use the following

It is recommended that models use the newer keys if possible, as they are more flexible and allow for more complex scaling schemes. Executors will need to support both indefinitely.

#### SSM

- `[llm].ssm.conv_kernel: uint32`: The size of the rolling/shift state.
- `[llm].ssm.inner_size: uint32`: The embedding size of the states.
- `[llm].ssm.state_size: uint32`: The size of the recurrent state.
- `[llm].ssm.time_step_rank: uint32`: The rank of time steps.

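As an illustration (not part of the spec itself), these keys map onto the usual Mamba hyperparameters roughly as follows; the `mamba` architecture prefix and the values are assumptions based on the reference implementation's common defaults (`d_conv = 4`, `d_state = 16`, `expand = 2`, `dt_rank = ceil(d_model / 16)`):

```python
import math

# Hypothetical example: SSM metadata for a model with d_model = 768,
# using the assumed defaults described above.
d_model = 768
ssm_metadata = {
    "mamba.ssm.conv_kernel":    4,                        # rolling/shift state size (d_conv)
    "mamba.ssm.inner_size":     2 * d_model,              # d_inner = expand * d_model
    "mamba.ssm.state_size":     16,                       # recurrent state size (d_state)
    "mamba.ssm.time_step_rank": math.ceil(d_model / 16),  # dt_rank
}
```
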
#### Models

The following sections describe the metadata for each model architecture. Each key specified _must_ be present.
@@ -438,6 +446,17 @@ The following sections describe the metadata for each model architecture. Each key specified _must_ be present.
model[src] = torch.cat((q,k,v)).reshape_as(model[src])
```

##### Mamba

- `mamba.context_length`
- `mamba.embedding_length`
- `mamba.block_count`
- `mamba.ssm.conv_kernel`
- `mamba.ssm.inner_size`
- `mamba.ssm.state_size`
- `mamba.ssm.time_step_rank`
- `mamba.attention.layer_norm_rms_epsilon`

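A minimal sketch of writing these keys with the `gguf` Python package; the generic `GGUFWriter.add_uint32`/`add_float32` helpers and the example values (roughly a 130M-parameter Mamba) are assumptions, and tensor data plus the final write/close calls are omitted:

```python
import gguf

writer = gguf.GGUFWriter("mamba-example.gguf", arch="mamba")  # hypothetical output path

writer.add_uint32("mamba.context_length", 1 << 20)  # illustrative; SSMs have no hard context limit
writer.add_uint32("mamba.embedding_length", 768)
writer.add_uint32("mamba.block_count", 24)
writer.add_uint32("mamba.ssm.conv_kernel", 4)
writer.add_uint32("mamba.ssm.inner_size", 1536)
writer.add_uint32("mamba.ssm.state_size", 16)
writer.add_uint32("mamba.ssm.time_step_rank", 48)
writer.add_float32("mamba.attention.layer_norm_rms_epsilon", 1e-5)
# ... add tensors, then write the header, KV data, and tensor data to the file.
```
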
##### RWKV

The vocabulary size is the same as the number of rows in the `head` matrix.
@@ -564,6 +583,14 @@ where N signifies the block number a layer belongs to, and where `BB` could be:
- `ffn_down_exp`: Feed-forward network "down" layer per expert in MoE models
- `ffn_up_exp`: Feed-forward network "up" layer per expert in MoE models

- `ssm_in`: State space model input projections layer
- `ssm_conv1d`: State space model rolling/shift layer
- `ssm_x`: State space model selective parametrization layer
- `ssm_a`: State space model state compression layer
- `ssm_d`: State space model skip connection layer
- `ssm_dt`: State space model time step layer
- `ssm_out`: State space model output projection layer

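As an illustrative (not normative) example, one Mamba block could expose tensor names like the following under the block-naming convention above; the `.weight`/`.bias` suffixes and which tensors carry a bias are assumptions based on typical Mamba checkpoints:

```python
# Hypothetical tensor names for block 0 of a Mamba model.
mamba_block_0_tensors = [
    "blk.0.attn_norm.weight",   # per-block norm preceding the SSM mixer
    "blk.0.ssm_in.weight",      # input projections
    "blk.0.ssm_conv1d.weight",  # rolling/shift (conv) layer
    "blk.0.ssm_conv1d.bias",
    "blk.0.ssm_x.weight",       # selective parametrization
    "blk.0.ssm_dt.weight",      # time step
    "blk.0.ssm_dt.bias",
    "blk.0.ssm_a",              # state compression (A)
    "blk.0.ssm_d",              # skip connection (D)
    "blk.0.ssm_out.weight",     # output projection
]
```
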
## Version History

This document is actively updated to describe the current state of the metadata, and these changes are not tracked outside of the commits.