From 9c2adc4962a3a5d259f10db2171e0df5c83e4b05 Mon Sep 17 00:00:00 2001
From: compilade <113953597+compilade@users.noreply.github.com>
Date: Wed, 13 Mar 2024 10:33:19 -0400
Subject: [PATCH] gguf : add Mamba keys and tensors (#763)

---
 docs/gguf.md | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/docs/gguf.md b/docs/gguf.md
index bb63f4f0e..ddd61a340 100644
--- a/docs/gguf.md
+++ b/docs/gguf.md
@@ -234,6 +234,7 @@ By convention, most counts/lengths/etc are `uint64` unless otherwise specified.
   - `gpt2`
   - `bloom`
   - `falcon`
+  - `mamba`
   - `rwkv`
 - **`general.quantization_version: uint32`**: The version of the quantization format. Not required if the model is not quantized (i.e. no tensors are quantized). If any tensors are quantized, this _must_ be present. This is separate to the quantization scheme of the tensors itself; the quantization version may change without changing the scheme's name (e.g. the quantization scheme is Q5_K, and the quantization version is 4).
 - **`general.alignment: uint32`**: the global alignment to use, as described above. This can vary to allow for different alignment schemes, but it must be a multiple of 8. Some writers may not write the alignment. If the alignment is **not** specified, assume it is `32`.
@@ -319,6 +320,13 @@ Note that older models may not have these keys, and may instead use the followin
 
 It is recommended that models use the newer keys if possible, as they are more flexible and allow for more complex scaling schemes. Executors will need to support both indefinitely.
 
+#### SSM
+
+- `[llm].ssm.conv_kernel: uint32`: The size of the rolling/shift state.
+- `[llm].ssm.inner_size: uint32`: The embedding size of the states.
+- `[llm].ssm.state_size: uint32`: The size of the recurrent state.
+- `[llm].ssm.time_step_rank: uint32`: The rank of time steps.
+
 #### Models
 
 The following sections describe the metadata for each model architecture. Each key specified _must_ be present.
@@ -438,6 +446,17 @@ The following sections describe the metadata for each model architecture. Each k
         model[src] = torch.cat((q,k,v)).reshape_as(model[src])
     ```
 
+##### Mamba
+
+- `mamba.context_length`
+- `mamba.embedding_length`
+- `mamba.block_count`
+- `mamba.ssm.conv_kernel`
+- `mamba.ssm.inner_size`
+- `mamba.ssm.state_size`
+- `mamba.ssm.time_step_rank`
+- `mamba.attention.layer_norm_rms_epsilon`
+
 ##### RWKV
 
 The vocabulary size is the same as the number of rows in the `head` matrix.
@@ -564,6 +583,14 @@ where N signifies the block number a layer belongs to, and where `BB` could be:
 - `ffn_down_exp`: Feed-forward network "down" layer per expert in MoE models
 - `ffn_up_exp`: Feed-forward network "up" layer per expert in MoE models
 
+- `ssm_in`: State space model input projections layer
+- `ssm_conv1d`: State space model rolling/shift layer
+- `ssm_x`: State space model selective parametrization layer
+- `ssm_a`: State space model state compression layer
+- `ssm_d`: State space model skip connection layer
+- `ssm_dt`: State space model time step layer
+- `ssm_out`: State space model output projection layer
+
 ## Version History
 
 This document is actively updated to describe the current state of the metadata, and these changes are not tracked outside of the commits.
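
For illustration only, and not part of the patch above: a minimal sketch of how an executor might gather the new `[llm].ssm.*` keys once the GGUF metadata has been parsed into a plain dictionary. The `SSMParams` container, the `read_ssm_params` helper, and the example values are hypothetical; only the key names come from the spec change.

```python
# Hypothetical helper illustrating the new SSM keys; not part of the GGUF spec.
from dataclasses import dataclass


@dataclass
class SSMParams:
    conv_kernel: int     # [llm].ssm.conv_kernel    - size of the rolling/shift state
    inner_size: int      # [llm].ssm.inner_size     - embedding size of the states
    state_size: int      # [llm].ssm.state_size     - size of the recurrent state
    time_step_rank: int  # [llm].ssm.time_step_rank - rank of the time steps


def read_ssm_params(arch: str, metadata: dict) -> SSMParams:
    """Collect the `[llm].ssm.*` keys for a given architecture name."""
    return SSMParams(
        conv_kernel=metadata[f"{arch}.ssm.conv_kernel"],
        inner_size=metadata[f"{arch}.ssm.inner_size"],
        state_size=metadata[f"{arch}.ssm.state_size"],
        time_step_rank=metadata[f"{arch}.ssm.time_step_rank"],
    )


# Example values roughly in the shape of a small Mamba model:
print(read_ssm_params("mamba", {
    "mamba.ssm.conv_kernel": 4,
    "mamba.ssm.inner_size": 1536,
    "mamba.ssm.state_size": 16,
    "mamba.ssm.time_step_rank": 48,
}))
```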