
Feature Request: Support Cerebras BTLM #427

Open
andersonbcdefg opened this issue Aug 2, 2023 · 5 comments

Comments

@andersonbcdefg

BTLM is Cerebras's 3B model that matches the performance of many 7B models. It would be amazing to be able to quantize this, because it would be fast and perform well running locally. It doesn't quite fit any of the existing architectures: it's based on Cerebras-GPT but also uses ALiBi. Blog post here: https://www.cerebras.net/machine-learning/btlm-3b-8k-7b-performance-in-a-3-billion-parameter-model/

HuggingFace model here: https://huggingface.co/cerebras/btlm-3b-8k-base

@bornjre

bornjre commented Aug 5, 2023

I am trying to give it a go. I have never ported any models before, so it's new to me, but so far it looks fun. I think I have model conversion working: HF repo (mostly based on convert-cerebras-to-ggml.py).
I have a couple of questions:

  • I can't find anything about the .SCB layers. What are those?
  • For ALiBi, is the mpt-7b example a good reference? (There is a slope sketch at the end of this comment.)

It would be nice if someone experienced could tell me, at a high level, what comes next.

transformer.h.0.attn.c_attn.weight (7680, 2560) float16
transformer.h.0.attn.c_attn.bias  (7680,) float32
transformer.h.0.attn.c_attn.SCB  (7680,) float32

MODEL

BTLMLMHeadModel(
  (transformer): BTLMModel(
    (wte): Embedding(50257, 2560)
    (drop): Dropout(p=0.0, inplace=False)
    (h): ModuleList(
      (0-31): 32 x BTLMBlock(
        (ln_1): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
        (attn): BTLMAttention(
          (c_attn): Linear8bitLt(in_features=2560, out_features=7680, bias=True)
          (c_proj): Linear8bitLt(in_features=2560, out_features=2560, bias=True)
          (attn_dropout): Dropout(p=0.0, inplace=False)
          (resid_dropout): Dropout(p=0.0, inplace=False)
        )
        (ln_2): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
        (mlp): BTLMMLP(
          (c_fc): Linear8bitLt(in_features=2560, out_features=6826, bias=True)
          (c_fc2): Linear8bitLt(in_features=2560, out_features=6826, bias=True)
          (c_proj): Linear8bitLt(in_features=6826, out_features=2560, bias=True)
          (act): SwiGLUActivation()
          (dropout): Dropout(p=0.0, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
    (relative_pe): AlibiPositionEmbeddingLayer()
  )
  (lm_head): Linear(in_features=2560, out_features=50257, bias=False)
)

Model-loading C++ WIP implementation file:
https://huggingface.co/bornjre/btlm-3b-ggml/blob/main/btlm_model_wip.cpp
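
For the ALiBi question, here is a minimal sketch (not taken from modeling_btlm.py; the head count is an assumption, so double-check config.json) of the standard per-head slope computation that the MPT/BLOOM-style implementations use:

```python
# Minimal sketch, not from modeling_btlm.py: the standard ALiBi slope
# computation used by MPT/BLOOM-style implementations. I am assuming
# BTLM's AlibiPositionEmbeddingLayer produces the same slopes.
import math

def alibi_slopes(n_heads: int) -> list:
    """Per-head slopes: a geometric sequence starting at 2^(-8/n) for a
    power-of-two head count, with extra heads filled in by interleaving."""
    def power_of_2_slopes(n):
        start = 2.0 ** (-(2.0 ** -(math.log2(n) - 3)))  # == 2^(-8/n)
        return [start ** (i + 1) for i in range(n)]

    if math.log2(n_heads).is_integer():
        return power_of_2_slopes(n_heads)
    closest = 2 ** math.floor(math.log2(n_heads))
    return (power_of_2_slopes(closest)
            + alibi_slopes(2 * closest)[0::2][: n_heads - closest])

# Assuming 32 heads for BTLM-3B (2560 hidden / 80 per head) -- check config.json.
slopes = alibi_slopes(32)
# The bias added to the attention score for query i / key j (j <= i) is -slope * (i - j).
print(slopes[:4])
```

If this matches what modeling_btlm.py computes, then the ggml mpt example's ALiBi path should indeed be a good reference.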

@bornjre

bornjre commented Aug 5, 2023

Sorry for the ping 😃 @iboB @ggerganov

@ggerganov
Owner

I'm not familiar with "SCB" tensors; you'll have to check how they are used in the Python code and understand their purpose.

@rskuzma

rskuzma commented Aug 14, 2023

@bornjre, I think the SCB tensors come from bitsandbytes (https://huggingface.co/blog/hf-bitsandbytes-integration, https://github.com/TimDettmers/bitsandbytes/blob/main/bitsandbytes/nn/modules.py), perhaps as a result of using load_in_8bit=True when loading the model in HF transformers? I don't think they are part of the original model.
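
If that's the case, a quick sanity check (just a sketch; I haven't run it against this exact checkpoint) is to load the model in plain fp16, without load_in_8bit=True, and confirm that no .SCB entries show up in the state dict:

```python
# Sketch: load BTLM in plain fp16 so bitsandbytes never wraps the linears
# in Linear8bitLt and no SCB quantization state is created.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "cerebras/btlm-3b-8k-base",
    torch_dtype=torch.float16,   # note: no load_in_8bit=True
    trust_remote_code=True,      # BTLM ships its own modeling_btlm.py
)

scb_keys = [k for k in model.state_dict() if k.endswith(".SCB")]
print(scb_keys)  # expected: []
```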

@xloem

xloem commented Sep 26, 2023

The python implementation of this model can be found at https://huggingface.co/cerebras/btlm-3b-8k-base/blob/main/modeling_btlm.py .

The SCB tensors are a result of Hugging Face-side quantization; they would be handled the same way as for any bitsandbytes-quantized model and can simply be ignored during conversion (see the sketch at the end of this comment).

You can see the SCB tensors are not present in the model here:

$ curl -sL https://huggingface.co/cerebras/btlm-3b-8k-base/resolve/main/pytorch_model.bin | strings | grep 'transformer.h.0.attn'
transformer.h.0.attn.c_attn.weightq
transformer.h.0.attn.c_attn.biasq&h
transformer.h.0.attn.c_proj.weightq.h
transformer.h.0.attn.c_proj.biasq6h
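
If someone does convert from a state dict that was saved after loading with load_in_8bit=True, filtering those tensors out should be enough. A rough sketch (the loop body is only illustrative of whatever the WIP convert script already does):

```python
# Rough sketch: skip bitsandbytes quantization state while iterating a
# checkpoint that was saved from an 8-bit-loaded model.
import torch

state_dict = torch.load("pytorch_model.bin", map_location="cpu")

for name, tensor in state_dict.items():
    if name.endswith(".SCB"):
        continue  # per-channel scale state from bitsandbytes, not a model weight
    # ... write `name` / `tensor` to the ggml file as the convert script normally would ...
    print(name, tuple(tensor.shape), tensor.dtype)
```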
