WORK IN PROGRESS (do not use)

Latent Large Language Models

A work in progress.

(1) Node Setup

For a development setup with fast iterative deployment on a LAN, follow the instructions in the playbooks/ directory.

For Internet-scale training, we will need to build a Docker container...

(2) Dataset Setup

Follow the instructions in the dataset/ directory.

(3) Training

conda create -n lllm python=3.10 -y
conda activate lllm
pip install "numpy<2.0"
pip install packaging torch==2.3.1 torchvision torchaudio
pip install mamba_ssm
pip install causal-conv1d
pip install flash-attn
pip install -r requirements.txt
# pip install cupy  # only needed for 1-bit LAMB

Follow the instructions in the train/ directory.

TODO

Training TODO:

Model TODO:

  • Add ALiBi/RoPE positional encoding (a minimal RoPE sketch follows this list)
  • SparseK KV cache compression for SWA (https://arxiv.org/abs/2406.16747): modify FA2 to provide vertical (per-token) scores, take the Top-M with M = K*2, then apply a learned projection to select the Top-K tokens to keep from each window
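
For the RoPE item above, here is a minimal PyTorch sketch of rotary position embeddings applied to a tensor of shape (batch, seq, heads, head_dim). The function name, tensor layout, and base=10000 default are illustrative assumptions, not code from this repository.

import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotary position embedding for x of shape (batch, seq, heads, head_dim).

    head_dim must be even. Generic sketch, not this repository's implementation.
    """
    _, seq_len, _, head_dim = x.shape
    half = head_dim // 2
    # One rotation frequency per (x1, x2) channel pair.
    inv_freq = 1.0 / (base ** (torch.arange(half, device=x.device, dtype=torch.float32) / half))
    pos = torch.arange(seq_len, device=x.device, dtype=torch.float32)
    angles = torch.outer(pos, inv_freq)            # (seq, half)
    cos = angles.cos()[None, :, None, :]           # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each channel pair by its position-dependent angle.
    return torch.cat((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1).to(x.dtype)

This would be applied to the query and key tensors before attention; ALiBi would instead add a per-head linear bias to the attention scores.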

Dataloader TODO:

Dataloader future improvements:

  • RHO-loss for the dataset, using LLaMA-3 8B to provide a reference loss for each token; the reference losses need to be mapped onto our tokenizer via approximation (see the sketch below)
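
A minimal sketch of computing per-token reference losses with a Hugging Face causal LM, which is what the RHO-loss item needs. The model ID follows the LLaMA-3 8B named above, and the per-token cross-entropy reduction is an assumption; mapping these losses onto a different tokenizer is the approximation step noted in the bullet.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Meta-Llama-3-8B"  # reference model named in the bullet above

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto")
model.eval()

@torch.no_grad()
def reference_token_losses(text: str) -> torch.Tensor:
    """Return the reference cross-entropy loss for each predicted token."""
    ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    logits = model(ids).logits                                 # (1, seq, vocab)
    # Shift so position t predicts token t+1; keep one loss value per token.
    return torch.nn.functional.cross_entropy(
        logits[0, :-1].float(), ids[0, 1:], reduction="none"   # (seq - 1,)
    )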

Training future experiments:

FFN experiments:

  • Share FFN weights onion-style (https://arxiv.org/abs/2104.06022)
  • Share most FFN weights between consecutive layers, replacing only a few of them in each layer (a sketch follows this list)
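
A hypothetical sketch of the second FFN bullet: most of the hidden width is backed by weights shared across layers, and only a small per-layer slice is unique. Module and parameter names are made up for illustration.

import torch
import torch.nn as nn

class PartiallySharedFFN(nn.Module):
    """FFN where most hidden units use weights shared across layers and only a
    small slice is private to each layer. Illustrative sketch only."""

    def __init__(self, d_model: int, d_private: int,
                 shared_up: nn.Linear, shared_down: nn.Linear):
        super().__init__()
        self.shared_up = shared_up                          # shared across layers
        self.shared_down = shared_down                      # shared across layers
        self.private_up = nn.Linear(d_model, d_private)     # unique to this layer
        self.private_down = nn.Linear(d_private, d_model)   # unique to this layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h_shared = torch.relu(self.shared_up(x))
        h_private = torch.relu(self.private_up(x))
        return self.shared_down(h_shared) + self.private_down(h_private)

# Consecutive layers receive the same shared modules but their own private slice.
d_model, d_hidden, d_private = 512, 2048, 256
shared_up = nn.Linear(d_model, d_hidden - d_private)
shared_down = nn.Linear(d_hidden - d_private, d_model)
layers = nn.ModuleList(
    PartiallySharedFFN(d_model, d_private, shared_up, shared_down) for _ in range(4)
)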

Future model experiments:

Fine-tuning ideas:

  • Take the LLaMA-3 70B Instruct-tuned output for each data chunk and train the model to generate the same continuations (a way to skip fine-tuning?); a sketch follows
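
A minimal sketch of harvesting teacher continuations for that idea. The teacher model ID follows the bullet above, the generation settings are illustrative, and the student training loop is omitted.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

TEACHER_ID = "meta-llama/Meta-Llama-3-70B-Instruct"  # teacher named in the bullet above

tok = AutoTokenizer.from_pretrained(TEACHER_ID)
teacher = AutoModelForCausalLM.from_pretrained(TEACHER_ID, torch_dtype=torch.bfloat16, device_map="auto")
teacher.eval()

@torch.no_grad()
def teacher_continuation(chunk: str, max_new_tokens: int = 256) -> str:
    """Generate the teacher's continuation of a data chunk; the student is then
    trained with ordinary next-token loss on chunk + continuation."""
    ids = tok(chunk, return_tensors="pt").input_ids.to(teacher.device)
    out = teacher.generate(ids, max_new_tokens=max_new_tokens, do_sample=False)
    return tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True)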

Onion training:

  1. Start with a very small model: nn.Embed -> SambaBlock1 -> Quantized1 (8-bit) -> SambaBlock1 -> heads. nn.Embed is taken from a pre-trained large model and is frozen. The two SambaBlock1 instances share parameters. One FFN head reproduces the input token ids with a reconstruction loss, a second FFN head predicts the next token with cross-entropy loss, and a third head predicts the token after that.
  2. Train the model with loss = reconstruction + next_token + second_next_token until it converges.
  3. Freeze SambaBlock1 and insert a new SambaBlock2: nn.Embed -> SambaBlock1 -> Quantized1 -> SambaBlock2 -> Quantized2 -> SambaBlock2 -> Quantized1 -> SambaBlock1 -> heads.
  4. Continue training until convergence.
  5. Repeat with a third block, and so on. The Quantized layer acts as an auto-encoder-style bottleneck that is split in half when more blocks are inserted between its two sides. A structural sketch follows this list.
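
A structural sketch of the nested layout, assuming placeholder SambaBlock and Quantized modules passed in by the caller; this illustrates the wiring only, not the repository's implementation.

import torch
import torch.nn as nn

class OnionModel(nn.Module):
    """Nested 'onion' layout: each block is reused on the way down and the way up,
    with a quantized bottleneck between levels. Illustrative sketch only."""

    def __init__(self, embed: nn.Embedding, blocks: list, bottlenecks: list,
                 d_model: int, vocab_size: int):
        super().__init__()
        assert len(blocks) == len(bottlenecks)
        self.embed = embed
        self.embed.weight.requires_grad_(False)        # frozen pre-trained embedding
        self.blocks = nn.ModuleList(blocks)            # blocks[i] shared between descent and ascent
        self.bottlenecks = nn.ModuleList(bottlenecks)
        self.recon_head = nn.Linear(d_model, vocab_size)   # reconstruct input token ids
        self.next_head = nn.Linear(d_model, vocab_size)    # predict token t+1
        self.next2_head = nn.Linear(d_model, vocab_size)   # predict token t+2

    def forward(self, ids: torch.Tensor):
        x = self.embed(ids)
        n = len(self.blocks)
        # Descend: Block1 -> Q1 -> Block2 -> Q2 -> ...
        for i in range(n):
            x = self.bottlenecks[i](self.blocks[i](x))
        # Ascend with the same shared blocks: ... -> Block2 -> Q1 -> Block1
        for i in reversed(range(n)):
            x = self.blocks[i](x)
            if i > 0:
                x = self.bottlenecks[i - 1](x)
        return self.recon_head(x), self.next_head(x), self.next2_head(x)

# n=1 gives Embed -> B1 -> Q1 -> B1 -> heads; n=2 gives
# Embed -> B1 -> Q1 -> B2 -> Q2 -> B2 -> Q1 -> B1 -> heads, matching steps 1 and 3.
# Training loss per step 2: loss = reconstruction + next_token + second_next_token.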
