
WORK IN PROGRESS (do not use)

Latent Large Language Models

A work in progress.

(1) Node Setup

For a development setup with fast iterative deployment on a LAN, follow the instructions in the playbooks/ directory.

For Internet-scale training, we will need to build a Docker container...

(2) Dataset Setup

Follow the instructions in the dataset/ directory.

(3) Training

Follow the instructions in the train/ directory.

Ideas

Dataloader TODO:

  • Add support for returning the list of concatenated samples in flash_attn format
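
A minimal sketch of what that packed format could look like, assuming samples arrive as lists of token IDs; the function name and return keys are illustrative, not the repo's actual dataloader API. The cu_seqlens layout is what flash-attn's varlen attention kernels consume to keep attention from crossing sample boundaries.

```python
import torch

def collate_concat(samples: list[list[int]]) -> dict:
    """Pack variable-length samples into one concatenated sequence."""
    tokens = torch.tensor([t for s in samples for t in s], dtype=torch.long)
    lengths = torch.tensor([len(s) for s in samples], dtype=torch.int32)
    cu_seqlens = torch.zeros(len(samples) + 1, dtype=torch.int32)
    cu_seqlens[1:] = torch.cumsum(lengths, dim=0)
    return {
        "input_ids": tokens,               # shape: (total_tokens,)
        "cu_seqlens": cu_seqlens,          # shape: (num_samples + 1,)
        "max_seqlen": int(lengths.max()),  # longest sample in the pack
    }
```

The batch would then be handed to something like flash_attn_varlen_func together with the q/k/v projections, though which flash-attn entry point the trainer uses is still an open choice here.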

Dataloader future improvements:

  • RHO-loss data selection using LLaMA-3 8B to provide a reference loss for each token; since the tokenizers differ, the reference losses need to be mapped to our tokenizer via approximation
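
A rough sketch of the reference-loss pass, assuming Hugging Face transformers for the frozen scorer; the model name and keep ratio are placeholders, and the tokenizer-mismatch approximation mentioned above is left out.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder reference model; the idea above uses LLaMA-3 8B.
REF_MODEL = "meta-llama/Meta-Llama-3-8B"

tokenizer = AutoTokenizer.from_pretrained(REF_MODEL)
ref_model = AutoModelForCausalLM.from_pretrained(REF_MODEL, torch_dtype=torch.bfloat16)
ref_model.eval()

@torch.no_grad()
def reference_token_losses(text: str) -> torch.Tensor:
    """Per-token cross-entropy of the frozen reference model.

    Note: these losses are in the reference model's tokenization; mapping
    them onto our tokenizer is the approximation step and is not shown here.
    """
    ids = tokenizer(text, return_tensors="pt").input_ids
    logits = ref_model(ids).logits
    # Shift so position t predicts token t+1.
    return torch.nn.functional.cross_entropy(
        logits[0, :-1], ids[0, 1:], reduction="none"
    )

def rho_mask(train_losses: torch.Tensor, ref_losses: torch.Tensor, keep: float = 0.5):
    """Keep the tokens whose training loss most exceeds the reference loss
    (the "reducible" part of RHO-loss); a top-k mask is one simple option."""
    excess = train_losses - ref_losses
    k = max(1, int(keep * excess.numel()))
    mask = torch.zeros_like(excess, dtype=torch.bool)
    mask[torch.topk(excess, k).indices] = True
    return mask
```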

Training future experiments:

  • Meta-learning to estimate the weight updates that larger batch sizes and more iterations would produce, starting from smaller batch sizes and single steps (toy sketch after this list)
  • Try optimi https://optimi.benjaminwarner.dev/
  • AdaLOMO
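
For the meta-learning item above, a toy illustration of the idea: measure the parameter update from one small-batch step, measure the update from a larger-batch multi-step run, and fit the simplest possible predictor (a single least-squares scale) from one to the other. The model, optimizer, and data below are placeholders; a real experiment would replace the scale factor with a learned meta-model.

```python
import copy
import torch
import torch.nn as nn

# Toy setup: a small regression model and a stream of batches.
torch.manual_seed(0)
model = nn.Linear(16, 1)
data = [(torch.randn(64, 16), torch.randn(64, 1)) for _ in range(8)]
loss_fn = nn.MSELoss()

def update_after(model, batches, steps, lr=1e-2):
    """Flattened parameter change after `steps` SGD steps over `batches`."""
    m = copy.deepcopy(model)
    opt = torch.optim.SGD(m.parameters(), lr=lr)
    for i in range(steps):
        x, y = batches[i % len(batches)]
        opt.zero_grad()
        loss_fn(m(x), y).backward()
        opt.step()
    before = torch.cat([p.detach().flatten() for p in model.parameters()])
    after = torch.cat([p.detach().flatten() for p in m.parameters()])
    return after - before

# "Cheap" signal: one step on one small batch (first 8 rows of each batch).
small = [(x[:8], y[:8]) for x, y in data]
cheap = update_after(model, small, steps=1)

# "Expensive" target: several steps on the full batches.
target = update_after(model, data, steps=4)

# Simplest possible meta-model: a single least-squares scale factor
# mapping the cheap update to the expensive one.
scale = (cheap @ target) / (cheap @ cheap)
predicted = scale * cheap
print("cosine similarity:",
      torch.nn.functional.cosine_similarity(predicted, target, dim=0).item())
```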

FFN experiments:

  • Sharing FFN weights onion-style https://arxiv.org/abs/2104.06022
  • Share the majority of FFN weights between consecutive layers, replacing only a small slice of them in each layer
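
A possible PyTorch shape for the second bullet, where each layer's FFN up-projection reuses one large shared block of rows and contributes only a small per-layer slice; the dimensions and the size of the unique slice are illustrative, not settled choices.

```python
import torch
import torch.nn as nn

class PartiallySharedFFN(nn.Module):
    """FFN where most up-projection rows are shared across layers and
    only a small slice of rows is unique to each layer."""

    def __init__(self, d_model=1024, d_ffn=4096, unique_rows=256, num_layers=12):
        super().__init__()
        shared_rows = d_ffn - unique_rows
        # One shared block reused by every layer.
        self.up_shared = nn.Parameter(torch.randn(shared_rows, d_model) * 0.02)
        # A small unique block per layer.
        self.up_unique = nn.ParameterList(
            [nn.Parameter(torch.randn(unique_rows, d_model) * 0.02)
             for _ in range(num_layers)]
        )
        self.down = nn.ModuleList(
            [nn.Linear(d_ffn, d_model) for _ in range(num_layers)]
        )
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor, layer: int) -> torch.Tensor:
        # Assemble this layer's up-projection from the shared + unique rows.
        up = torch.cat([self.up_shared, self.up_unique[layer]], dim=0)
        return self.down[layer](self.act(x @ up.t()))
```

Only the up-projection is shared in this sketch; the down-projection stays per-layer, and the same trick could be applied to it as part of the experiment.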

Future model experiments:

Fine-tuning ideas:

  • Take LLaMA-3 70B Instruct continuations for each data chunk and train the model to generate the same continuations (possibly a way to skip a separate fine-tuning stage?)
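
A sketch of the data-generation half of that idea using Hugging Face transformers; the teacher name, prompt length, and greedy decoding are placeholders, and the resulting prompt-plus-continuation text would be trained on as ordinary next-token-prediction data.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder teacher; the idea above uses LLaMA-3 70B Instruct.
TEACHER = "meta-llama/Meta-Llama-3-70B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(TEACHER)
teacher = AutoModelForCausalLM.from_pretrained(
    TEACHER, torch_dtype=torch.bfloat16, device_map="auto"
)

@torch.no_grad()
def teacher_continuation(chunk: str, prompt_tokens: int = 256, new_tokens: int = 256) -> str:
    """Truncate a data chunk to a prompt and let the teacher continue it."""
    ids = tokenizer(chunk, return_tensors="pt").input_ids[:, :prompt_tokens]
    out = teacher.generate(
        ids.to(teacher.device), max_new_tokens=new_tokens, do_sample=False
    )
    # Return only the newly generated continuation.
    return tokenizer.decode(out[0, ids.shape[1]:], skip_special_tokens=True)

# Training target: the original prompt followed by the teacher's continuation,
# treated as ordinary next-token-prediction data for the latent model.
```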
