[ICML'2024] "LoCoCo: Dropping In Convolutions for Long Context Compression", Ruisi Cai, Yuandong Tian, Zhangyang Wang, Beidi Chen

LoCoCo: Dropping In Convolutions for Long Context Compression

Ruisi Cai¹, Yuandong Tian², Zhangyang Wang¹, Beidi Chen³

¹University of Texas at Austin, ²Meta AI (FAIR), ³Carnegie Mellon University

Usage

python train.py \
    --dataset_name togethercomputer/RedPajama-Data-1T-Sample \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --block_size 512 \
    --clean_period 8 \
    --method conv \
    --kernel_size 21 \
    --n_convlayer 1 \
    --mem_size 512 \
    --max_train_steps 1000 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 128 \
    --eval_iter 20 \
    --eval_interval 50 \
    --stream_tokenizer \
    --normalizer_init 0.5 \
    --memory_lr_scale 1000 \
    --norm_lr_scale 5 \
    --rope_change \
    --checkpointing_steps 100 \
    --output_dir output/no_extend/rp_{block_size}_{clean_period}_mem{mem_size}/{method}/ \
    --auto_resume 
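The flags above configure the paper's drop-in compression: `--mem_size` fixes the compressed KV-cache length, while `--kernel_size` and `--n_convlayer` set up the convolution that fuses incoming tokens into that fixed-size memory. As a rough illustration only, here is a minimal NumPy sketch of the general idea of convolution-based cache compression; the function name, the uniform (untrained) kernel, and the mean-pooling step are all assumptions for this sketch, not the paper's actual learned procedure.

```python
import numpy as np

def compress_kv(cache, mem_size, kernel_size=21):
    """Hypothetical sketch: fuse a length-T cache into mem_size slots.

    The mixing weights here are a fixed uniform kernel; in LoCoCo they
    come from a small learned convolution (cf. --kernel_size and
    --n_convlayer in the training command) trained end to end.
    """
    T, d = cache.shape
    if T <= mem_size:
        return cache  # nothing to compress yet
    # Smooth each feature channel with a 1-D convolution along time.
    pad = kernel_size // 2
    padded = np.pad(cache, ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(kernel_size) / kernel_size
    smoothed = np.stack(
        [np.convolve(padded[:, j], kernel, mode="valid") for j in range(d)],
        axis=1,
    )  # shape (T, d)
    # Pool the smoothed sequence down to exactly mem_size slots.
    idx = np.linspace(0, T, mem_size + 1).astype(int)
    return np.stack(
        [smoothed[a:b].mean(axis=0) for a, b in zip(idx[:-1], idx[1:])]
    )
```

With `--mem_size 512`, for example, a 4096-token cache would be reduced to 512 fused entries, keeping attention cost bounded regardless of context length.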

Model checkpoints are coming soon!

Citation

If you find this useful, please cite the following paper:

@article{cai2024lococo,
  title={LoCoCo: Dropping In Convolutions for Long Context Compression},
  author={Cai, Ruisi and Tian, Yuandong and Wang, Zhangyang and Chen, Beidi},
  journal={arXiv preprint arXiv:2406.05317},
  year={2024}
}
