
long-llms-learning


A repository sharing a panorama of the methodology literature on Transformer architecture upgrades that enable Large Language Models to handle extensive context windows, updated in real time with the newest published works.

Overview

Survey

For a clear taxonomy and more insights into the methodology, you can refer to our survey: Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey, with an overview shown below.

Overview of the survey

Flash-ReRoPE

We have augmented Su's great work rerope with a flash-attn kernel, combining rerope's infinite positional extrapolation capability with flash-attn's efficiency; the result is named flash-rerope.

You can find and use the implementation as a flash-attn-like interface function here, along with a simple precision and FLOPs test script here.

You can also see how to implement the llama attention module with flash-rerope here.
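
For reference, below is a minimal, non-flash PyTorch sketch of the ReRoPE mechanism that the fused kernel targets: query/key pairs within a window w use standard RoPE relative positions, while pairs farther apart see their relative position clamped to w, which is what gives the unbounded extrapolation. The function and argument names are illustrative assumptions rather than the repo's actual interface, and unlike the flash-attn-based implementation this sketch materializes the full n x n score matrices explicitly.

import torch

def rope(x, positions, base=10000.0):
    # Apply rotary position embeddings at the given integer positions.
    # x: (batch, heads, seqlen, head_dim); positions: (seqlen,)
    dim = x.shape[-1]
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32, device=x.device) / dim))
    freqs = positions.float()[:, None] * inv_freq[None, :]    # (seqlen, dim/2)
    cos, sin = freqs.cos(), freqs.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def rerope_attention(q, k, v, window=512):
    # Causal ReRoPE attention: relative positions beyond `window` are clamped to `window`.
    # This requires two score matrices -- one with ordinary RoPE, and one where every
    # query/key pair is rotated as if it were exactly `window` apart -- selected per pair.
    b, h, n, d = q.shape
    pos = torch.arange(n, device=q.device)
    s_inner = torch.einsum("bhid,bhjd->bhij", rope(q, pos), rope(k, pos))
    s_outer = torch.einsum("bhid,bhjd->bhij",
                           rope(q, torch.full_like(pos, window)),
                           rope(k, torch.zeros_like(pos)))
    rel = pos[:, None] - pos[None, :]                          # pairwise relative distances
    scores = torch.where(rel < window, s_inner, s_outer) / d ** 0.5
    scores = scores.masked_fill(rel < 0, float("-inf"))        # causal mask
    return scores.softmax(dim=-1) @ v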

Latest News

Latest Works

Latest Baselines

Latest Benchmarks

More to Learn

Long-LLMs-Evals

  • We've also released an in-progress repo, long-llms-evals, as a pipeline to evaluate the various methods designed for general / specific LLMs to enhance their long-context capabilities on well-known long-context benchmarks.

LLMs-Learning

  • This repo is also a sub-track of another repo, llms-learning, where you can learn more about the technologies and application tasks across the full stack of Large Language Models.

Table of Contents

Contribution

If you want to contribute to this repo, you can simply open a PR or email us with the link(s) to the paper(s), or use the format below:

  • (un)read paper format:
#### <paper title> [(UN)READ]

paper link: [here](<link address>)

xxx link: [here](<link address>)

citation:
<bibtex citation>

Citation

If you find the survey or this repo helpful for your research or work, you can cite our paper as follows:

@misc{huang2024advancing,
      title={Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey}, 
      author={Yunpeng Huang and Jingwei Xu and Junyu Lai and Zixu Jiang and Taolue Chen and Zenan Li and Yuan Yao and Xiaoxing Ma and Lijuan Yang and Hao Chen and Shupeng Li and Penghao Zhao},
      year={2024},
      eprint={2311.12351},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
