Stars
Official GitHub repo for the paper "Compression Represents Intelligence Linearly"
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism
Official repo for "LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs".
Fast and memory-efficient exact attention
To speed up long-context LLM inference, approximate and dynamic sparse attention computation reduces pre-filling latency by up to 10x on an A100 while maintaining accuracy (a toy top-k sketch of the idea follows this list).
Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs
Awesome LLM Plaza: daily tracking of all sorts of awesome LLM topics, e.g., LLMs for coding, robotics, reasoning, multimodality, etc.
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and…
Efficient retrieval-head analysis with Triton flash attention that supports top-K probability
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
[ACL 2024] A Prospector of Long-Dependency Data for Large Language Models
The official implementation of "CAPE: Context-Adaptive Positional Encoding for Length Extrapolation"
This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach (sketched after this list).
Sequence Parallel Attention for Long-Context LLM Training and Inference
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
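
As referenced in the sparse-attention item above, the core idea is a per-query top-k selection over attention scores. Below is a minimal PyTorch sketch of that general technique, not the starred repo's actual algorithm or kernels; note a real implementation avoids materializing the full score matrix (e.g., via block-sparse kernels), which this toy version does not.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q: torch.Tensor, k: torch.Tensor,
                          v: torch.Tensor, topk: int) -> torch.Tensor:
    """Each query attends only to its top-k highest-scoring keys.

    q, k, v: (seq_len, head_dim) tensors for a single attention head.
    Illustrative only: it still computes the full (seq_len, seq_len)
    score matrix, which a real sparse kernel would skip.
    """
    scale = q.size(-1) ** -0.5
    scores = (q @ k.T) * scale                           # (seq_len, seq_len)
    topk = min(topk, scores.size(-1))
    kth_best = scores.topk(topk, dim=-1).values[:, -1:]  # k-th largest per row
    scores = scores.masked_fill(scores < kth_best, float("-inf"))
    weights = F.softmax(scores, dim=-1)                  # row-wise sparse softmax
    return weights @ v                                   # (seq_len, head_dim)
```

Ties at the k-th score may keep a few extra keys per row; that is harmless for illustration, since every row retains at least `topk` finite entries before the softmax.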
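The needle-in-a-haystack protocol behind BABILong and RULER, mentioned above, hides a fact at a controlled depth inside long filler text and checks whether the model can retrieve it. A minimal sketch, assuming a hypothetical `query_model(prompt) -> str` callable standing in for any LLM API:

```python
import random

def build_haystack_prompt(needle: str, context_chars: int, depth: float) -> str:
    """Embed `needle` at relative `depth` (0.0 = start, 1.0 = end) in filler.

    `context_chars` is a rough character budget standing in for a token budget.
    """
    filler = "The grass is green and the sky is blue. "
    sentences = [filler] * max(1, context_chars // len(filler))
    sentences.insert(int(depth * len(sentences)), needle + " ")
    context = "".join(sentences)
    return f"{context}\nQuestion: What is the secret number?\nAnswer:"

def run_niah_case(query_model, context_chars: int, depth: float) -> bool:
    """Return True if the model retrieves the hidden fact at this length/depth."""
    secret = str(random.randint(10_000, 99_999))
    prompt = build_haystack_prompt(f"The secret number is {secret}.",
                                   context_chars, depth)
    return secret in query_model(prompt)  # query_model is a hypothetical stub
```

Sweeping `context_chars` and `depth` over a grid and plotting the pass rate yields the familiar length-by-depth retrieval heatmap these benchmarks report.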