Stars
Ring attention implementation with flash attention
FlashInfer: Kernel Library for LLM Serving
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
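As a rough illustration of what such a library computes: linear attention replaces the softmax kernel with a feature map φ (here φ(x) = elu(x) + 1, a common choice, not necessarily the one used in that repo), so attention can be evaluated in O(n·d²) instead of O(n²·d). A minimal NumPy sketch, with hypothetical function names:

```python
import numpy as np

def linear_attention(Q, K, V):
    """Non-causal linear attention: out = phi(Q) (phi(K)^T V) / (phi(Q) phi(K)^T 1).

    Q: (n, d), K: (m, d), V: (m, d_v). All names here are illustrative.
    """
    # Feature map phi(x) = elu(x) + 1 keeps attention weights positive.
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                  # (d, d_v): key-value summary, computed once
    Z = Qp @ Kp.sum(axis=0)        # (n,): per-query normalizer
    return (Qp @ KV) / Z[:, None]  # (n, d_v)
```

Because the weights are normalized per query, feeding a constant V returns that constant; the quadratic attention matrix is never materialized, which is what makes linear attention attractive for long contexts.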
Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of GPT-Fast, a simple, PyTorch-native generation codebase.
neuralmagic / nm-vllm
Forked from vllm-project/vllm. A high-throughput and memory-efficient inference and serving engine for LLMs
Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
Finetune Llama 3.2, Mistral, Phi, Qwen & Gemma LLMs 2-5x faster with 80% less memory
Supercharge Your LLM Application Evaluations 🚀
A high-throughput and memory-efficient inference and serving engine for LLMs
[ACL'24 Outstanding] Data and code for L-Eval, a comprehensive long context language models evaluation benchmark
[ECIR'24] Implementation of "Large Language Models are Zero-Shot Rankers for Recommender Systems"
[ACL 2024] LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
A PHP-based search engine that filters results from various sources like Google, Yahoo, and Bing, based on the relevance of web pages to the searched keywords. To evaluate the results, various techniques…
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
🌟 Chrome extension that enables users to chat with ChatGPT by opening a sidebar on any website
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
The ChatGPT Retrieval Plugin lets you easily find personal or work documents by asking questions in natural language.
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Training and serving large-scale neural networks with auto parallelization.
Source Code of Paper "GPTScore: Evaluate as You Desire"
Repository for EMNLP 2022 Paper: Towards a Unified Multi-Dimensional Evaluator for Text Generation