Stars
An (unofficial) implementation of Focal Loss, as described in the RetinaNet paper, generalized to the multi-class case.
Open source annotation tool for machine learning practitioners.
Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
AI Native Data App Development framework with AWEL (Agentic Workflow Expression Language) and Agents
Generative Agents: Interactive Simulacra of Human Behavior
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
NTK scaled version of ALiBi position encoding in Transformer.
A high-throughput and memory-efficient inference and serving engine for LLMs
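《代码随想录》 LeetCode problem-solving guide: a recommended order for working through 200 classic problems, 600,000 characters of detailed illustrated explanations, video walkthroughs of the difficult points, and more than 50 mind maps, with versions in C++, Java, Python, Go, JavaScript, and other languages, so that learning algorithms is no longer confusing! 🔥🔥 Take a look, and you will wish you had found it sooner! 🚀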
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
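text2vec, text to vector. A text vector representation tool that converts text into vector matrices; it implements text representation and text similarity models including Word2Vec, RankBM25, Sentence-BERT, and CoSENT, and works out of the box.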
Ongoing research training transformer models at scale
📖 A curated list of Awesome LLM Inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
A natural language interface for computers
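A pure C++ cross-platform LLM acceleration library, callable from Python; ChatGLM-6B-class models can reach 10,000+ tokens/s on a single GPU; supports glm, llama, and moss base models and runs smoothly on mobile devices.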