📖 A curated list of Awesome LLM Inference papers with code, covering TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
Updated Aug 12, 2024
RAG-GPT, leveraging LLM and RAG technology, learns from user-customized knowledge bases to provide contextually relevant answers for a wide range of queries, ensuring rapid and accurate information retrieval.
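🚀 A reverse-engineered API for the DeepSeek-V2 large model, intended for free-access testing (strength: a GPT-4 alternative). Supports high-speed streaming output, multi-turn conversation, zero-configuration deployment, and multiple tokens.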
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
BigCodeBench: The Next Generation of HumanEval
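A secondary-development version based on NineAi 2.4.2, including the compiled package and the integrated package. No authorization required, full source code included, works out of the box. Give it a Star if you like it~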
Chatbot-GPT, powered by OpenIM’s webhooks, seamlessly integrates with various messaging platforms. This tool enables private and group chats with bots, enhancing interactive communication. It delivers quick, automated responses, ideal for optimizing customer service and dynamic discussions, meeting diverse communication needs.
This repository provides an unofficial, reverse-engineered API for DeepSeek Chat & Coder (v2), allowing free and unlimited access to its powerful features.
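💭 A Chat Bot web MVP prototype template designed for secondary development, built on mainstream technologies such as Vue3, Vite 5, TypeScript, Naive UI, and UnoCSS. 🧤 Simple integration with large-model APIs, using a single-turn AI Q&A mode in which each question gets an independent response with no context required. Supports typewriter-effect streaming output and integrates markdown-it preview. 💼 Easy to customize for quickly building chat-style large language model products (example screenshots included).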
An LLM RAG application based on LlamaIndex and Streamlit. Optimized for Chinese users by adopting BAAI embedding and reranker models, Ollama local models, and LLM APIs from Chinese LLM service providers such as Zhipu, DeepSeek, and Moonshot.