Starred repositories
RPC framework based on C++ Workflow. Supports the SRPC, Baidu bRPC, Tencent tRPC, and Thrift protocols.
🎉CUDA/C++ notes / technical blog: fp32, fp16/bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot prod, elementwise, softmax, layernorm, rmsnorm, hist, etc.
Transformer Explained Visually: Learn How LLM Transformer Models Work with Interactive Visualization
Repository hosting code used to reproduce results in "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).
A high-throughput and memory-efficient inference and serving engine for LLMs
llama3 implementation one matrix multiplication at a time
Deep Reinforcement Learning: Zero to Hero!
Implementation of "Efficient Multi-vector Dense Retrieval with Bit Vectors", ECIR 2024
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
Open-Sora: Democratizing Efficient Video Production for All
Lightweight, standalone C++ inference engine for Google's Gemma models.
A relatively powerful web mind map.
FinSight - Financial Insights at Your Fingertip: FinSight is a cutting-edge AI assistant tailored for portfolio managers, investors, and finance enthusiasts. It streamlines the process of gaining c…
A super fast graph database that uses GraphBLAS under the hood for its sparse adjacency matrix graph representation. Our goal is to provide the best Knowledge Graph for LLM (GraphRAG).
Embedding Studio is a framework that allows you to transform your Vector Database into a feature-rich Search Engine.
A library of algorithms for approximate nearest neighbor search in high dimensions, along with a set of useful tools for designing such algorithms.
A Python library that transfers PyTorch tensors between CPU and NVMe
Distribute and run LLMs with a single file.
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
The Ultimate Guide to Making Money with AI Side Hustles: Learn how to leverage AI for some cool side gigs and rake in some extra cash. Check out the English versi…
📖A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
A minimal programming example for a chat server
RAG for Local LLM, chat with PDF/doc/txt files, ChatPDF. A pure native implementation of RAG built on a local LLM, an embedding model, and a reranker model, with no third-party agent library required.
A comprehensive deep dive into the world of tokens