JingChunzhen, Beijing

Starred repositories

RPC framework based on C++ Workflow. Supports SRPC, Baidu bRPC, Tencent tRPC, thrift protocols.

C++ 1,933 382 Updated Sep 6, 2024

🎉 CUDA/C++ notes / technical blog: fp32, fp16/bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot prod, elementwise, softmax, layernorm, rmsnorm, hist, etc.

Cuda 1,108 109 Updated Sep 4, 2024

Transformer Explained Visually: Learn How LLM Transformer Models Work with Interactive Visualization

JavaScript 2,452 204 Updated Sep 5, 2024

Repository hosting code used to reproduce results in "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).

Python 624 107 Updated Aug 28, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 26,038 3,814 Updated Sep 6, 2024

Low-bit LLM inference on CPU with lookup table

C++ 401 32 Updated Aug 30, 2024

llama3 implementation one matrix multiplication at a time

Jupyter Notebook 12,982 1,036 Updated May 23, 2024

Deep Reinforcement Learning: Zero to Hero!

Jupyter Notebook 1,994 70 Updated Aug 18, 2024

Implementation of "Efficient Multi-vector Dense Retrieval with Bit Vectors", ECIR 2024

C++ 53 2 Updated Aug 29, 2024

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

C++ 7,848 402 Updated Sep 6, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 21,537 2,064 Updated Aug 9, 2024

Lightweight, standalone C++ inference engine for Google's Gemma models.

C++ 5,899 499 Updated Sep 4, 2024

A relatively powerful web mind map.

JavaScript 5,489 805 Updated Sep 6, 2024

FinSight - Financial Insights at Your Fingertip: FinSight is a cutting-edge AI assistant tailored for portfolio managers, investors, and finance enthusiasts. It streamlines the process of gaining c…

Jupyter Notebook 194 76 Updated Jul 10, 2024

Practical notes on building applications with LLMs

Python 597 39 Updated Apr 12, 2024

A super fast graph database that uses GraphBLAS under the hood for its sparse adjacency matrix graph representation. Our goal is to provide the best Knowledge Graph for LLM (GraphRAG).

C 601 21 Updated Sep 5, 2024

Embedding Studio is a framework that allows you to transform your Vector Database into a feature-rich Search Engine.

Python 378 5 Updated Mar 15, 2024

A library of algorithms for approximate nearest neighbor search in high dimensions, along with a set of useful tools for designing such algorithms.

C++ 98 22 Updated Sep 5, 2024

A Python library that transfers PyTorch tensors between CPU and NVMe

C++ 91 18 Updated Apr 27, 2023

"The Art of the Independent Developer": building the most complete guide for independent developers and one-person companies.

1,249 102 Updated Jul 24, 2024

Distribute and run LLMs with a single file.

C++ 18,860 953 Updated Aug 31, 2024

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

Python 5,487 503 Updated Sep 5, 2024

Wang Yi's GPT solution

Cuda 137 7 Updated Dec 17, 2023

🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.

13,091 1,360 Updated Feb 13, 2023

The Ultimate Guide to Making Money with AI Side Hustles: Learn how to leverage AI for some cool side gigs and rake in some extra cash. Check out the English versi…

12,946 1,162 Updated Jul 12, 2024

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

2,409 159 Updated Sep 5, 2024

A minimal programming example for a chat server

C 7,264 806 Updated Jan 27, 2024

RAG for local LLMs: chat with PDF/doc/txt files (ChatPDF). A from-scratch RAG implementation built on a local LLM, an embedding model, and a reranker model, with no third-party agent library required.

Python 559 101 Updated Sep 6, 2024

A comprehensive deep dive into the world of tokens

Python 211 8 Updated Jun 24, 2024