Beijing (UTC+08:00) · Google Scholar: https://scholar.google.com/citations?user=CSpaTt8AAAAJ
Lists (28)
3d-photo
AIGC-2D
AIGC-3D
Algorithm
BigModel
C++Dev
Deepfake
diffusion models
Games
image generate
inpainting
LearningBook
LLM
Mesh reconstruction
Mesh tools
model-compression-acceleration
Model-Inference
Nerf
Optical flow
PBR
Segmentation
SR
Stereo Matching
Style transfer
Texture process
Tools
transformer
tutorials
Starred repositories
The official GitHub page for the survey paper "A Survey of Large Language Models".
Ongoing research training transformer models at scale
niyunsheng / EMS-SD
Forked from NVIDIA/FasterTransformer. arXiv preprint: https://arxiv.org/abs/2405.07542
Implementation of Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
📰 Must-read papers and blogs on Speculative Decoding ⚡️
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
A scalable and robust tree-based speculative decoding algorithm
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.
[ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
An implementation of "Retentive Network: A Successor to Transformer for Large Language Models"
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
General technology for enabling AI capabilities w/ LLMs and MLLMs
Code associated with the paper "Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding"
REST: Retrieval-Based Speculative Decoding, NAACL 2024
[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
Official release of InternLM2.5 base and chat models. 1M context support
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Fast and memory-efficient exact attention
An easy to use PyTorch to TensorRT converter
Large Language Model Text Generation Inference
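Many of the repositories above (EMS-SD, Kangaroo, Spec-Bench, EAGLE, Medusa, REST, Lookahead Decoding, Draft & Verify) center on speculative decoding. As a minimal sketch of the core idea, not the method of any specific repository listed, here a cheap draft model proposes a block of tokens and a target model verifies them left to right, keeping the longest agreeing prefix plus one corrected token; both "models" are toy deterministic next-token functions over integer token IDs, purely illustrative:

```python
# Toy stand-ins for a small draft model and a large target model.
# Real systems use neural LMs; these deterministic functions are
# illustrative only.

def draft_next(tokens):
    # Cheap draft model: guesses next token as (last + 1) mod 10.
    return (tokens[-1] + 1) % 10

def target_next(tokens):
    # Target model: agrees with the draft except right after token 4.
    if tokens[-1] == 4:
        return 0
    return (tokens[-1] + 1) % 10

def speculative_decode(prompt, num_tokens, k=4):
    """Greedy speculative decoding: the draft proposes k tokens,
    the target verifies them in order, accepting the longest
    agreeing prefix and substituting its own token on mismatch."""
    out = list(prompt)
    while len(out) - len(prompt) < num_tokens:
        # 1) Draft proposes k tokens autoregressively.
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) Target verifies each proposed token in turn.
        ctx = list(out)
        for t in proposal:
            expected = target_next(ctx)
            if expected == t:
                ctx.append(t)          # accepted
            else:
                ctx.append(expected)   # reject the rest, keep target's token
                break
        out = ctx
    return out[len(prompt) :][:num_tokens]
```

With greedy (argmax) verification as above, the output is token-for-token identical to decoding with the target model alone, which is why such schemes are called lossless: the draft only changes how many target invocations are needed, not what is generated.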