Stars
A simple C++11 Thread Pool implementation
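The pattern behind such a pool can be sketched in a few lines of stdlib Python (an illustrative analogue of the idea, not the repo's C++ code; the class and method names are invented for this example):

```python
import queue
import threading

class SimpleThreadPool:
    """Workers block on a shared queue and run submitted callables."""

    def __init__(self, num_workers=4):
        self.tasks = queue.Queue()
        self.workers = [threading.Thread(target=self._worker, daemon=True)
                        for _ in range(num_workers)]
        for w in self.workers:
            w.start()

    def _worker(self):
        while True:
            func, args, done = self.tasks.get()
            if func is None:          # sentinel: shut this worker down
                break
            func(*args)
            done.set()                # signal completion to the submitter

    def submit(self, func, *args):
        done = threading.Event()
        self.tasks.put((func, args, done))
        return done                   # caller can wait() on this

    def shutdown(self):
        for _ in self.workers:
            self.tasks.put((None, (), None))
        for w in self.workers:
            w.join()
```

The C++11 version replaces the queue/event pair with `std::condition_variable` and `std::future`, but the structure (shared task queue, fixed worker loop, sentinel-based shutdown) is the same.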
Reference implementations of MLPerf™ inference benchmarks
The official repo of Qwen (通义千问), the chat and pretrained large language models proposed by Alibaba Cloud.
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
FSA/FST algorithms, differentiable, with PyTorch compatibility.
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
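The core byte-pair-encoding idea behind fast tokenisers like tiktoken can be sketched as a toy trainer (illustrative only, not tiktoken's actual implementation; `bpe_train` is a hypothetical name):

```python
from collections import Counter

def bpe_train(tokens, num_merges):
    """Repeatedly merge the most frequent adjacent pair of symbols."""
    tokens = list(tokens)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        # first-encountered pair wins ties, keeping the result deterministic
        (a, b), count = max(pairs.items(), key=lambda kv: kv[1])
        if count < 2:
            break
        merges.append((a, b))
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == (a, b):
                merged.append(a + b)   # replace the pair with a new symbol
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges
```

Production tokenisers apply a fixed, pretrained merge table to raw bytes rather than learning merges at encode time; tiktoken's speed comes from doing that lookup in Rust.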
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
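Both GPTQ and AWQ refine plain round-to-nearest weight quantization. That baseline round-trip can be sketched as follows (a hypothetical helper for illustration, not code from either repo):

```python
def quantize_dequantize(weights, num_bits=4):
    """Symmetric round-to-nearest quantization: float -> int grid -> float."""
    qmax = 2 ** (num_bits - 1) - 1                  # e.g. 7 for signed 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return [v * scale for v in q]
```

The reconstruction error of this baseline is what the two papers attack: GPTQ adjusts remaining weights to compensate for each rounding error, while AWQ rescales the channels that activations show to be most salient before rounding.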
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
A high-throughput and memory-efficient inference and serving engine for LLMs
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
ChatGLM3 series: open-source bilingual chat LLMs.
Quantized Neural Network PACKage - mobile-optimized implementation of quantized neural network operators
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
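What GEMM computes can be pinned down with a naive reference implementation (illustrative only; FBGEMM's value lies in quantized, cache-friendly kernels, not in this loop nest):

```python
def gemm(A, B):
    """Naive matrix-matrix multiply: C[i][j] = sum_k A[i][k] * B[k][j]."""
    n, k, m = len(A), len(B), len(B[0])
    assert all(len(row) == k for row in A), "inner dimensions must match"
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):
            a = A[i][p]               # hoist A element out of the inner loop
            for j in range(m):
                C[i][j] += a * B[p][j]
    return C
```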
Seamless operability between C++11 and Python
A code repository for a PyTorch C++ (LibTorch) tutorial.
Keyword spotting on Arm Cortex-M Microcontrollers
Official implementation of the Keyword Transformer: https://arxiv.org/abs/2104.00769
Transformer related optimization, including BERT, GPT
Nuclei Microcontroller Software Interface Standard Development Repo
Robust Speech Recognition via Large-Scale Weak Supervision
FLOPs counter for convolutional networks in the PyTorch framework.
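The per-layer estimate such counters report can be computed by hand for a 2D convolution; the function below uses the standard multiply-accumulate formula (assumed for illustration, not taken from the repo's code):

```python
def conv2d_macs(in_ch, out_ch, kernel, out_h, out_w):
    """Multiply-accumulate count for a square-kernel Conv2d, ignoring bias."""
    # each of the out_h * out_w * out_ch outputs needs kernel*kernel*in_ch MACs
    return out_h * out_w * out_ch * kernel * kernel * in_ch
```

For the first layer of a typical ImageNet model (3→64 channels, 3×3 kernel, 224×224 output), this gives 224·224·64·3·3·3 ≈ 86.7 MMACs; tools usually report either MACs or 2× that as FLOPs, so it is worth checking which convention a given counter uses.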