Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling
EfficientFormerV2 [ICCV 2023] & EfficientFormer [NeurIPS 2022]
Code for the paper "AdderNet: Do We Really Need Multiplications in Deep Learning?"
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
[NeurIPS 2024 Spotlight] "LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS", by Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang
Learning Efficient Convolutional Networks through Network Slimming (ICCV 2017) — a sketch of the BatchNorm L1-penalty idea appears after this list.
List of papers related to neural network quantization in recent AI conferences and journals.
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization — a per-channel quantization sketch appears after this list.
[CVPR 2021] Exploring Sparsity in Image Super-Resolution for Efficient Inference
(CVPR 2021, Oral) Dynamic Slimmable Network
[ECCV 2022] Efficient Long-Range Attention Network for Image Super-resolution
Explorations of recent techniques in speculative decoding — a minimal draft-and-verify sketch appears after this list.
Deep Face Model Compression
On-device LLM Inference Powered by X-Bit Quantization
[ECCV 2022] Official implementation of the paper "DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation"
[NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
Soft Threshold Weight Reparameterization for Learnable Sparsity — see the sketch after this list.
[NeurIPS'23] Speculative Decoding with Big Little Decoder
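For the network-slimming entry above, a minimal sketch of the core idea, assuming a toy PyTorch model: an L1 penalty on BatchNorm scale factors (gamma) during training, after which channels whose gamma falls below a global threshold are pruned. The model, penalty weight, and pruning fraction here are illustrative, not the paper's exact settings.

```python
# Network slimming: L1-penalize BatchNorm gammas, then prune small-gamma channels.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
)
lambda_l1 = 1e-4  # sparsity strength (hypothetical value)

def sparsity_penalty(model):
    # Sum of |gamma| over all BatchNorm layers.
    return sum(m.weight.abs().sum()
               for m in model.modules() if isinstance(m, nn.BatchNorm2d))

x = torch.randn(8, 3, 32, 32)
loss = model(x).pow(2).mean() + lambda_l1 * sparsity_penalty(model)
loss.backward()  # gradients now push unimportant gammas toward zero

# After training, rank channels by |gamma| and drop the smallest fraction.
gammas = torch.cat([m.weight.detach().abs()
                    for m in model.modules() if isinstance(m, nn.BatchNorm2d)])
threshold = gammas.quantile(0.3)  # prune ~30% of channels (illustrative)
```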
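For the KV-cache quantization entries above, a minimal sketch of storing cached keys as int8 with per-channel scales, assuming symmetric round-to-nearest quantization. KVQuant itself goes well beyond this (non-uniform codebooks, dense-and-sparse decomposition for outliers); the code only illustrates the storage idea.

```python
# Per-channel int8 quantization of a cached key tensor.
import torch

def quantize_per_channel(t, dim):
    # Symmetric int8 quantization: one scale per channel, reducing over `dim`.
    scale = t.abs().amax(dim=dim, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(t / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q, scale):
    return q.to(torch.float32) * scale

# Toy key cache: (batch, heads, seq, head_dim); reduce over the sequence
# dimension so each (head, channel) pair gets its own scale.
k = torch.randn(1, 8, 1024, 64)
k_q, k_scale = quantize_per_channel(k, dim=2)
k_hat = dequantize(k_q, k_scale)
print("int8 bytes:", k_q.numel(), "vs fp32 bytes:", k.numel() * 4)
print("max abs error:", (k - k_hat).abs().max().item())
```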
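For the speculative-decoding entries above, a minimal, greedy sketch of the draft-and-verify loop: a cheap draft model proposes k tokens, the expensive target model checks them, and the longest agreeing prefix is kept. The `draft_next`/`target_next` functions are hypothetical toy stand-ins, and real systems use rejection sampling over distributions (and batch the verification into one forward pass) rather than exact-match acceptance.

```python
# Greedy speculative decoding with toy stand-in models.
import random

VOCAB = list(range(100))

def draft_next(context):
    # Hypothetical cheap draft model: a deterministic toy rule.
    return (sum(context) * 7 + 3) % 100

def target_next(context):
    # Hypothetical expensive target model; here it usually agrees with the draft.
    return draft_next(context) if random.random() < 0.8 else random.choice(VOCAB)

def speculative_decode(context, num_tokens, k=4):
    out = list(context)
    while len(out) - len(context) < num_tokens:
        # 1) Draft k tokens autoregressively with the cheap model.
        draft, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Verify with the target model (one batched pass in a real system);
        #    accept the longest prefix on which the two models agree.
        accepted = 0
        for i in range(k):
            if target_next(out + draft[:i]) == draft[i]:
                accepted += 1
            else:
                break
        out.extend(draft[:accepted])
        # 3) The target's own next token comes "for free" from verification,
        #    so at least one token is produced per expensive call.
        out.append(target_next(out))
    return out[len(context):len(context) + num_tokens]

print(speculative_decode([1, 2, 3], num_tokens=10))
```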
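For the soft-threshold entry above, a minimal sketch of the reparameterization: the effective weight is sign(w) * relu(|w| - g(s)) with a learnable per-layer threshold parameter s, so sparsity is learned rather than set by hand. The `STRLinear` class, initialization values, and the choice g = sigmoid here are illustrative.

```python
# Soft threshold weight reparameterization with a learnable threshold.
import torch
import torch.nn as nn
import torch.nn.functional as F

class STRLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
        self.s = nn.Parameter(torch.tensor(-5.0))  # learnable threshold logit

    def forward(self, x):
        thr = torch.sigmoid(self.s)
        # Weights with |w| <= thr become exactly zero => learnable sparsity.
        w_eff = torch.sign(self.weight) * F.relu(self.weight.abs() - thr)
        return F.linear(x, w_eff)

layer = STRLinear(64, 32)
y = layer(torch.randn(4, 64))
w_eff = torch.sign(layer.weight) * F.relu(layer.weight.abs() - torch.sigmoid(layer.s))
print(f"output {tuple(y.shape)}, weight sparsity {(w_eff == 0).float().mean():.2%}")
```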