  • Neural Magic
  • Somerville, Massachusetts USA

Organizations

@neuralmagic

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 25,976 stars · 3,803 forks · Updated Sep 5, 2024

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python · 378 stars · 28 forks · Updated Sep 5, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 248 stars · 9 forks · Updated Sep 5, 2024

Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes

Python · 363 stars · 24 forks · Updated Jul 19, 2024

Sparsity-aware deep learning inference runtime for CPUs

Python · 2,969 stars · 171 forks · Updated Jul 19, 2024

Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models

Python · 2,035 stars · 142 forks · Updated Aug 1, 2024

Contour Location Via Entropy Reduction (NeurIPS 2018)

Python · 8 stars · 4 forks · Updated May 11, 2019