fp8

Here are 7 public repositories matching this topic...

NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

python machine-learning deep-learning gpu cuda pytorch jax fp8

Updated Oct 31, 2024
Python

Azure / MS-AMP

Microsoft Automatic Mixed Precision Library

deep-learning gpu amp pytorch transformer mixed-precision fp8

Updated Sep 29, 2024
Python

intel / neural-speed

An innovative library for efficient LLM inference via low-bit quantization

Updated Aug 30, 2024
C++

aredden / flux-fp8-api

Flux diffusion model implementation that uses quantized FP8 matmul, with the remaining layers using faster half-precision accumulation; about 2x faster on consumer devices.

flux pytorch quantization diffusion fast-inference fp8