Stars
Universal LLM Deployment Engine with ML Compilation
Code Repository of Evaluating Quantized Large Language Models
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
Transformer-related optimization, including BERT and GPT
TigerBot: A multi-language multi-task LLM
Aligning pretrained language models with instruction data generated by themselves.
[ICML'21 Oral] I-BERT: Integer-only BERT Quantization
A retargetable MLIR-based machine learning compiler and runtime toolkit.
A simple tool to profile the performance of multiple cuBLAS GEMM combinations
A domain specific language to express machine learning workloads.
A CPU tool for benchmarking peak floating-point performance
Efficient GPU kernels for block-sparse matrix multiplication and convolution
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
Library for specialized dense and sparse matrix operations, and deep learning primitives.
Winograd minimal convolution algorithm generator for convolutional neural networks.
Library for fast image convolution in neural networks on Intel Architecture
An MPI-based C++ or Python library for easy distributed pipeline processing
CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels that provides an on-the-fly view of the assembly.