Universal LLM Deployment Engine with ML Compilation

Python 17,858 1,422 Updated Jul 21, 2024

Code Repository of Evaluating Quantized Large Language Models

Python 80 4 Updated Mar 27, 2024

Implementation for MatMul-free LM.

Python 2,711 165 Updated Jun 27, 2024

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Python 12,735 1,027 Updated Jun 27, 2024

Transformer-related optimization, including BERT, GPT

C++ 58 33 Updated Sep 20, 2023

TigerBot: A multi-language multi-task LLM

Python 2,225 195 Updated Jun 7, 2024

a lightweight LLM model inference framework

C++ 660 81 Updated Apr 7, 2024

Aligning pretrained language models with instruction data generated by themselves.

Python 3,962 463 Updated Mar 27, 2023

[ICML'21 Oral] I-BERT: Integer-only BERT Quantization

Python 219 31 Updated Jan 29, 2023

A retargetable MLIR-based machine learning compiler and runtime toolkit.

C++ 2,502 557 Updated Jul 21, 2024

CUDA Templates for Linear Algebra Subroutines

C++ 4,959 849 Updated Jul 17, 2024

A simple tool to profile performance of multiple combinations of GEMM of cuBLAS

C++ 23 7 Updated Feb 9, 2021

flexible-gemm conv of deepcore

C 17 14 Updated Dec 2, 2019

Compiler for Neural Network hardware accelerators

C++ 3,190 685 Updated May 11, 2024

A domain specific language to express machine learning workloads.

C++ 1,760 212 Updated Apr 28, 2023

A CPU tool for benchmarking the peak of floating points

Assembly 454 117 Updated May 10, 2024

Subpart source code of deepcore v0.7

C 27 14 Updated Jun 28, 2020

Efficient GPU kernels for block-sparse matrix multiplication and convolution

Cuda 1,012 199 Updated Jun 8, 2023

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.

C 6,181 1,470 Updated Jul 21, 2024

Library for specialized dense and sparse matrix operations, and deep learning primitives.

C 827 183 Updated Jul 1, 2024

Winograd minimal convolution algorithm generator for convolutional neural networks.

Python 597 142 Updated Oct 17, 2020

Library for fast image convolution in neural networks on Intel Architecture

C 27 16 Updated Jun 25, 2017

a heterogeneous multiGPU level-3 BLAS library

C 46 11 Updated Dec 9, 2019

Assembler for NVIDIA Maxwell architecture

Sass 935 160 Updated Jan 3, 2023

An MPI-based C++ or Python library for easy distributed pipeline processing

C++ 33 5 Updated Jul 30, 2018

Boost.org mpi module

C++ 59 63 Updated Jul 20, 2024

CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly.

C# 99 16 Updated Jan 17, 2023