wolf1981

2 followers · 40 following

Stars

mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation

Python 17,858 1,422 Updated Jul 21, 2024

thu-nics / qllm-eval

Code Repository of Evaluating Quantized Large Language Models

Python 80 4 Updated Mar 27, 2024

ridgerchu / matmulfreellm

Implementation for MatMul-free LM.

Python 2,711 165 Updated Jun 27, 2024

QwenLM / Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Python 12,735 1,027 Updated Jun 27, 2024

void-main / FasterTransformer

Forked from NVIDIA/FasterTransformer

Transformer related optimization, including BERT, GPT

C++ 58 33 Updated Sep 20, 2023

TigerResearch / TigerBot

TigerBot: A multi-language multi-task LLM

Python 2,225 195 Updated Jun 7, 2024

MegEngine / InferLLM

a lightweight LLM model inference framework

C++ 660 81 Updated Apr 7, 2024

yizhongw / self-instruct

Aligning pretrained language models with instruction data generated by themselves.

Python 3,962 463 Updated Mar 27, 2023

kssteven418 / I-BERT

[ICML'21 Oral] I-BERT: Integer-only BERT Quantization

Python 219 31 Updated Jan 29, 2023

iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.

C++ 2,502 557 Updated Jul 21, 2024

NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

C++ 4,959 849 Updated Jul 17, 2024

jeng1220 / cuGemmProf

A simple tool to profile performance of multiple combinations of GEMM of cuBLAS

C++ 23 7 Updated Feb 9, 2021

XiuYuLi / flexible-gemm

flexible-gemm conv of deepcore

C 17 14 Updated Dec 2, 2019

pytorch / glow

Compiler for Neural Network hardware accelerators

C++ 3,190 685 Updated May 11, 2024

facebookresearch / TensorComprehensions

A domain specific language to express machine learning workloads.

C++ 1,760 212 Updated Apr 28, 2023

pigirons / cpufp

A CPU tool for benchmarking the peak of floating points

Assembly 454 117 Updated May 10, 2024

XiuYuLi / deepcore_source_code

Subpart source code of deepcore v0.7

C 27 14 Updated Jun 28, 2020

openai / blocksparse

Efficient GPU kernels for block-sparse matrix multiplication and convolution

Cuda 1,012 199 Updated Jun 8, 2023

OpenMathLib / OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.

C 6,181 1,470 Updated Jul 21, 2024

baidu-research / baidu-allreduce

Cuda 556 113 Updated Apr 6, 2018

libxsmm / libxsmm

Library for specialized dense and sparse matrix operations, and deep learning primitives.

C 827 183 Updated Jul 1, 2024

andravin / wincnn

Winograd minimal convolution algorithm generator for convolutional neural networks.

Python 597 142 Updated Oct 17, 2020

ColfaxResearch / FALCON

Library for fast image convolution in neural networks on Intel Architecture

C 27 16 Updated Jun 25, 2017

linnanwang / BLASX

a heterogeneous multiGPU level-3 BLAS library

C 46 11 Updated Dec 9, 2019

NervanaSystems / maxas

Assembler for NVIDIA Maxwell architecture

Sass 935 160 Updated Jan 3, 2023

wwoods / job_stream

An MPI-based C++ or Python library for easy distributed pipeline processing

C++ 33 5 Updated Jul 30, 2018

boostorg / mpi

Boost.org mpi module

C++ 59 63 Updated Jul 20, 2024

SunsetQuest / CudaPAD

CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly.

C# 99 16 Updated Jan 17, 2023
