Chongqing University
Chongqing
Stars
An implementation of sgemm_kernel optimized for the L1d cache.
A PyTorch implementation of Transformer in "Attention is All You Need"
Transformer: PyTorch Implementation of "Attention Is All You Need"
A PyTorch implementation of the Transformer model in "Attention is All You Need".
NVDLA (an open-source DL accelerator framework) implementation on FPGA.
This is the top-level repository for the Accel-Sim framework.
Open Source Specialized Computing Stack for Accelerating Deep Neural Networks.
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
This repository contains integer operators on GPUs for PyTorch.
Code and documentation to train Stanford's Alpaca models, and generate the data.
A fast inference library for running LLMs locally on modern consumer-class GPUs
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
An easy-to-use PyTorch-to-TensorRT converter.
🐩 🐩 🐩 TensorRT 2022 contest second-round solution: TensorRT inference optimization of MST++, the first Transformer-based image reconstruction model.
Accessible large language models via k-bit quantization for PyTorch.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Fast and memory-efficient exact attention
Transformer-related optimization, including BERT and GPT.
A minimal GPU design in Verilog to learn how GPUs work from the ground up