muse-coder
  • Chongqing University
  • Chongqing

An implementation of sgemm_kernel optimized for the L1d cache.

Assembly 213 33 Updated Feb 26, 2024

A PyTorch implementation of Transformer in "Attention is All You Need"

Python 103 28 Updated Dec 6, 2020

Transformer: PyTorch Implementation of "Attention Is All You Need"

Python 2,536 394 Updated Apr 17, 2024

A PyTorch implementation of the Transformer model in "Attention is All You Need".

Python 8,636 1,954 Updated Apr 16, 2024
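
The Transformer repositories above all implement the same core primitive. As a point of reference, the scaled dot-product attention from "Attention Is All You Need" can be sketched in plain NumPy (an illustrative sketch, not code from any of these repositories):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, as defined in "Attention Is All You Need"."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_k) similarity logits
    scores -= scores.max(axis=-1, keepdims=True)  # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows are probability distributions
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))   # 4 queries, head dim 8
K = rng.standard_normal((6, 8))   # 6 keys
V = rng.standard_normal((6, 8))   # 6 values
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8): one output vector per query
```

Each output row is a convex combination of the value rows, with weights given by the softmax over query-key similarities.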
Python 41 15 Updated Nov 18, 2019

Yinghan's Code Sample

Cuda 259 49 Updated Jul 25, 2022

NVDLA (An Opensource DL Accelerator Framework) implementation on FPGA.

Verilog 284 57 Updated Dec 27, 2023

This is the top-level repository for the Accel-Sim framework.

Python 271 105 Updated Jul 14, 2024
Python 2 8 Updated Oct 25, 2018

Open Source Specialized Computing Stack for Accelerating Deep Neural Networks.

Jupyter Notebook 196 73 Updated Apr 22, 2019

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python 2,159 159 Updated Jul 16, 2024
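
The intuition behind AWQ's activation-aware quantization can be sketched in NumPy. This is a toy illustration of the core idea only, not the paper's algorithm: the scaling rule and the 0.25 exponent below are hypothetical stand-ins for AWQ's searched per-channel scaling factors.

```python
import numpy as np

def quantize_rows(w, bits=4):
    """Symmetric absmax quantization with one scale per row
    (per output channel when rows are output channels); returns dequantized w."""
    qmax = 2 ** (bits - 1) - 1
    step = np.abs(w).max(axis=1, keepdims=True) / qmax
    step[step == 0] = 1.0                 # guard against all-zero rows
    return np.round(w / step) * step

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 16)).astype(np.float32)  # calibration activations
X[:, 3] *= 50.0                      # input channel 3 carries large activations
W = rng.standard_normal((16, 16)).astype(np.float32)   # (in, out) weight matrix

# AWQ's observation: weights tied to high-magnitude activation channels
# matter most. Scale those input channels of W up before quantizing and
# fold 1/s back in afterwards, so the salient weights keep more precision.
s = np.abs(X).mean(axis=0) ** 0.25   # hypothetical scaling rule, not AWQ's search
W_plain = quantize_rows(W.T).T                           # per-output-channel baseline
W_scaled = quantize_rows((W * s[:, None]).T).T / s[:, None]

err_plain = np.abs(X @ W - X @ W_plain).mean()
err_scaled = np.abs(X @ W - X @ W_scaled).mean()
print(err_plain, err_scaled)  # scaling lowers the output error on the salient channel
```

The point of the comparison is that at the same bit width, rescaling shifts quantization error away from the channels where the activations are large.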

This repository contains integer operators on GPUs for PyTorch.

Python 159 48 Updated Sep 29, 2023

Code and documentation to train Stanford's Alpaca models, and generate the data.

Python 29,188 4,020 Updated Jul 17, 2024

This project aims to share the technical principles behind large language models, along with hands-on practical experience.

HTML 8,050 785 Updated Jul 17, 2024

A fast inference library for running LLMs locally on modern consumer-class GPUs

Python 3,282 244 Updated Jul 23, 2024

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups at small-to-medium batch sizes of 16-32 tokens.

Python 469 34 Updated Jul 10, 2024

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Python 4,150 429 Updated Jul 17, 2024

The official repo of Qwen (通义千问), the chat and pretrained large language model proposed by Alibaba Cloud.

Python 12,763 1,030 Updated Jun 27, 2024

An easy-to-use PyTorch-to-TensorRT converter

Python 4,493 670 Updated Jun 17, 2024

🐩 🐩 🐩 TensorRT 2022 competition finals solution: TensorRT inference optimization for MST++, the first Transformer-based image reconstruction model

Python 130 19 Updated Jul 6, 2022
Python 548 50 Updated Jun 19, 2024

Accessible large language models via k-bit quantization for PyTorch.

Python 5,796 588 Updated Jul 23, 2024
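
The k-bit quantization idea behind libraries like the one above can be sketched generically: split the tensor into fixed-size blocks and store one float scale per block plus low-bit integer codes. This is an illustrative absmax scheme, not the library's actual API or storage format:

```python
import numpy as np

def quantize_blockwise(x, bits=8, block=64):
    """Blockwise absmax quantization: one float scale per block of
    `block` elements, plus signed integer codes of width `bits`."""
    qmax = 2 ** (bits - 1) - 1
    flat = x.reshape(-1, block)                       # size must divide evenly
    scales = np.abs(flat).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0                         # avoid division by zero
    codes = np.round(flat / scales).astype(np.int8)
    return codes, scales

def dequantize_blockwise(codes, scales, shape):
    return (codes * scales).reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 64)).astype(np.float32)
codes, scales = quantize_blockwise(w)
w_hat = dequantize_blockwise(codes, scales, w.shape)
# per element, the reconstruction error is at most half a quantization step
print(np.abs(w - w_hat).max(), scales.max() / 2)
```

Blockwise scales localize the damage from outliers: a single large value inflates the quantization step only for its own block, not for the whole tensor.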

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 3,479 309 Updated Jul 23, 2024

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 19,228 2,452 Updated Jul 15, 2024

Inference code for Llama models

Python 54,354 9,332 Updated Jul 23, 2024

LLM training in simple, raw C/CUDA

Cuda 22,222 2,461 Updated Jul 23, 2024

Fast and memory-efficient exact attention

Python 12,559 1,120 Updated Jul 23, 2024

Transformer related optimization, including BERT, GPT

C++ 5,674 878 Updated Mar 27, 2024

A minimal GPU design in Verilog to learn how GPUs work from the ground up

SystemVerilog 6,730 501 Updated Jun 14, 2024