Stars
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretraining models and a diffusion model toolbox. Equipped with high …
fanshiqing / grouped_gemm
Forked from tgale96/grouped_gemm. PyTorch bindings for CUTLASS grouped GEMM.
A native PyTorch library for large model training
Longitudinal Evaluation of LLMs via Data Compression
Fluid Simulation using CUDA (SPH/WCSPH/PCISPH)
A unified particle framework similar to NVIDIA FleX.
A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods.
This is a Chinese translation of the CUDA programming guide
An MLIR-based compiler framework that bridges DSLs (domain-specific languages) to DSAs (domain-specific architectures).
SeaTunnel is a next-generation, high-performance, distributed tool for massive data integration.
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
A high-performance, large-capacity, multi-tenant, persistent, Redis-compatible elastic KV storage system built on RocksDB, with strong data consistency based on Raft.
Making large AI models cheaper, faster and more accessible
An introductory PyTorch tutorial; read it online at https://datawhalechina.github.io/thorough-pytorch/
A C#/.NET library to run LLMs (🦙LLaMA/LLaVA) on your local device efficiently.
A series of GPU optimization topics introducing in detail how to optimize CUDA kernels, covering several basic kernel optimizations, including: elementwise, reduce, s…
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)
A guide to CS study-abroad programs in Europe, Hong Kong, and Singapore
Run any open-source LLM, such as Llama 3.1 or Gemma, as an OpenAI-compatible API endpoint in the cloud.