Stars
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretraining models and a diffusion model toolbox. Equipped with high …
fanshiqing / grouped_gemm
Forked from tgale96/grouped_gemm. PyTorch bindings for CUTLASS grouped GEMM.
A native PyTorch library for large model training
Longitudinal Evaluation of LLMs via Data Compression
Fluid Simulation using CUDA (SPH/WCSPH/PCISPH)
A unified particle framework similar to NVIDIA FleX.
A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods.
This is a Chinese translation of the CUDA programming guide
An MLIR-based compiler framework that bridges DSLs (domain-specific languages) to DSAs (domain-specific architectures).
SeaTunnel is a next-generation, high-performance, distributed tool for massive data integration.
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
A high-performance, large-capacity, multi-tenant, persistent, Redis-compatible elastic KV storage system built on RocksDB, with strong data consistency based on Raft.
Making large AI models cheaper, faster and more accessible
An introductory PyTorch tutorial; read it online at https://datawhalechina.github.io/thorough-pytorch/
A C#/.NET library to run LLMs (🦙LLaMA/LLaVA) on your local device efficiently.
A series of GPU optimization topics introducing in detail how to optimize CUDA kernels, covering several basic kernel optimizations, including: elementwise, reduce, s…
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)
A guide to CS study-abroad programs in Europe, Hong Kong, and Singapore
Run any open-source LLM, such as Llama 3.1 or Gemma, as an OpenAI-compatible API endpoint in the cloud.