AndySong20

1 follower · 3 following

Stars

IST-DASLab / marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python · 460 stars · 34 forks · Updated Jul 10, 2024

pytorch-labs / float8_experimental

This repository contains the experimental PyTorch native float8 training UX

Python · 195 stars · 18 forks · Updated Jul 17, 2024

mlcommons / training

Reference implementations of MLPerf™ training benchmarks

Python · 1,575 stars · 548 forks · Updated Jul 17, 2024

NVIDIA / warp

A Python framework for high performance GPU simulation and graphics

Python · 3,858 stars · 213 forks · Updated Jul 17, 2024

NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ · 7,505 stars · 816 forks · Updated Jul 17, 2024

wklken / k-vim

Vim configuration

Vim Script · 4,889 stars · 1,809 forks · Updated Nov 16, 2023

YaoFANGUK / video-subtitle-generator

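Generates subtitles from video and audio and produces srt files. No third-party API is required; audio-to-text conversion runs locally. A Transformer-based video subtitle generation framework. A GUI tool for generating subtitles from videos and generating srt files.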

Python · 725 stars · 151 forks · Updated Feb 1, 2024

RUCAIBox / LLMSurvey

The official GitHub page for the survey paper "A Survey of Large Language Models".

Python · 9,623 stars · 743 forks · Updated May 19, 2024

NVIDIA / CUDALibrarySamples

CUDA Library Samples

Cuda · 1,423 stars · 296 forks · Updated Jul 16, 2024

NVIDIA / FasterTransformer

Transformer-related optimization, including BERT and GPT

C++ · 5,662 stars · 877 forks · Updated Mar 27, 2024

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 23,083 stars · 3,276 forks · Updated Jul 17, 2024

NVIDIA / NeMo-Framework-Launcher

Provides end-to-end model development pipelines for LLMs and Multimodal models that can be launched on-prem or cloud-native.

Python · 426 stars · 127 forks · Updated Jul 17, 2024

NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python · 10,954 stars · 2,285 forks · Updated Jul 17, 2024

SharingSource / darwin.v

76 stars · 10 forks · Updated Jul 17, 2024

NVIDIA / cuda-samples

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

C · 5,808 stars · 1,700 forks · Updated Jul 17, 2024

RogerDeng / street-fighter-ai

Python · 23 stars · 11 forks · Updated Apr 15, 2023

NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…

Python · 1,655 stars · 262 forks · Updated Jul 17, 2024

chenzomi12 / AISystem

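AISystem mainly refers to AI systems, including AI chips, AI compilers, AI inference and training frameworks, and other full-stack, low-level AI technologies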

Jupyter Notebook · 9,614 stars · 1,384 forks · Updated Jul 17, 2024

AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI

Python · 136,187 stars · 25,960 forks · Updated Jul 16, 2024

openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference

C++ · 6,496 stars · 2,112 forks · Updated Jul 17, 2024

oneapi-src / oneMKL

oneAPI Math Kernel Library (oneMKL) Interfaces

C++ · 593 stars · 150 forks · Updated Jul 17, 2024

NVIDIA / cudnn-frontend

cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples showing how to use it

C++ · 373 stars · 70 forks · Updated Jun 25, 2024

pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python · 80,491 stars · 21,617 forks · Updated Jul 17, 2024

facebookresearch / xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python · 8,046 stars · 568 forks · Updated Jul 14, 2024

triton-lang / triton

Development repository for the Triton language and compiler

C++ · 12,007 stars · 1,427 forks · Updated Jul 17, 2024

Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Python · 12,407 stars · 1,103 forks · Updated Jul 15, 2024

kubewharf / kubebrain

A High Performance Metadata System for Kubernetes

Go · 750 stars · 79 forks · Updated May 13, 2024

NVIDIA / libcudacxx

[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl

C++ · 2,294 stars · 187 forks · Updated Feb 7, 2024

NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

C++ · 4,925 stars · 846 forks · Updated Jul 17, 2024

facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.

C++ · 29,545 stars · 3,491 forks · Updated Jul 17, 2024