Starred repositories

16 stars written in C++

LLM inference in C/C++

C++ 61,436 8,783 Updated Jul 10, 2024

Port of OpenAI's Whisper model in C/C++

C++ 33,038 3,304 Updated Jul 9, 2024

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing

C++ 13,903 3,384 Updated Jul 10, 2024

Development repository for the Triton language and compiler

C++ 11,928 1,414 Updated Jul 10, 2024

Conversion between Traditional and Simplified Chinese

C++ 8,215 972 Updated Jun 19, 2024

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

C++ 7,647 407 Updated Jul 1, 2024

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 7,432 802 Updated Jul 10, 2024

Transformer-related optimization, including BERT, GPT

C++ 5,643 877 Updated Mar 27, 2024

A C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.

C++ 3,288 1,086 Updated Jul 10, 2024

C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4

C++ 2,830 327 Updated Jun 24, 2024

Microsoft Collective Communication Library

C++ 271 26 Updated Sep 20, 2023

An Easy-to-understand TensorOp Matmul Tutorial

C++ 221 22 Updated Jun 15, 2024

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 183 27 Updated Jul 9, 2024

A fast communication-overlapping library for tensor parallelism on GPUs.

C++ 80 7 Updated Jul 9, 2024

Standalone Flash Attention v2 kernel without libtorch dependency

C++ 79 12 Updated May 21, 2024

High performance Transformer implementation in C++.

C++ 43 2 Updated Apr 22, 2024