whutbd

💭

I may be slow to respond.

3 followers · 180 following

tensorrtllm_backend Public
Forked from triton-inference-server/tensorrtllm_backend

The Triton TensorRT-LLM Backend

Python Apache License 2.0 Updated Jul 16, 2024
whisper.cpp Public
Forked from ggerganov/whisper.cpp

Port of OpenAI's Whisper model in C/C++

C++ MIT License Updated Jul 14, 2024
llama2.c Public
Forked from karpathy/llama2.c

Inference Llama 2 in one file of pure C

C MIT License Updated Jul 13, 2024
MedicalGPT Public
Forked from shibing624/MedicalGPT

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing incremental pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.

Python Apache License 2.0 Updated Jul 7, 2024
llama.cpp Public
Forked from ggerganov/llama.cpp

LLM inference in C/C++

C++ MIT License Updated Jul 3, 2024
lectures Public
Forked from cuda-mode/lectures

Material for cuda-mode lectures

Jupyter Notebook Apache License 2.0 Updated Jun 13, 2024
rtp-llm Public
Forked from alibaba/rtp-llm

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

C++ Apache License 2.0 Updated Jun 12, 2024
sentencepiece Public
Forked from google/sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

C++ Apache License 2.0 Updated Jun 5, 2024
llm.c Public
Forked from karpathy/llm.c

LLM training in simple, raw C/CUDA

Cuda MIT License Updated May 21, 2024
onnx-modifier Public
Forked from ZhangGe6/onnx-modifier

A tool to modify ONNX models in a visualization fashion, based on Netron and Flask.

JavaScript MIT License Updated Apr 22, 2024
cuda_sgemm Public
Forked from njuhope/cuda_sgemm

Cuda Updated Apr 11, 2024
CTranslate2 Public
Forked from OpenNMT/CTranslate2

Fast inference engine for Transformer models

C++ MIT License Updated Apr 10, 2024
flash-attention-minimal Public
Forked from tspeterkim/flash-attention-minimal

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda Apache License 2.0 Updated Apr 7, 2024
InferLLM Public
Forked from MegEngine/InferLLM

a lightweight LLM model inference framework

C++ Apache License 2.0 Updated Apr 7, 2024
fastllm Public
Forked from ztxz16/fastllm

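A pure C++, cross-platform LLM acceleration library that can be called from Python. ChatGLM-6B-class models can reach 10,000+ tokens/s on a single card. Supports GLM, LLaMA, and MOSS base models, and runs smoothly on mobile devices.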

C++ Apache License 2.0 Updated Mar 13, 2024
TensorRT_Tutorial Public
Forked from LitLeo/TensorRT_Tutorial

C++ Updated Mar 13, 2024
cute-gemm Public
Forked from reed-lau/cute-gemm

C++ Updated Feb 29, 2024
core Public
Forked from triton-inference-server/core

The core library and APIs implementing the Triton Inference Server.

C++ BSD 3-Clause "New" or "Revised" License Updated Feb 17, 2024
cuda-learn-note Public
Forked from DefTruth/CUDA-Learn-Notes

🎉 CUDA notes / collection of frequently asked interview questions / C++ notes. Personal notes, updated irregularly: sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.

Cuda 2 GNU General Public License v3.0 Updated Jan 25, 2024
pytorch-transformer Public
Forked from owenliang/pytorch-transformer

A PyTorch reimplementation of the Transformer

Python Updated Jan 18, 2024
FasterTransformer Public
Forked from NVIDIA/FasterTransformer

Transformer related optimization, including BERT, GPT

C++ Apache License 2.0 Updated Jan 15, 2024
SGEMM_CUDA Public
Forked from siboehm/SGEMM_CUDA

Fast CUDA matrix multiplication from scratch

Cuda MIT License Updated Dec 28, 2023
flashinfer Public
Forked from flashinfer-ai/flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda Apache License 2.0 Updated Dec 4, 2023
seamless_communication Public
Forked from facebookresearch/seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

C Other Updated Dec 2, 2023
ggml-tutorial Public
Forked from StudyingLover/ggml-tutorial

C++ Updated Nov 12, 2023
PaddleOCR Public
Forked from PaddlePaddle/PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and…

Python Apache License 2.0 Updated Nov 1, 2023
vllm Public
Forked from vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python Apache License 2.0 Updated Oct 23, 2023
TensorRT-LLM Public
Forked from NVIDIA/TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ Apache License 2.0 Updated Oct 20, 2023
ppl.llm.kernel.cuda Public
Forked from openppl-public/ppl.llm.kernel.cuda

C++ Apache License 2.0 Updated Oct 14, 2023
byteps Public
Forked from bytedance/byteps

A high performance and generic framework for distributed DNN training

Python Other Updated Oct 3, 2023
