tensorrtllm_backend Public
Forked from triton-inference-server/tensorrtllm_backend
The Triton TensorRT-LLM Backend
Python Apache License 2.0 Updated Jul 16, 2024
whisper.cpp Public
Forked from ggerganov/whisper.cpp
Port of OpenAI's Whisper model in C/C++
C++ MIT License Updated Jul 14, 2024
llama2.c Public
Forked from karpathy/llama2.c
Inference Llama 2 in one file of pure C
C MIT License Updated Jul 13, 2024
MedicalGPT Public
Forked from shibing624/MedicalGPT
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing continued pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.
Python Apache License 2.0 Updated Jul 7, 2024
llama.cpp Public
Forked from ggerganov/llama.cpp
LLM inference in C/C++
C++ MIT License Updated Jul 3, 2024
lectures Public
Forked from cuda-mode/lectures
Material for cuda-mode lectures
Jupyter Notebook Apache License 2.0 Updated Jun 13, 2024
rtp-llm Public
Forked from alibaba/rtp-llm
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
C++ Apache License 2.0 Updated Jun 12, 2024
sentencepiece Public
Forked from google/sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
C++ Apache License 2.0 Updated Jun 5, 2024
llm.c Public
Forked from karpathy/llm.c
LLM training in simple, raw C/CUDA
Cuda MIT License Updated May 21, 2024
onnx-modifier Public
Forked from ZhangGe6/onnx-modifier
A tool to modify ONNX models in a visualization fashion, based on Netron and Flask.
JavaScript MIT License Updated Apr 22, 2024
CTranslate2 Public
Forked from OpenNMT/CTranslate2
Fast inference engine for Transformer models
C++ MIT License Updated Apr 10, 2024
flash-attention-minimal Public
Forked from tspeterkim/flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
Cuda Apache License 2.0 Updated Apr 7, 2024
InferLLM Public
Forked from MegEngine/InferLLM
a lightweight LLM model inference framework
C++ Apache License 2.0 Updated Apr 7, 2024
fastllm Public
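Forked from ztxz16/fastllm
A pure C++ cross-platform LLM acceleration library, callable from Python; ChatGLM-6B-class models can reach 10,000+ tokens/s on a single GPU; supports GLM, LLaMA, and MOSS base models, and runs smoothly on mobile devices.
C++ Apache License 2.0 Updated Mar 13, 2024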
core Public
Forked from triton-inference-server/core
The core library and APIs implementing the Triton Inference Server.
C++ BSD 3-Clause "New" or "Revised" License Updated Feb 17, 2024
cuda-learn-note Public
Forked from DefTruth/CUDA-Learn-Notes
🎉 CUDA notes / collection of frequently asked interview questions / C++ notes; personal notes, updated irregularly: sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
pytorch-transformer Public
Forked from owenliang/pytorch-transformer
A PyTorch reimplementation of the Transformer
Python Updated Jan 18, 2024
FasterTransformer Public
Forked from NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
C++ Apache License 2.0 Updated Jan 15, 2024
SGEMM_CUDA Public
Forked from siboehm/SGEMM_CUDA
Fast CUDA matrix multiplication from scratch
Cuda MIT License Updated Dec 28, 2023
flashinfer Public
Forked from flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
Cuda Apache License 2.0 Updated Dec 4, 2023
seamless_communication Public
Forked from facebookresearch/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
C Other Updated Dec 2, 2023
PaddleOCR Public
Forked from PaddlePaddle/PaddleOCR
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and…
Python Apache License 2.0 Updated Nov 1, 2023
vllm Public
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Python Apache License 2.0 Updated Oct 23, 2023
TensorRT-LLM Public
Forked from NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…
C++ Apache License 2.0 Updated Oct 20, 2023
ppl.llm.kernel.cuda Public
Forked from openppl-public/ppl.llm.kernel.cuda
C++ Apache License 2.0 Updated Oct 14, 2023
byteps Public
Forked from bytedance/byteps
A high performance and generic framework for distributed DNN training
Python Other Updated Oct 3, 2023