xiezipeng-ML (SiliconFlow, OneFlow)

Stars
Sort by: Recently starred
The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), allowing users to chat with LLM models, execute structured function calls, and get structured output.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Get up and running with Llama 3, Mistral, Gemma 2, and other large language models.
A cross-platform ChatGPT/Gemini UI (Web / PWA / Linux / Win / MacOS). Get your own cross-platform ChatGPT/Gemini app with one click.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Chinese text classification with TextCNN, TextRNN, FastText, TextRCNN, BiLSTM_Attention, DPCNN, and Transformer; based on PyTorch and ready to use out of the box.
Material for cuda-mode lectures
🎉 CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated occasionally: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
The official Python library for the OpenAI API
Doing simple retrieval from LLM models at various context lengths to measure accuracy
A hand-curated Chinese dialogue dataset and fine-tuning code for ChatGLM.
A collection of awesome prompt and instruction datasets, gathering a wide variety of instruction datasets used to train ChatLLMs such as ChatGPT.
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
A framework for few-shot evaluation of language models.
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
Fast and memory-efficient exact attention
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Development repository for the Triton language and compiler
A high-throughput and memory-efficient inference and serving engine for LLMs