xiezipeng-ML
  • SiliconFlow, OneFlow

The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM models, execute structured function calls and get structured…

Python 435 38 Updated Jul 16, 2024

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 33,912 3,977 Updated Jul 18, 2024

Get up and running with Llama 3, Mistral, Gemma 2, and other large language models.

Go 79,004 6,022 Updated Jul 18, 2024

Brand new TTS solution

Python 5,745 447 Updated Jul 17, 2024

A cross-platform ChatGPT/Gemini UI (Web / PWA / Linux / Win / MacOS). Get your own cross-platform ChatGPT/Gemini app in one click.

TypeScript 73,019 57,995 Updated Jul 17, 2024

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

882 17 Updated Jul 10, 2024

Chinese text classification with TextCNN, TextRNN, FastText, TextRCNN, BiLSTM_Attention, DPCNN, and Transformer; based on PyTorch and ready to use out of the box.

Python 5,189 1,218 Updated Sep 23, 2020

Material for cuda-mode lectures

Jupyter Notebook 1,824 175 Updated Jun 13, 2024

🎉CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.

Cuda 888 85 Updated Jul 17, 2024

The official Python library for the OpenAI API

Python 21,234 2,890 Updated Jul 18, 2024

Python 1,370 117 Updated Jun 18, 2024

Doing simple retrieval from LLM models at various context lengths to measure accuracy

Jupyter Notebook 1,321 137 Updated Jun 20, 2024

A collection of compiler learning resources.

Python 1,957 311 Updated May 27, 2024

A manually refined Chinese dialogue dataset and a piece of ChatGLM fine-tuning code.

Jupyter Notebook 1,120 94 Updated May 6, 2024

A collection of awesome-prompt-datasets and awesome-instruction-datasets for training ChatLLMs such as ChatGPT. It gathers a wide variety of instruction datasets for training ChatLLM models.

446 20 Updated Apr 7, 2024

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

Python 5,617 366 Updated Mar 14, 2024

A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Python 26,647 3,300 Updated Jul 16, 2024

A framework for few-shot evaluation of language models.

Python 5,876 1,567 Updated Jul 18, 2024

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Python 8,809 889 Updated Jul 17, 2024

An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

Python 3,347 270 Updated Jul 17, 2024

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 7,509 819 Updated Jul 18, 2024

Fast and memory-efficient exact attention

Python 12,424 1,104 Updated Jul 15, 2024

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Python 7,785 1,423 Updated Jul 18, 2024

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Python 6,372 355 Updated Jul 11, 2024

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Python 15,077 1,445 Updated Jul 18, 2024

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Jupyter Notebook 2,031 133 Updated Jun 25, 2024

Development repository for the Triton language and compiler

C++ 12,012 1,430 Updated Jul 18, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 23,121 3,279 Updated Jul 18, 2024