zccyman
  • wondertek
  • Shanghai, China

🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)

Python · 4,791 stars · 477 forks · Updated Sep 25, 2024

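Jittor large-model inference library with high performance, low hardware requirements, strong Chinese-language support, and portability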

Python · 2,363 stars · 182 forks · Updated Jan 6, 2024
MLIR · 397 stars · 70 forks · Updated Oct 7, 2024

oneCCL Bindings for PyTorch*

C++ · 85 stars · 23 forks · Updated Sep 10, 2024

A model compilation solution for various hardware

MLIR · 362 stars · 38 forks · Updated Sep 30, 2024

LLM101n: Let's build a Storyteller

29,196 stars · 1,599 forks · Updated Aug 1, 2024

Hands-On Practical MLIR Tutorial

C++ · 301 stars · 40 forks · Updated Oct 20, 2023

CodeGeeX4-ALL-9B, a versatile model for all AI software development scenarios, including code completion, code interpreter, web search, function calling, repository-level Q&A and much more.

Python · 1,293 stars · 98 forks · Updated Aug 25, 2024

Hands-On Practical MLIR Tutorial

C++ · 11 stars · Updated Jul 22, 2024

Development repository for the Triton language and compiler

C++ · 12,950 stars · 1,575 forks · Updated Oct 8, 2024

Run generative AI models on the Sophgo BM1684X

Python · 105 stars · 17 forks · Updated Oct 7, 2024

An innovative library for efficient LLM inference via low-bit quantization

C++ · 345 stars · 37 forks · Updated Aug 30, 2024

Model compression for ONNX

Python · 69 stars · 8 forks · Updated Sep 23, 2024

Download Feishu (Lark) documents as Markdown with a single command

Go · 1,138 stars · 112 forks · Updated Aug 27, 2024

[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

CUDA · 267 stars · 24 forks · Updated Jul 2, 2024
C++ · 33 stars · 6 forks · Updated Apr 1, 2024
Assembly · 8 stars · 9 forks · Updated Jan 22, 2024

Tensor library for machine learning

C++ · 10,964 stars · 1,008 forks · Updated Oct 6, 2024

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python · 2,394 stars · 184 forks · Updated Jul 16, 2024

A general 2-8-bit quantization toolbox with GPTQ/AWQ/HQQ support and easy export to ONNX/ONNX Runtime.

Python · 145 stars · 14 forks · Updated Sep 23, 2024

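The latest 2021 compilation of C++ learning resources, covering new features in C++11/14/17/20/23, introductory tutorials, recommended books, quality articles, study notes, instructional videos, and more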

C++ · 4,928 stars · 1,033 forks · Updated Jun 8, 2022

Onboarding documentation source for the AMD Ryzen™ AI Software Platform. The AMD Ryzen™ AI Software Platform enables developers to take pretrained machine learning models in popular frameworks and …

45 stars · 18 forks · Updated Oct 7, 2024

Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.

Python · 1,542 stars · 163 forks · Updated Oct 7, 2024

Model Quantization Benchmark

Shell · 9 stars · 5 forks · Updated Sep 23, 2024

Awesome LLM compression research papers and tools.

1,100 stars · 66 forks · Updated Oct 7, 2024
Python · 77 stars · 14 forks · Updated Nov 17, 2023

Open deep learning compiler stack for Kendryte AI accelerators ✨

C# · 743 stars · 181 forks · Updated Sep 30, 2024

[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models"

Python · 33 stars · 2 forks · Updated Mar 11, 2024

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python · 2,473 stars · 197 forks · Updated Oct 8, 2024

[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".

Python · 243 stars · 27 forks · Updated Sep 27, 2024