Chinese-LLaMA-Alpaca-2 (Public)
Forked from ymcui/Chinese-LLaMA-Alpaca-2: Chinese LLaMA-2 & Alpaca-2 LLMs, phase 2 of the project, with 64K long-context models
Python · Apache License 2.0 · Updated Apr 24, 2024
vllm (Public)
Forked from vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs
Python · Apache License 2.0 · Updated Apr 17, 2024
AutoAWQ (Public)
Forked from casper-hansen/AutoAWQ: AutoAWQ implements the AWQ algorithm for 4-bit quantization, with a 2x speedup during inference
Python · MIT License · Updated Mar 26, 2024
export_llm_to_onnx (Public)
Forked from luchangli03/export_llama_to_onnx: Export LLaMA models to ONNX
Python · MIT License · Updated Mar 25, 2024
LLM-QAT (Public)
Forked from facebookresearch/LLM-QAT: Code repo for the paper "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models"
Python · Other · Updated Mar 25, 2024
TensorRT-LLM (Public)
Forked from NVIDIA/TensorRT-LLM: TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently
C++ · Apache License 2.0 · Updated Mar 22, 2024
GPTQ-for-LLaMa (Public)
Forked from qwopqwop200/GPTQ-for-LLaMa: 4-bit quantization of LLaMA using GPTQ
Python · Apache License 2.0 · Updated Mar 22, 2024
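The AutoAWQ and GPTQ-for-LLaMa forks above both target 4-bit weight quantization. As a rough illustration of the baseline those methods improve on (round-to-nearest group-wise int4 quantization), here is a minimal sketch; it is not the code of either repo, and the group size of 4 is chosen only to keep the example small (real kernels typically use 64 or 128):

```python
# Minimal sketch of group-wise symmetric int4 quantization (round-to-nearest).
# This is an illustrative baseline, NOT AutoAWQ's or GPTQ's actual algorithm.
import numpy as np

def quantize_int4(weights, group_size=4):
    """Quantize a 1-D float array to signed int4 codes in [-8, 7],
    one scale per group of `group_size` values."""
    w = weights.reshape(-1, group_size)
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0  # map max |w| to code 7
    scales[scales == 0] = 1.0                            # avoid divide-by-zero
    codes = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
    return codes, scales

def dequantize_int4(codes, scales):
    """Reconstruct approximate float weights from codes and scales."""
    return (codes * scales).reshape(-1)

w = np.array([0.1, -0.7, 0.35, 0.02, 1.2, -0.4, 0.9, -1.1])
codes, scales = quantize_int4(w)
w_hat = dequantize_int4(codes, scales)
# per-element reconstruction error stays within half a quantization step
```

Methods like GPTQ and AWQ keep this 4-bit storage format but choose the codes (and, in AWQ's case, per-channel scaling) more carefully to reduce the reconstruction error on actual model activations.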
LLM-FP4 (Public)
Forked from nbasyl/LLM-FP4: The official implementation of the EMNLP 2023 paper "LLM-FP4"
Python · MIT License · Updated Jan 16, 2024
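LLM-FP4 studies 4-bit *floating-point* quantization rather than the integer format sketched earlier. As a toy illustration of what an FP4 grid looks like, the snippet below assumes the common E2M1 layout (1 sign, 2 exponent, 1 mantissa bit) and rounds values to the nearest representable point; it is not the paper's implementation:

```python
# Toy FP4 (E2M1-style) quantizer: round to the nearest representable value.
# The grid assumes the common E2M1 layout; this is NOT LLM-FP4's actual code.

# Non-negative E2M1 magnitudes are 0, 0.5, 1, 1.5, 2, 3, 4, 6; adding the
# sign bit (and deduplicating +/-0) gives 15 distinct values.
FP4_GRID = sorted({s * m for s in (1.0, -1.0)
                   for m in (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)})

def quantize_fp4(x):
    """Round x to the nearest representable FP4 value."""
    return min(FP4_GRID, key=lambda v: abs(v - x))

print(quantize_fp4(2.4))   # -> 2.0 (nearest grid point)
print(quantize_fp4(-5.3))  # -> -6.0
```

Note how the grid's spacing grows with magnitude (0.5 apart near zero, 2.0 apart near 6), which is the property that distinguishes floating-point 4-bit formats from the uniformly spaced int4 grid.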
FasterTransformer (Public)
Forked from NVIDIA/FasterTransformer: Transformer-related optimizations, including BERT and GPT
C++ · Apache License 2.0 · Updated Jan 15, 2024