- Draper, Utah
- 21:57 (UTC -06:00)
- jeffcook.io
Starred repositories (sorted by recently starred)
Official inference repo for FLUX.1 models
The Triton TensorRT-LLM Backend
This repository contains integer operators on GPUs for PyTorch.
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
A pytorch quantization backend for optimum
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
ComfyUI nodes to use segment-anything-2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
Build, customize, and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our discord community: https://discord.gg/TgHX…
A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ, and easy export to ONNX/ONNX Runtime.
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Easy and Efficient Quantization for Transformers
Use PEFT or Full-parameter to finetune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.
Hybrid search engine, combining the best features of text and semantic search
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
Lightning Fast: Faiss CPU + Onnx Quantized Multilingual Embedding Model
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Advanced quantization algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs"
💭 Retrieval augmented generation (RAG) and language model powered search applications
DSPy: The framework for programming—not prompting—foundation models
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
🔍 LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your d…
All-in-one infrastructure for search, recommendations, RAG, and analytics offered via API