- Sky Computing Lab, UC Berkeley
- Berkeley, CA
Stars
dstack is an easy-to-use and flexible container orchestrator for running AI workloads in any cloud or data center.
A framework for serving and evaluating LLM routers.
My PhD thesis on resource efficient machine learning
Robust Speech Recognition via Large-Scale Weak Supervision
Releasing the spot availability traces used in "Can't Be Late" paper.
Patch convolution to avoid large GPU memory usage of Conv2D
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud.
Modeling, training, eval, and inference code for OLMo
A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
🐍 | Python library for RunPod API and serverless worker SDK.
Fast and memory-efficient exact attention
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Official repository for LongChat and LongEval
Like PyTorch for building ML systems. Iterable, debuggable, multi-cloud, 100% reproducible across research and production.
A high-throughput and memory-efficient inference and serving engine for LLMs
SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.