  • Sky Computing Lab, UC Berkeley
  • Berkeley, CA

Organizations

@ACM-Class-2016

dstack is an easy-to-use and flexible container orchestrator for running AI workloads in any cloud or data center.

Python 1,203 87 Updated Jul 5, 2024

A framework for serving and evaluating LLM routers.

Python 610 43 Updated Jul 4, 2024

My PhD thesis on resource efficient machine learning

TeX 2 Updated Dec 29, 2023
Python 1 Updated Jun 17, 2024

Robust Speech Recognition via Large-Scale Weak Supervision

Python 64,101 7,455 Updated Jul 2, 2024

LLM training in simple, raw C/CUDA

Cuda 21,408 2,326 Updated Jul 4, 2024
Python 2 Updated Jun 18, 2024
Python 66 9 Updated Jun 29, 2024

Releasing the spot availability traces used in "Can't Be Late" paper.

14 Updated Mar 31, 2024

Patch convolution to avoid large GPU memory usage of Conv2D

Python 70 4 Updated May 26, 2024

[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Python 490 12 Updated Jun 28, 2024

Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud.

Shell 5,924 334 Updated Jul 4, 2024

Modeling, training, eval, and inference code for OLMo

Python 4,193 390 Updated Jul 5, 2024

A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.

Python 1,124 84 Updated Jul 5, 2024

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Python 1,869 126 Updated Jul 3, 2024

🐍 | Python library for RunPod API and serverless worker SDK.

Python 164 55 Updated Jun 10, 2024

The Unified ML Representation

Python 14,024 5,820 Updated Jul 5, 2024

Self-hosted AI coding assistant

Rust 18,304 771 Updated Jul 5, 2024

Fast and memory-efficient exact attention

Python 11,883 1,054 Updated Jul 5, 2024

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python 2,083 150 Updated Jun 12, 2024
Python 52 16 Updated Apr 12, 2024

Official repository for LongChat and LongEval

Python 499 29 Updated May 24, 2024

Like PyTorch for building ML systems. Iterable, debuggable, multi-cloud, 100% reproducible across research and production.

Python 942 37 Updated Jul 5, 2024

LLMs for your CLI

Python 1,215 73 Updated May 29, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 22,253 3,140 Updated Jul 5, 2024

SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.

Python 6,197 426 Updated Jul 5, 2024