zhuohan123

Organizations: @alpa-projects

Dynamic Memory Management for Serving LLMs without PagedAttention

C · 159 stars · 10 forks · Updated Aug 3, 2024

A framework for few-shot evaluation of language models.

Python · 6,080 stars · 1,613 forks · Updated Aug 7, 2024

A fast communication-overlapping library for tensor parallelism on GPUs.

C++ · 135 stars · 9 forks · Updated Jul 25, 2024

Blazingly fast LLM inference.

Rust · 3,212 stars · 255 forks · Updated Aug 6, 2024

🐚 OpenDevin: Code Less, Make More

Python · 29,553 stars · 3,418 forks · Updated Aug 7, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 24 stars · 26 forks · Updated Aug 7, 2024

Tile primitives for speedy kernels

Cuda · 1,433 stars · 53 forks · Updated Aug 7, 2024

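A visual no-code/code-free web crawler/spider. 易采集: a visual browser-automation testing, data-collection, and web-crawling tool that lets you design and run crawling tasks graphically without writing code. Alias: ServiceWrapper, an intelligent service-wrapping system for web applications.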

JavaScript · 31,230 stars · 3,723 forks · Updated Jul 29, 2024

A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems

Python · 101 stars · 5 forks · Updated Jul 9, 2024

Arena-Hard-Auto: An automatic LLM benchmark.

Jupyter Notebook · 358 stars · 39 forks · Updated Jul 31, 2024

Python · 1,443 stars · 126 forks · Updated Aug 6, 2024

CUDA/Metal accelerated language model inference

C · 351 stars · 13 forks · Updated Jul 22, 2024

DSPy: The framework for programming—not prompting—foundation models

Python · 15,298 stars · 1,185 forks · Updated Aug 7, 2024

A parallel framework for training deep neural networks

Python · 34 stars · 5 forks · Updated Jul 29, 2024

[ICML 2024] CLLMs: Consistency Large Language Models

Python · 326 stars · 14 forks · Updated Aug 1, 2024

A simple library for scaling up JAX programs

Python · 113 stars · 7 forks · Updated Jun 26, 2024

Grok open release

Python · 49,237 stars · 8,308 forks · Updated Aug 7, 2024

Universal LLM Deployment Engine with ML Compilation

Python · 18,288 stars · 1,456 forks · Updated Aug 7, 2024

Standardized Serverless ML Inference Platform on Kubernetes

Python · 3,362 stars · 1,018 forks · Updated Aug 7, 2024

An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.

Python · 8,169 stars · 818 forks · Updated Jul 27, 2024

CUDA Python Low-level Bindings

Python · 837 stars · 66 forks · Updated Aug 7, 2024

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…

Python · 1,711 stars · 283 forks · Updated Aug 7, 2024

Python · 7,042 stars · 545 forks · Updated Aug 7, 2024

Building a quick conversation-based search demo with Lepton AI.

TypeScript · 7,623 stars · 960 forks · Updated Jul 10, 2024

LlamaIndex is a data framework for your LLM applications

Python · 34,261 stars · 4,840 forks · Updated Aug 8, 2024

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python · 495 stars · 36 forks · Updated Jul 10, 2024

SGLang is yet another fast serving framework for large language models and vision language models.

Python · 3,968 stars · 245 forks · Updated Aug 7, 2024

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python · 18,587 stars · 2,040 forks · Updated Jul 31, 2024

Python · 76 stars · 17 forks · Updated Jul 29, 2024