Jeffwan
  • Bytedance
  • Seattle, WA

Highlights

  • Pro

Organizations

@kubeflow @volcano-sh


Predict the performance of LLM inference services

Jupyter Notebook 14 Updated Jun 27, 2024

A throughput-oriented high-performance serving framework for LLMs

Cuda 640 26 Updated Sep 21, 2024

vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)

C++ 835 122 Updated Nov 12, 2024

Efficient and easy multi-instance LLM serving

Python 219 12 Updated Nov 22, 2024

A library developed by Volcano Engine for high-performance reading and writing of PyTorch model files.

Python 13 3 Updated Jun 9, 2024

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Python 1,758 98 Updated Jan 21, 2024

Stateless cluster local OCI registry mirror.

Go 1,298 70 Updated Nov 22, 2024

Serverless LLM Serving for Everyone.

Python 358 33 Updated Nov 23, 2024

Fast Distributed Inference Serving for Large Language Models

3 Updated Oct 18, 2023

Distributed Model Serving Framework

Java 154 64 Updated Oct 11, 2024

Custom controller that extends the Horizontal Pod Autoscaler

Go 212 25 Updated Nov 16, 2024

Papers and code for AI systems

216 13 Updated Aug 29, 2024

SpotServe: Serving Generative Large Language Models on Preemptible Instances

102 8 Updated Feb 22, 2024

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Python 2,211 145 Updated Nov 23, 2024

[COLM 2024] OpenAgents: An Open Platform for Language Agents in the Wild

Python 4,001 444 Updated Nov 18, 2024

Letta (formerly MemGPT) is a framework for creating LLM services with memory.

Python 12,909 1,415 Updated Nov 23, 2024

An efficient and effective few-shot NL2SQL method on GPT-4.

Python 440 71 Updated Jun 4, 2024

Generate comic panels using an LLM + SDXL. Powered by Hugging Face 🤗

TypeScript 1,058 220 Updated Oct 15, 2024

A quick guide (especially) for trending instruction finetuning datasets

2,653 169 Updated Nov 28, 2023

https://acl2023-retrieval-lm.github.io/

JavaScript 154 13 Updated Oct 18, 2023

Official release of InternLM2.5 base and chat models. 1M context support

Python 6,495 460 Updated Nov 21, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 30,681 4,655 Updated Nov 23, 2024

A user gateway that provides serverless AIGC experience.

Go 41 8 Updated Apr 17, 2024

Large Language Model Text Generation Inference

Python 9,138 1,075 Updated Nov 22, 2024
Python 344 38 Updated Mar 10, 2023

AI Native Data App Development framework with AWEL (Agentic Workflow Expression Language) and Agents

Python 13,798 1,858 Updated Nov 22, 2024

ClearML - Model-Serving Orchestration and Repository Solution

Python 137 40 Updated Aug 15, 2024

Easy, Fast, Secure and Cost-Efficient LLM Pipelines to generate ChatGPT-like private domain models and knowledgeable agents for your organization.

6 1 Updated May 25, 2023

🦜🔗 Build context-aware reasoning applications

Jupyter Notebook 95,212 15,443 Updated Nov 23, 2024