Stars
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
A throughput-oriented, high-performance serving framework for LLMs
MSCCL++: A GPU-driven communication stack for scalable AI applications
A data annotation toolbox that supports image, audio, and video data.
A Comprehensive Toolkit for High-Quality PDF Content Extraction
A one-stop, open-source, high-quality data extraction tool; supports extraction from PDFs, web pages, and e-books in multiple formats.
[NeurIPS'24 Spotlight] Speeds up long-context LLM inference by computing attention with approximate, dynamic sparsity, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
Run any open-source LLM, such as Llama or Gemma, as an OpenAI-compatible API endpoint in the cloud.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
✨✨Latest Advances on Multimodal Large Language Models
[ACL2024 Findings] Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models
Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
FlashInfer: Kernel Library for LLM Serving
A curated list of open-source Chinese large language models, focusing on smaller models that can be privately deployed at low training cost, covering base models, vertical-domain fine-tuning and applications, datasets, and tutorials.
SGLang is a fast serving framework for large language models and vision language models.
LLM Group Chat Framework: chat with multiple LLMs at the same time.
HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
Official inference library for Mistral models
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks