A throughput-oriented high-performance serving framework for LLMs

Cuda 627 24 Updated Sep 21, 2024

Simplify large (>2 GB) ONNX models

Python 42 3 Updated Mar 1, 2024

Outfit Anyone: Ultra-high quality virtual try-on for Any Clothing and Any Person

5,638 431 Updated Jul 26, 2024
Python 24 4 Updated Sep 7, 2024
Python 159 27 Updated Jul 24, 2024

This repository is an official implementation of the paper "LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection".

Python 239 14 Updated Jul 25, 2024

DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including x86 and ARMv9.

C++ 135 15 Updated Aug 27, 2024

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

C++ 539 49 Updated Oct 14, 2024

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python 655 53 Updated Nov 5, 2024

[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Python 302 25 Updated Aug 13, 2024

MambaOut: Do We Really Need Mamba for Vision?

Python 2,023 34 Updated Oct 22, 2024

Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).

C++ 236 24 Updated Mar 15, 2024

A primitive library for neural networks

C++ 1,290 215 Updated Nov 5, 2024

📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.

2,771 192 Updated Nov 1, 2024

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

Python 1,140 67 Updated Oct 14, 2024

a lightweight LLM model inference framework

C++ 699 87 Updated Apr 7, 2024

Galvatron is an automatic distributed training system designed for Transformer models, including Large Language Models (LLMs).

Python 34 3 Updated Nov 4, 2024
Python 115 5 Updated Sep 27, 2024

[ECCV 2024] Taming Lookup Tables for Efficient Image Retouching

Python 27 Updated Jul 12, 2024

A collection of compiler learning resources.

Python 2,136 331 Updated May 27, 2024

Official implementation of the CVPR 2024 paper ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions.

Python 217 15 Updated Oct 17, 2024

CAMixerSR: Only Details Need More “Attention” (CVPR 2024)

Python 223 13 Updated Jun 4, 2024

[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Python 586 24 Updated Nov 4, 2024

LLM inference in C/C++

C++ 67,332 9,672 Updated Nov 5, 2024

Effective Fusion Factor in FPN for Tiny Object Detection (WACV 2021)

Python 58 8 Updated Jan 23, 2021

[NeurIPS 2022] HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions

Python 321 36 Updated Dec 21, 2023

Official code for Conformer: Local Features Coupling Global Representations for Visual Recognition

Jupyter Notebook 544 87 Updated Oct 31, 2021
Python 718 172 Updated Mar 24, 2023

Inference Llama 2 in one file of pure C

C 17,430 2,084 Updated Aug 6, 2024