janicevidal

A throughput-oriented high-performance serving framework for LLMs

CUDA · 564 stars · 23 forks · Updated Sep 21, 2024

Simplify large ONNX models (>2 GB)

Python · 40 stars · 3 forks · Updated Mar 1, 2024

Outfit Anyone: Ultra-high quality virtual try-on for Any Clothing and Any Person

5,567 stars · 426 forks · Updated Jul 26, 2024

Python · 17 stars · 3 forks · Updated Sep 7, 2024

Python · 156 stars · 24 forks · Updated Jul 24, 2024

This repository is an official implementation of the paper "LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection".

Python · 219 stars · 13 forks · Updated Jul 25, 2024

DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including x86 and ARMv9.

C++ · 133 stars · 14 forks · Updated Aug 27, 2024

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

C++ · 522 stars · 48 forks · Updated Sep 26, 2024

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters

Python · 549 stars · 46 forks · Updated Sep 28, 2024

[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Python · 284 stars · 25 forks · Updated Aug 13, 2024

MambaOut: Do We Really Need Mamba for Vision?

Python · 1,977 stars · 34 forks · Updated Jun 6, 2024

Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).

C++ · 235 stars · 24 forks · Updated Mar 15, 2024

A primitive library for neural networks

C++ · 1,277 stars · 215 forks · Updated Aug 18, 2024

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

2,580 stars · 174 forks · Updated Oct 3, 2024

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

Python · 1,111 stars · 65 forks · Updated Feb 14, 2024

A lightweight LLM inference framework

C++ · 683 stars · 86 forks · Updated Apr 7, 2024

Galvatron is an automatic distributed training system designed for Transformer models, including Large Language Models (LLMs).

Python · 33 stars · Updated Jul 11, 2024

Python · 112 stars · 5 forks · Updated Sep 27, 2024

[ECCV 2024] Taming Lookup Tables for Efficient Image Retouching

Python · 24 stars · Updated Jul 12, 2024

A collection of compiler learning resources.

Python · 2,079 stars · 324 forks · Updated May 27, 2024

Official implementation of the CVPR 2024 paper ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions.

Python · 201 stars · 12 forks · Updated Jul 3, 2024

CAMixerSR: Only Details Need More “Attention” (CVPR 2024)

Python · 211 stars · 11 forks · Updated Jun 4, 2024

[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Python · 560 stars · 21 forks · Updated Aug 17, 2024

LLM inference in C/C++

C++ · 65,762 stars · 9,444 forks · Updated Oct 3, 2024

Effective Fusion Factor in FPN for Tiny Object Detection (WACV 2021)

Python · 58 stars · 8 forks · Updated Jan 23, 2021

[NeurIPS 2022] HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions

Python · 317 stars · 37 forks · Updated Dec 21, 2023

Official code for Conformer: Local Features Coupling Global Representations for Visual Recognition

Jupyter Notebook · 533 stars · 87 forks · Updated Oct 31, 2021

Python · 710 stars · 170 forks · Updated Mar 24, 2023

Inference Llama 2 in one file of pure C

C · 17,244 stars · 2,056 forks · Updated Aug 6, 2024