Raphael-Hao (🪄 Mogic)

Starred repositories

395 results for source starred repositories

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python 457 34 Updated Jul 10, 2024

LLM training code for Databricks foundation models

Python 3,849 504 Updated Jul 12, 2024

OpenAI ChatGPT, GPT-3, GPT-4, DALL·E, Whisper API wrapper for Go

Go 8,654 1,303 Updated Jul 12, 2024

LLama.cpp golang bindings

C++ 624 78 Updated Jun 13, 2024

[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

Cuda 83 5 Updated Jul 3, 2024

Robust Speech Recognition via Large-Scale Weak Supervision

Python 64,477 7,513 Updated Jul 2, 2024

Clone a voice in 5 seconds to generate arbitrary speech in real-time

Python 51,567 8,642 Updated Jul 5, 2024

[EuroSys'24] Minuet: Accelerating 3D Sparse Convolutions on GPUs

Cuda 68 2 Updated Jun 7, 2024
HTML 67 10 Updated Dec 2, 2022

Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models

Python 2,107 284 Updated Jul 10, 2024

[CVPR 2024] Deformable Convolution v4

Python 423 25 Updated May 17, 2024

[ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.

Python 48 7 Updated May 16, 2024

This is the official github repo of Think-on-Graph. If you are interested in our work or willing to join our research team in Shenzhen, please feel free to contact us by email ([email protected])

Python 242 26 Updated Mar 24, 2024
JavaScript 936 206 Updated Jul 3, 2024

Stick Rules -- Quantumult X / Loon / ClashX Rules \ Quantumult back to CN Rules

986 217 Updated Jun 18, 2024

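For 毒奶's own use: low-effort configuration files for Quantumult X, covering ad-blocking routing rules, TikTok unlock rewrites, VSCO unlock, 神机 (Shenji) routing rules, and blackmatrix7 routing rules.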

JavaScript 2,284 185 Updated Jun 5, 2024

A fast communication-overlapping library for tensor parallelism on GPUs.

C++ 83 8 Updated Jul 9, 2024

Microsoft Azure Traces

Jupyter Notebook 751 140 Updated Jun 30, 2024
Python 72 16 Updated Jul 3, 2024

Stateful LLM Serving

Python 16 1 Updated May 30, 2024

[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable

Python 61 1 Updated Jun 30, 2024

Tile primitives for speedy kernels

Cuda 1,373 47 Updated Jul 12, 2024

Mamba SSM architecture

Python 11,638 953 Updated Jul 3, 2024

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Python 240 19 Updated Jul 10, 2024

Let us control diffusion models!

Python 29,047 2,626 Updated Feb 25, 2024

Your image is almost there!

Python 6,897 410 Updated Jun 8, 2024

Interference-aware CPU scheduling that enables performance isolation and high CPU utilization for datacenter servers

C 117 50 Updated Jun 11, 2024
C++ 31 12 Updated Jun 10, 2024

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

Python 12,964 949 Updated Jul 9, 2024

Gallatin is a general-purpose memory manager for CUDA that allows for threads to quickly malloc and free memory of arbitrary size inside of kernels.

Cuda 4 Updated Mar 4, 2024