Raphael-Hao (🪄 Mogic)

Starred repositories

OpenAI ChatGPT, GPT-3, GPT-4, DALL·E, Whisper API wrapper for Go

Go 8,581 1,288 Updated Jun 26, 2024

LLama.cpp golang bindings

C++ 609 77 Updated Jun 13, 2024

[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

Cuda 63 3 Updated Jun 18, 2024

Robust Speech Recognition via Large-Scale Weak Supervision

Python 63,704 7,393 Updated Jun 22, 2024

Clone a voice in 5 seconds to generate arbitrary speech in real-time

Python 51,405 8,621 Updated May 29, 2024

[EuroSys'24] Minuet: Accelerating 3D Sparse Convolutions on GPUs

Cuda 68 2 Updated Jun 7, 2024
HTML 67 10 Updated Dec 2, 2022

Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models

Python 1,595 199 Updated Jun 24, 2024

[CVPR 2024] Deformable Convolution v4

Python 406 25 Updated May 17, 2024

[ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.

Python 47 5 Updated May 16, 2024

This is the official GitHub repo of Think-on-Graph. If you are interested in our work or would like to join our research team in Shenzhen, please feel free to contact us by email ([email protected])

Python 230 25 Updated Mar 24, 2024
JavaScript 925 204 Updated Jun 17, 2024

Stick Rules -- Quantumult X / Loon / ClashX Rules \ Quantumult back to CN Rules

984 216 Updated Jun 18, 2024

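毒奶's personal-use, low-effort configuration files (Quantumult X): ad-removal routing rules, TikTok unlock rewrites, VSCO unlock, Shenji (神机) routing rules, and blackmatrix7 routing rules.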

JavaScript 2,242 182 Updated Jun 5, 2024

A fast communication-overlapping library for tensor parallelism on GPUs.

C++ 51 3 Updated Jun 14, 2024

Microsoft Azure Traces

Jupyter Notebook 735 139 Updated Jun 26, 2024
Python 69 15 Updated Apr 11, 2024

Stateful LLM Serving

Python 15 1 Updated May 30, 2024

[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable

Python 49 1 Updated Jun 3, 2024

Tile primitives for speedy kernels

Cuda 1,329 43 Updated Jun 27, 2024

Mamba SSM architecture

Python 11,358 922 Updated Jun 24, 2024

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Python 229 18 Updated Jun 13, 2024

Let us control diffusion models!

Python 28,805 2,599 Updated Feb 25, 2024

Your image is almost there!

Python 6,628 398 Updated Jun 8, 2024

Interference-aware CPU scheduling that enables performance isolation and high CPU utilization for datacenter servers

C 116 50 Updated Jun 11, 2024
C++ 31 12 Updated Jun 10, 2024

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

Python 12,707 931 Updated Jun 26, 2024

Gallatin is a general-purpose memory manager for CUDA that allows for threads to quickly malloc and free memory of arbitrary size inside of kernels.

Cuda 4 Updated Mar 4, 2024

Thunder Research Group's Collective Communication Library

C++ 16 3 Updated Apr 25, 2024

A multi-level tensor algebra superoptimizer

C++ 256 15 Updated Jun 26, 2024