Raphael-Hao
🪄
Mogic

Highlights

  • Pro

Starred repositories

A minimal GPU design in Verilog to learn how GPUs work from the ground up

SystemVerilog 6,687 501 Updated Jun 14, 2024

Python 47 4 Updated Jul 6, 2024

A PyTorch Native LLM Training Framework

Python 490 19 Updated May 31, 2024

[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

Python 140 12 Updated Jul 4, 2024

Structured Text Generation

Python 7,134 366 Updated Jul 12, 2024

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 19,159 2,441 Updated Jul 9, 2024

LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding

Python 542 33 Updated Feb 27, 2024

Create LLM agents with long-term memory and custom tools 📚🦙

Python 10,825 1,163 Updated Jul 12, 2024

Umami is a simple, fast, privacy-focused alternative to Google Analytics.

TypeScript 20,576 3,887 Updated Jul 12, 2024

Start building LLM-empowered multi-agent applications in an easier way.

Python 2,958 208 Updated Jul 12, 2024

🕸 A Node app for creating a Feed Reader in Notion.

JavaScript 281 534 Updated May 23, 2024

Chinese CSL styles

XML 4,925 813 Updated Jul 3, 2024

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Python 343 8 Updated May 14, 2024

📖 A curated list of Awesome LLM Inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

1,952 134 Updated Jul 12, 2024

An Attention Superoptimizer

C++ 19 Updated May 9, 2024

Torch2Chip (MLSys, 2024)

Python 42 3 Updated Jun 25, 2024

Tools for building GPU clusters

Shell 1,231 316 Updated Mar 8, 2024

Envision a world where EVERY student can read ALL the code of a teaching operating system.

C 2,162 154 Updated Jun 22, 2024

rFaaS: a high-performance FaaS platform with RDMA acceleration for low-latency invocations.

C++ 45 15 Updated Mar 17, 2024

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

2,945 108 Updated Jun 26, 2024

LLM training in simple, raw C/CUDA

Cuda 21,648 2,355 Updated Jul 12, 2024

An experimental parallel training platform

40 10 Updated Mar 25, 2024

Jupyter Notebook 110 6 Updated Mar 12, 2024

Generative Agents: Interactive Simulacra of Human Behavior

15,836 1,996 Updated Jun 3, 2024

A primitive library for neural networks

C++ 1,244 210 Updated Jul 10, 2024

CoreNet: A library for training deep neural networks

Python 6,740 521 Updated May 28, 2024

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python 210 22 Updated Jul 11, 2024

Python 13 1 Updated Apr 21, 2024

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Python 7,338 428 Updated May 3, 2024