botbw · Singapore · (UTC +08:00) · https://botbw.github.io/
Stars
Sort by: Recently starred
This repository contains the experimental PyTorch native float8 training UX
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization.
AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and versatility of software and hardware.
This is an online course where you can learn and master the skill of low-level performance analysis and tuning.
Dynamic Memory Management for Serving LLMs without PagedAttention
Python-MIP: collection of Python tools for the modeling and solution of Mixed-Integer Linear programs
Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
NumPy aware dynamic Python compiler using LLVM
A small package to create visualizations of PyTorch execution graphs
microsoft / Megatron-DeepSpeed
Forked from NVIDIA/Megatron-LM
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Ongoing research training transformer models at scale
Custom data types and layouts for training and inference
Material for cuda-mode lectures
DECA: Detailed Expression Capture and Animation (SIGGRAPH 2021)
🎉 CUDA/C++ notes / hand-written CUDA kernels for large models / tech blog, updated occasionally: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
A baseline repository of Auto-Parallelism in Training Neural Networks
A machine learning compiler for GPUs, CPUs, and ML accelerators
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
Browser extension that simplifies the GitHub interface and adds useful features
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)
Source for https://fullstackdeeplearning.com