Highlights

  • Pro

Organizations

@AoTTG-2 @hpcaitech

This repository contains the experimental PyTorch native float8 training UX

Python 198 18 Updated Jul 29, 2024

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…

Python 1,690 273 Updated Jul 30, 2024

Microsoft Automatic Mixed Precision Library

Python 483 36 Updated Apr 8, 2024

AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and versatility of software and hardware.

Python 178 45 Updated Jul 29, 2024

learning how CUDA works

Cuda 116 12 Updated May 11, 2024

This is an online course where you can learn and master the skill of low-level performance analysis and tuning.

C++ 2,353 201 Updated Jul 26, 2024

Dynamic Memory Management for Serving LLMs without PagedAttention

C 131 8 Updated Jul 29, 2024

A fast MoE impl for PyTorch

Python 1,496 180 Updated Jul 5, 2024

Python-MIP: collection of Python tools for the modeling and solution of Mixed-Integer Linear programs

Python 516 91 Updated Jul 30, 2024

Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States

Python 843 47 Updated Jul 14, 2024

NumPy aware dynamic Python compiler using LLVM

Python 9,676 1,111 Updated Jul 30, 2024

Minecraft, but I made it in 48 hours.

C 3,910 428 Updated Apr 26, 2024

A small package to create visualizations of PyTorch execution graphs

Jupyter Notebook 3,117 277 Updated Apr 2, 2024

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Python 1,772 335 Updated Jul 29, 2024

Ongoing research training transformer models at scale

Python 9,538 2,154 Updated Jul 30, 2024

Custom data types and layouts for training and inference

Python 448 57 Updated Jul 30, 2024

An ML Systems Onboarding list

171 6 Updated Jul 23, 2024

Material for cuda-mode lectures

Jupyter Notebook 1,996 196 Updated Jun 13, 2024

Shared course materials for competitive programming

3,811 761 Updated Apr 16, 2024

DECA: Detailed Expression Capture and Animation (SIGGRAPH 2021)

Python 2,081 419 Updated Jul 23, 2023

🎉 CUDA/C++ notes / hand-written CUDA kernels for large models / technical blog, updated sporadically: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.

Cuda 931 88 Updated Jul 29, 2024

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

C++ 5,815 661 Updated Jul 29, 2024

A baseline repository of Auto-Parallelism in Training Neural Networks

Python 134 19 Updated Jun 25, 2022

A machine learning compiler for GPUs, CPUs, and ML accelerators

C++ 2,472 377 Updated Jul 30, 2024

FlexFlow Serve: Low-Latency, High-Performance LLM Serving

C++ 1,609 221 Updated Jul 28, 2024

:octocat: Browser extension that simplifies the GitHub interface and adds useful features

TypeScript 23,879 1,455 Updated Jul 30, 2024

MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)

Python 34 3 Updated May 29, 2024

Source for https://fullstackdeeplearning.com

HTML 1,101 202 Updated Jun 12, 2024

🔍 🐍 Like pstack but for Python!

Python 982 43 Updated Jul 24, 2024

A tutorial for CUDA&PyTorch

C++ 102 21 Updated Feb 7, 2024