Organizations

@AoTTG-2 @hpcaitech

AlphaFold 3 inference pipeline.

Python 1,792 177 Updated Nov 11, 2024

Puzzles for learning Triton; play them with minimal environment configuration!

Python 63 Updated Nov 9, 2024

My learning notes and code for ML systems.

Python 35 Updated Nov 12, 2024

A flexible package manager that supports multiple versions, configurations, platforms, and compilers.

Python 4,296 2,274 Updated Nov 12, 2024

Official MPICH Repository

C 555 281 Updated Nov 11, 2024

Open MPI main development repository

C 2,163 859 Updated Nov 7, 2024

An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.

Python 4,804 150 Updated Oct 27, 2024

Generic PyTorch implementation of einsum that supports different semirings

Python 46 7 Updated Jul 17, 2024

Spring4Shell - Spring Core RCE - CVE-2022-22965

Python 127 85 Updated Apr 4, 2022

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 8,621 981 Updated Nov 6, 2024

NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs

C++ 410 53 Updated Sep 5, 2024

🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSy…

2,687 307 Updated Aug 14, 2024

Machine-Learning Accelerator System Exploration Tools

Python 121 54 Updated Nov 11, 2024

Examples of CUDA implementations by Cutlass CuTe

Makefile 90 12 Updated Nov 11, 2024

The best OSS video generation models

Python 1,898 192 Updated Nov 12, 2024

Seamless operability between C++11 and Python

C++ 15,739 2,111 Updated Nov 12, 2024

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention)

Python 2,562 246 Updated Nov 12, 2024

Dockerized Spring4Shell (CVE-2022-22965) PoC application and exploit

Python 312 235 Updated Aug 4, 2022

Remote Unauthenticated Code Execution Vulnerability in OpenSSH server (CVE-2024-6387)

Python 45 18 Updated Aug 22, 2024

C++ 215 78 Updated Nov 7, 2024

Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)

C 1,153 427 Updated Nov 10, 2024

Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.

Python 264 49 Updated Mar 31, 2023

Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA

C++ 602 35 Updated Nov 5, 2024

Simple and fast low-bit matmul kernels in CUDA / Triton

Python 139 10 Updated Nov 10, 2024

How to optimize some algorithms in CUDA.

Cuda 1,576 130 Updated Nov 12, 2024

A collection of benchmarks to measure basic GPU capabilities

Jupyter Notebook 264 41 Updated Jun 21, 2024

Solve puzzles. Learn CUDA.

Jupyter Notebook 9,867 855 Updated Sep 1, 2024

BS::thread_pool: a fast, lightweight, and easy-to-use C++17 thread pool library

C++ 2,202 253 Updated May 11, 2024

A Python library that transfers PyTorch tensors between CPU and NVMe

C++ 96 19 Updated Nov 12, 2024