A TensorFlow Extension: GPU performance tools for TensorFlow.

Python 25 7 Updated Jul 27, 2023

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.

C++ 792 159 Updated Aug 28, 2024

A machine learning compiler for GPUs, CPUs, and ML accelerators

C++ 2,563 400 Updated Sep 11, 2024

USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference

Python 304 17 Updated Sep 3, 2024

NVIDIA Linux open GPU kernel module source

C 15,012 1,242 Updated Sep 10, 2024

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…

Python 1,795 299 Updated Sep 10, 2024

Making large AI models cheaper, faster and more accessible

Python 38,596 4,324 Updated Sep 11, 2024

System for AI Education Resource.

Python 3,396 425 Updated Jun 21, 2024

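AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks.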

Jupyter Notebook 10,351 1,498 Updated Aug 18, 2024

TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.

Python 2,774 715 Updated Aug 22, 2024

MetaBalance algorithm for multi-task learning

Python 55 5 Updated Feb 9, 2022

Official PyTorch Implementation for Conflict-Averse Gradient Descent (CAGrad)

Python 108 17 Updated Nov 9, 2023

LLM inference in C/C++

C++ 64,757 9,276 Updated Sep 11, 2024

how to learn PyTorch and OneFlow

326 18 Updated Mar 22, 2024

Reference implementation for DPO (Direct Preference Optimization)

Python 2,007 164 Updated Aug 11, 2024

DLRover: An Automatic Distributed Deep Learning System

Python 1,195 146 Updated Sep 10, 2024

Provides end-to-end model development pipelines for LLMs and Multimodal models that can be launched on-prem or cloud-native.

Python 442 133 Updated Sep 6, 2024

A permissively licensed C and C++ Task Scheduler for creating parallel programs. Requires C++11 support.

C++ 1,731 141 Updated Aug 14, 2024

A list of awesome papers and resources on recommender systems with large language models (LLMs).

1,130 99 Updated Aug 15, 2024

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 8,322 583 Updated Sep 10, 2024
C++ 4,310 463 Updated Sep 10, 2024

TensorFlow Recommenders is a library for building recommender system models using TensorFlow.

Python 1,819 273 Updated Aug 16, 2024

A simple C++ Thread Pool implementation

C++ 26 4 Updated May 17, 2022

HugeCTR is a high-efficiency GPU framework designed for training Click-Through-Rate (CTR) estimation models.

C++ 932 199 Updated Sep 3, 2024

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C++ 846 143 Updated Jul 8, 2024

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

C++ 5,058 615 Updated Sep 11, 2024

Kernel Tuner

Python 271 46 Updated Sep 5, 2024

How to optimize some algorithms in CUDA.

Cuda 1,417 118 Updated Sep 9, 2024

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

C++ 246 48 Updated Sep 11, 2024
Python 3 2 Updated Feb 1, 2023