Stars
Development repository for the Triton language and compiler
A baseline repository of Auto-Parallelism in Training Neural Networks
A fast communication-overlapping library for tensor parallelism on GPUs.
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
DLRover: An Automatic Distributed Deep Learning System
Memory Optimizations for Deep Learning (ICML 2023)
A machine learning compiler for GPUs, CPUs, and ML accelerators
Code samples for C++ Concurrency in Action
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Production-Grade Container Scheduling and Management
The Prometheus monitoring system and time series database.
mlpack: a fast, header-only C++ machine learning library
The official SALIENT++ system described in the paper "Communication-Efficient Graph Neural Networks with Probabilistic Neighborhood Expansion Analysis and Caching".
Artifact for the MLSys 2023 paper "Communication-Efficient Graph Neural Networks with Probabilistic Neighborhood Expansion Analysis and Caching"
A distributed, fast open-source graph database featuring horizontal scalability and high availability
Ghost-in-LCL / oar
Forked from kegalas/oar
oar is a simple software renderer
A simple C++11 Thread Pool implementation
Curated list of project-based tutorials
List of Computer Science courses with video lectures.