Stars
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
The Modern C++ Challenge, published by Packt
What would you do with 1000 H100s...
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Provides symbolic API for model creation in PyTorch.
A set of examples around PyTorch in Vision, Text, Reinforcement Learning, etc.
This repository houses VERY simple example code for each STL algorithm and explains what each does.
📚 Modern C++ Tutorial: C++11/14/17/20 On the Fly | https://changkun.de/modern-cpp/
Open source FPGA-based NIC and platform for in-network compute
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…
Prints values and types during compilation!
Lightning fast C++/CUDA neural network framework
Transformers with Arbitrarily Large Context
The road to hack SysML and become a system expert
Tiny ring attention implementation for learning purposes
Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS